bit Obscene Episode 1: What Developers Need To Know About Transactions

bit Obscene Episode 1: What Developers Need To Know About Transactions



In the inaugural episode, Joe Obbish and Erik Darling talk about how developers use and abuse transactions, and some surprises you might run into for them.

Links:

  • https://erikdarling.com/the-art-of-the-sql-server-stored-procedure-transactions/
  • https://erikdarling.com/batching-modification-queries-is-great-until/
  • https://www.youtube.com/watch?v=CuZSoZb8ziE
  • https://michaeljswart.com/2017/07/sql-server-upsert-patterns-and-antipatterns/

Video Summary

In this video, I delve into the world of SQL Server transactions and share insights on what developers need to know about them. Joining me is my special guest co-host, Joe Obish, who brings a unique perspective from his time in the Wisconsin State Penitentiary library. Together, we tackle common misconceptions and bad habits that can lead to data loss or corruption. We explore scenarios where simple delete-insert patterns can fail, emphasizing the importance of using transactions to protect your data integrity. By understanding these potential pitfalls, developers can write more robust code and appreciate the value of transactions as a safeguard in an unpredictable database environment. Whether you’re just starting out or looking to refine your TSQL skills, this video offers valuable lessons on handling uncertainties and ensuring data reliability.

Full Transcript

Hello and welcome to the very first recording of our podcast, probably yet to be titled. And today, my special temporary guest co-host is Joe Obish. And Joe Obish, if you would like to introduce yourself a little bit, so people who may have forgotten who you are and the break you’ve taken from publicly berating SQL Server, why don’t you tell the nice folks why they should pay attention to you? Well, as we said before, this is part of my public rehabilitation. I even, you know, dusted off my, my most impressive SQL Server shirt, right? You see the speaker here? Yeah.

I started SQL Server work in 2011. I remember when I used to think that wasn’t very long ago. You know, now I’ve got like gray hair and everything. It’s terrible. I’ve done a lot of great performance tuning. I’ve worked on servers as small as four cores and as large as 108. I’ve done some columnstore ETL work, some OLTP.

Don’t undersell that columnstore ETL work, you bozo. That’s some of your most impressive stuff. I know. So, the best presentation I ever gave and will ever give is on that subject. It’s way more impressive than what Eric did. Eric and I are going to do here. So, you know, if you make it to the end and think, you know, that was okay, but I want something even better. It’s on YouTube.

Eric and I’ll put, I’ll put the link in the show notes. Okay. Cool. So that people don’t have to go figure out how to spell your name and then figure out what you were talking about on YouTube. Okay. We’ll make, we’ll make that easy on people. Okay.

What, are you done? I mean, yeah. I don’t know. Did I forget something important? I don’t know. You tell me. It’s your, it’s your bio. You can, you can say what, you can keep going, say whatever you want.

No. No. No. All right. Well, I forgot to mention Joe is live from the Wisconsin State Penitentiary today. He’s in the library right now.

My name’s Erik Darling. I am a consultant extraordinaire. I do all SQL Server performance tuning work. And I have a blog and a YouTube channel. And those links will be in the show notes too, just in case anyone who finds me was unaware of the fact that I have, I have those things. So that’s fun today. We’re going to be talking about what developers need to know about transactions, because apparently developers really don’t know anything about transactions.

That’s been my experience that many developers when, when we’re, when I’m working with clients are using transactions and more like abusing transactions. And today we’re hoping to share some of our professional wisdom about transactions so that hopefully the developing, the developer community at large can start using transactions and stop abusing transactions. So Joe, take it away. And I’m just here for color commentary. Joe’s going to provide all the, all the audio genius for this one. So let’s, let’s go for it.

Joe, you’re going to share that screen or you’re using a nice way of, you know, saying that Eric’s making me do most of the work. Yeah. As usual. So specializing, you know, there’s, there’s not knowing a lot about transactions. And then there’s like, like knowing the wrong thing or like having bad habits.

Like I’ve worked at companies where the things developers, you know, they, I think they know they’d be better off. Knowing nothing. The primary relationship that developers had with the transaction log was it’s important to write as little data as possible to transaction log.

That was their like overriding consideration for how they wrote the code. And, you know, to be fair, I mean, there is some value in that, right? If, if you can do the same amount of work in a way that writes less data, that’ll be good for performance.

But this was taken to an extreme, you know, even to the point where in a big, you know, complicated ETL process, they would always batch their transactions. Both into the final destination of the recording database, but even on the staging database, which was, you know, a simple recovery model. It’s supposed to be a temporary workspace, but they were, you know, they’re batching those transactions because.

They were fighting against the transaction log. They weren’t working with it. They viewed as their enemy.

They, they, they wanted to give it as little data as possible. And that caused a lot of problems. Performance is bad. Scalability was bad.

Sometimes they would try to add new features to the ETL and they just couldn’t. They had some process. I forget what it’s called.

It was like some fixed process where, you know, if the ETL process for a table was, was canceled midway, they would then have a bunch of code that would try to undo the partial progress that was made because, you know, they, they, they weren’t using transactions in the way that they should. They were fighting against the whole concept. And I’m sure you can imagine how complicated and horrible that procedure was.

I sure can. So, so where, so in, in, in cases like that, where, you know, they just wanted everything to roll back, what, what, what prevented them from using something like set exact abort on so that they got, they got a more, they got a better rollback scenario from things. It was because, you know, to take the, one of the, well, it was because they would do something like, you know, say you’re loading a million rows into this table.

They’d load like a hundred thousand at a time, each in their own separate transaction. Oh. So it’d be something like, you know, if they load half a million, but then the customer cancels it or something, they would then have a fixed procedure that would run, which would, which would delete those 500,000 rows that they had inserted.

Oh. So yeah, like it was really something. That is, that is charmless for being honest.

I ended up. So that among other things. Makes me want to say that, you know, to, to any developers out there who, who are, who are listening or watching transactions are your friends.

They’re not something to fight against. They’re there for your benefits. Um, you know, it’s part of the way that the database can serve you.

Um, it, you know, it’s a way to protect the data and make sure that it is there. It’s a way for you to get the results you’re expecting. And it’s a way for DBAs and customers and your boss to bug you less.

Cause you know, if you write code and your code loads data wrong, well, that can be a big problem. So, you know, if you take, if you take nothing else away from this, I would, you know, leave with the sentence of transactions are your friends. Um, you know, and you know, like Eric has a lot more friends than me.

So, you know, like I’m kind of reduced to thinking of, you know, SQL Server transactions as, as being my friends, but, um, that’s, that’s, that’s, that’s just how it is. That’s, that’s why Joe names all those transactions. Oh yeah.

That actually is a thing. That is actually a thing. We’re, I, uh, making an executive decision that is not important enough to cover here, but you know, maybe in part two. All right.

So, all right, I’m going to try to share exhibit a, yeah, let’s do it. And. All right.

Can you see, uh, some. I see. We’re starting to share. We got some purple dots. Oh no. You know, it worked so well when you weren’t recording.

It really did. All right. Yeah. Let’s try it again. Um, maybe, maybe you’re, are you, are you connected to a VPN or something? No, I’m not connected to VPN.

Is it working now? There you go. Look at that. Worked instantly. All right. Great. Um, so Eric, this is going to be used in the quiz. So you need to pay real close attention to this.

Um, so let’s say, you know, your, your, your test or a civil procedure that either creates new record. If there isn’t one or updates an existing record and you, you know, you’re knowledgeable enough to know that that kind of pattern is called an upsert.

You, uh, do some searching on Google. You find yourself reading some confusing. Blog posts about like what the right upsert pattern you should use is. You know, it uses all kinds of things.

You don’t know if you’re not really big on reading in the first place. So you think, okay, whatever, I’m not going to do updates. I’m just going to delete the record if it’s there and then insert it. I’ll solve all my problems.

Now this code is kind of dangerous where if the procedure fails for whatever reason after the delete, but before the insert, you, you’ve then lost data. Mm-hmm.

So Eric, it’s a time for your quiz. Time for my quiz. Oh boy. What are the reasons you can think of as to why this procedure could fail after the delete, but before the insert?

Uh, I would say maybe if there is a trigger or a foreign key or something else that, uh, that would cause a conflict with the, with the delete. All right.

I actually didn’t have those. So you’ve already passed. Uh, can you think of any other reasons why? Any other reasons why? Um, let’s see why, why the delete would fail and cause the insert. The, the, the, the delete would succeed, but the answer would fail.

Uh, well, uh, we’re deleting an ID. So my, like, you know, the first thing that would come to mind would be some sort of constraint violation, but you know, uh, I guess, you know, if you’re already deleting, like, so I’m assuming that ID is the primary key of the table.

So if you’d already deleted that and then you went to insert the same value, then it wouldn’t be a primary key violation. Uh, yeah, I can’t think of, I can’t think of any quick reasons why the insert would fail. Maybe a deadlock, maybe, uh, maybe, uh, I mean, I’m going, I’m going back to triggers and foreign keys, but that’s, that’s, that’s, that’s about what I can come up with.

All right. Well, you have passed. Congratulations.

So, you know, just to go over the scenario again, it’s, you know, if, if, if we’re trying to update an existing record and the delete succeeds, but the inserts fails, we’ve then just wiped out that old record. So we, we’ve lost data. So that’s why this code pattern is dangerous. Now in terms of, you know, like I’m a pessimistic person when it comes to writing TSQL, I try to view it defensively, you know, like assume the procedure can fail at any point for any reason. And if it does make sure that the data you expect is still in the database.

So the things that I thought of were, um, you know, you could just have a hardware failure or a power outage. It’s a bit severe, but sure. I mean, Hey, I mean, you know, these, these things happen, right.

Janitor’s mopping knocks the power cord out. Yeah. No, I mean, I mean, Hey, it’s, I mean, you know, these, these, these, these, these things happen, maybe not all the time, but they happen. That’s true. And you wouldn’t want to lose your valuable data in table one. Yeah. Right. That would be a death blow for the business.

Um, maybe some DBA doesn’t like you and he, uh, kills the process. Um, sure. Long running or worse. They have, they have an automated script that looks for blocking and says, if there’s any blocking whatsoever that kill that speed.

I mean, and I’m, I’m, I’m sure that, uh, someone’s done that. Um, of course you could have a lock wait for the insert and you hit a client, uh, timeout. Sure. Um, someone could be patching windows or SQL Server, or, you know, maybe, maybe the trans or maybe 10 TB fills up and they just restart SQL Server all together to, you know, fix it. Cause that’s how you fix that. Yeah, totally. Oh, a failover.

Yeah. Uh, your transaction log could fill up, you know, um, if you make it all the way to the end, then hopefully, you know, developers watching won’t be causing those issues anymore. Um, yes.

There could just be some bug in, in a SQL Server that that makes it fail. Sure. Like you have a non yielding scheduler or just some, you know, just some horrible error. Um, yeah.

So I think, I think, I think what you’re telling people is that sometimes when they’re, they’re writing code, they need to think out something. So I think that’s what, what’s inside the code is what could go wrong with, with, with stuff, with stuff that’s happening in there. There is a greater world out there.

Which I think is what tripped you off. Cause it seems like you were focusing on just the code. Yeah. Yeah. No, no, no, no, no. I was, I was very focused on stuff that’s going on in the database, but sure. You’re, you’re absolutely right.

I was, I was in the developer bubble. So it’s my, is that the right word? My, um, I think so. Yeah.

Sounds great. I mean, those, those, those, those, those aren’t the kinds of words that we should be using. Um, and you know, like I’m sure all the developers who are, you know, dedicated and, uh, ambitious enough to watch this, they all write perfect code, but you know, someone could modify your code later. Or someone could modify the, uh, table schema.

Like, sure. Like what if, like, what if someone shrinks the size of that, uh, fire car column? Yeah. You know, whenever, whenever there’s a problem with my code, like to say, it’s not, it’s my, it’s perfect code in an imperfect world. So, you know, we just have to set the deal with the imperfections around us.

So, you know, even the most innocent looking at procedures, like you can fail for all kinds of reasons. Sure. At any time, at any moment.

And. Just like in life. Transacts. And transactions, you know, they can help you. All right.

So this is our improved code. Now, admittedly, this is not code that you should be using as a template. There’s still lots of things to complain about. Especially the format. You know.

Um, that aside, you know, at a bare minimum, I’ve added a transaction. So now assuming like, you know, reasonable configuration, like if the delete succeeds, the insert fails, that delete gets rolled back. And now you’re not just losing data because of, you know, some horrible gremlin in the database made your code fail.

So this is, this is why I say that transactions are your friends, you know, like they protect you from losing data. You know, there are a lot of bad things that can happen at any time in a database. Um, you know, and transactions can help you.

They can protect you and they can make your job easier. Sure. Um, now, of course, you know, if you’re doing this for real, you should look at Aaron, uh, Bertrand’s absurd. There goes your rehabilitation.

There goes your rehabilitation. Just, uh, use that. Aaron Booth Baronski. Hey, look, I’m like the president. I’ve had a stutter my whole life and I’m, uh, doing my best, you know, even if you have a stutter, you can still, you know, fly to such great heights as being a speaker.

Speaking at past summer 2019. That’s right. You got it.

The, the, the, uh, last good past summer. That was a great. It was a great one. Yeah. You were there.

I was there. I was there. We, we hung out. It was wonderful. All right. So Eric has passed the quiz. Um, the whole quiz.

Yeah. Like I said, you, you, you only had to get one. Oh, okay. Well, you know, I do like to set a little bar. Um, so switching gears here. Oh boy.

You know, sometimes you might find yourself running a stir procedure in production, you know, as part of troubleshooting. Mm hmm. Um, or really just anywhere else. Um, or really just anywhere else.

If, if that procedure fails. Your transaction might not roll back. And you can find yourself in a place where, you know, you have an open transaction without realizing it. And if your database isn’t using accelerated database recovery.

That can effectively cause a production downtime. Because while your transactions open, you know, if you’ve modified any data, you can’t have log reuse. You know, the transaction log can’t wrap around and roll back and rewrite over your changes.

Mm hmm. So, you know, like, I mean, I, I’ve definitely experienced this. I’m sure Eric has too, where, you know, someone has, someone has open transaction. They don’t know about a, a, a, a few hours pass and suddenly all kinds of code are, are, is, is just failing across the board.

Yep. Um, you know, the only, the only thing I can really say here is, you know, if your code fails, make sure you don’t have an open transaction. Um, how would one do that, Joe?

What are some techniques? They could close the SMS tab and it’ll, it’ll, it’ll give you a warning about, oh, you have an open transaction. So you want to close it.

You know, you could just type rollback transaction. Sure. If you have one to close it, if not, it fails, but there’s a, no big deal. Um, that’s what I would come up with. Or, but most, most, well, I mean, most developers don’t, or rather most applications don’t function by developers running queries for people in SSMS.

So, uh, I would. I’m talking about a, a, a, a, a, a. Oh, like an ad hoc scenario where.

Yeah. Someone’s doing something. Sure. Yeah. Uh, I mean, if you’re gonna, if you, I mean, so I, I think that if you’re going to use a transaction, uh, the things that you need to do to prepare that transaction for success are, uh, to use set exact abort on. So that, that ensures that everything rolls back, uh, not just the, the thing that failed.

Uh, I, and I also think that you need, you need to, uh, have a catch block on your code that if there’s an error, you can, you can, so that you, when, when, if there’s an error, I mean, not saying that errors are inevitable. If there’s an error, uh, you can make sure that, uh, you, you, you have a rollback command in that catch block and that you also have a good record of what error occurred, uh, by, by using some of the built-in error functions. Uh, and, uh, I think that, uh, you know, the, the throw command is mighty helpful.

Uh, you know, if you can throw, throw a raise error that maybe gives you some additional, so like if your code is using a loop or if your code is, uh, you know, setting any, uh, variables or using any parameters that you can, you can use, uh, raise error to sort of give you which, uh, which, like, part of the loop or like which, uh, parameter or variable values were part of the error. I think there’s a lot of stuff that, uh, people don’t put into code because, uh, it’s a lot of extra typing, but that all that extra typing can, can really, can really save your hide when, uh, you do have, uh, unexpected transaction failures. I think you’re absolutely right.

That’s why you’re here. The encounter with is though, you know, oh, well, but I have a thousand procedures and none of them do that. And, you know, I couldn’t possibly update them to do it all, or I’m calling some horrible third party procedure that I can’t modify. And, you know, they, they haven’t seen Euro wisdom, you know, they haven’t seen the light.

So they don’t know to do those things. Uh, what do you think about setting transact abort on as like a default connection setting? Uh, I think, you know, I, I wouldn’t be, I wouldn’t, I wouldn’t say no to it.

Uh, I wouldn’t say no to it out of hand. Uh, it might be overkill for some procedures, but, you know, for the, for the, even for the procedures that it’s overkill for, like, who cares? Like, like, like, I’d, I’d rather have it there to safeguard the ones that it’s not overkill for than to, you know, uh, have it, have it not on for ones that, you know, like, if it’s just a select query in a procedure, whatever.

Right. Like, or if it’s just like a reporting procedure, like, uh, then whatever, like who, who, this, if it, if like, what, like inserts to it, like the temp tables or whatever are going to roll back. I don’t care like, like for, for the, like, I view it as a good general guardrail for an application.

Uh, and you know, you can, you can make a choice in certain procedures to turn it off if you, if you really want to get crazy. But, you know, I think, I think that it would be a good general guardrail for, for most applications that I see, especially ones driven by third party vendors who are generally clueless as to how to work with SQL Server. If you had it on at the connection level, would you still add it at the top of every new procedure you’re making?

Uh, I don’t know. Uh, that might be a whole lot more work. I mean, it’s sort of like, sort of like if you think about, uh, you know, the difference between, so like, let’s, I mean, let’s say you had RCSI and snapshot turned on.

I wouldn’t go turn snapshot on for procedure unless I absolutely need it. Like, unless something specific about that procedure needed, like the specific behavior of snapshot isolation. Like I wouldn’t, I wouldn’t go and put that at the start of every procedure as well.

So like, you know, uh, I can’t think of a great reason to, to go into every procedure if you’re setting it on at the connection level and also set it on there. But, you know, maybe, maybe you’ve seen something interesting that I haven’t. No, I mean, I was just, you know, I find it a little bit odd that it seems like most of the online discussions about this are you should add to every procedure.

And very rarely do people say, oh, we’ll just turn it on at the, uh, connection level and then you’re done. You don’t have to do anything. Um, you know, like to your point, like technically you could get some bad behavior.

I think if you have, if you have an application already going, like you might have some badine error, like, you know, someone has some, someone tries to like close a cursor twice. Cause they just have something copy and paste or something. Sure.

And, you know, now that would be unable to execute. But I think in general, like, you know, it’s hard to imagine many scenarios where you want that setting turned off. I did do that once.

You, uh, should I, uh, talk about that? Yeah, go for it. Talk about that. So you can, you can talk about why you turned it off, turned it off once. And I’ll tell you, uh, how I use it. So you go ahead.

So, so this is the, uh, transact or what’s it, what’s the actual word? It’s kind of hard to say it’s a transact abort. Yeah. Exactly.

So, so for somehow, somehow X, somehow trans got turned into X. So it’s exact abort. Maybe it’s execution abortion. I don’t know. I’m pretty sure it’s exact abort. Cause you know, I mean, you wouldn’t want to, we wouldn’t have to type out trans every time.

Yeah. Right. That’s just too much. Exact abort. Um, so, you know, this is one of those things where, you know, I was reading, uh, a very lengthy and complete blog post about transactions by a, uh, famous European whose name I actually never known. Is it Erlen Somerskog?

Yeah. Yeah. That’s not, it’s not that hard. Well, it’s not that hard because you’ve, you’ve been there a bunch of times. You’re a very well-cultured person. Well, also, also it’s just, it’s just letters. Somerskog.

I mean, all words are just letters. Fairly easy to figure out at this point in life. All right. Well, like, like, I always have restraints. I have you here to pronounce the European names, so I don’t have to. Okay.

Um, so I, I think I ended up doing a thing that he explicitly warned against, but you know, it’s, I haven’t gotten fired yet. So, you know, it’s true. Um, so my scenario was I, I had a DML trigger and if that DML trigger failed for some reason, I didn’t want to roll back the, you know, insert or update. Sure.

You know, like, like we viewed it as better for the business to very occasionally not have that trigger fire compared to making the trigger failure cause the data not being inserted. Sure. So the way I did that was, uh, it was, it was horrible.

Um, I explicitly turned off exact whatever it’s called. Yeah. Exact. I used the transaction save points. Okay.

I, that I then checked if it was possible to roll back to the save point cause in some cases you just can’t. Mm-hmm. So, you know, for a certain small class of errors, if that trigger fails, we just roll back to the save point and we don’t cause the parent insert or update to fail. And it’s reasonable.

It, I don’t know about that, but it seems to work. From my point of view on this, you were tasked to do a somewhat unreasonable thing in SQL Server and you had a reasonable reaction to that set of unreasonableness. Sometimes when you have to do unreasonable things, they require unreasonable techniques.

So, uh, which, which actually like cancels out like two negative numbers. It makes the unreasonable technique reasonable. That’s a, that’s a very generous view to say, but it’s actually a great segue into my next one, which is bashing transactions in the database.

Okay. Um, I like had like a fire extinguisher analogy that I, I like really want to use, but it’s, it’s not quite perfect. I’m going to go for it anyway.

That’s fine. I don’t care. You know, a few things in life are perfect. I feel like batching transactions is often like using a fire extinguisher, you know, like, like it solves the immediate problem. But most of the time you might have a mess.

Yeah. To, uh, to a cleanup later. Um, I think that it’s easy to fall prey into the wrong mindset. Like, you know, you’re a developer, you’re writing your code, you don’t want people to bother you.

Some evil DBA says you’ve met developers before. Messages you and says, Hey, you know, you’re a code, fill up the transaction log. You, uh, you, you need to fix it.

You, you, uh, Google it. You find many other developers who have suffered in the same way you have. Yeah. And, you know, there are some smart people out there. It’s, oh, you know, instead of doing one big transaction, you can do a bunch of little transactions.

And then they just make that change. Yeah. And I feel like in some cases that’s perfectly correct. There’s no downsides, but I also feel like it gets into the minds of, you know, transactions are my enemy.

I need to work around the, uh, the, uh, the, uh, transaction log. So like, I’m a little bit concerned about the whole thing with that respect. Now, in the blog posts of yours that I’ve read, I remember you always saying something like, you know, it’s important to make sure that you are using transactions when you need to.

Like if, if a set of, you know, demo statements, either all needs to commit or no needs to commit, you need to use transactions. So in other words, you know, like don’t use batching when it isn’t correct from a business perspective to do so. Sure.

Well, I mean, my, my, my point of view on batching is that, uh, like there are times for several reasons when it is absolutely necessary to use, uh, any sort of, you know, archival process that you don’t have a partition table for that. You can quickly switch stuff out, uh, any sort of data migration process, uh, you likewise probably want to, want to batch. And, uh, anytime that like, so like there, there are times when I’m performance tuning queries that do modifications.

And even if a lot of rows aren’t being deleted in a, like by the, by the, you know, by the, whatever, uh, you know, update, delete, uh, because of the way the database is implemented, um, the, the slowest part of the plan is the part where you actually physically change the data. So it could be over indexing. It could be, uh, you know, foreign keys.

It could be triggers. It could be any number of things. It could just be like, you know, you’re on, you’re in like an Azure managed thing with like, but hardware or like unfair caps on the transaction log that makes throughput to it really slow. You know, there’s like, there’s like lots of reasons why like the modification is slow, right?

You could just have your like slow ass transaction log. Uh, so there are a lot of reasons why that could be slow, where I might even take, like, I might even take something that’s only changing like 2 million rows and batch that up into like, you know, 500,000 row chunks because like the actual data change part of the, of the, of the plan is the slowest part. So it sounds like you’re getting back into your, you know, you’re asked to do an unreasonable thing, like update millions of rows on, um, low end hardware as we like to call it.

So you got to respond. That wasn’t the analog. So, um, I try to always, I think the way that I would summarize it is if you’re going to batch your DML statements, make sure it’s okay.

If only some of the batches happen, because as we discussed earlier, you know, you’re coping to fail for all kinds of reasons. Sure. Some of which aren’t your faults at all.

Sure. And like, I’ve had, I’m sure there’s scenarios where it, you know, it would be okay for some of them to happen. Some of them that’s happened. Like there are some really great use cases for batching. Um, ones that I came up with are, you know, you’re deleting old data from an archive from a log that no one could be looking at.

Yep. Or you’re, you’re creating a new column that the application isn’t querying yet. You know, there, there’s no reason to load it all at once.

Mm-hmm. Or like backfilling the column, you mean. Yeah, yeah, exactly. Or, you know, you’re, uh, you, uh, have a, you, uh, have a downtime, you have a maintenance window. There are no end users during that wonderful brief period of time, you know, do all the batches you want.

Or, you know, like if you have to, you’re maybe doing some horribly complex thing on some, uh, local temp tables. I mean, you know, if batching is the way to do it, then go ahead and batch it because, you know, no one can see that data. Yep.

So, you know, all those scenarios share one thing in common, which is there’s no end user who can see the data while it’s being processed. Sure. I think if that’s not true, you need to really, you should think critically, you know, like, is it okay for them to just see like half the data if they just still happen to query at, you know, an unlucky time? You mean in the, in the unlikely event that the code isn’t full of no lock hints anyway, Joe?

They could just see whatever? Uh, yeah. See, it gets weird out there. So, I mean, again, this is, this is sort of the value of optimistic isolation levels where, you know, these things come into play a little bit.

I think it’s, I think though, I mean, I’m not a lacking expert, but even if people are using your recommended and favorite isolation level, um, RCSI, like, aren’t you kind of creating an invalid snapshot to a degree where, you know, the snapshot they might get is we’ve batched half the data as opposed to all or nothing. So, I do, I do feel like batching can be a little bit defeating even with RCSI on too. Sure.

Uh, you know, I mean, you, you, I mean, you’re, you’re right. If, if, if, you know, if it’s a situation where like you just like, we’re like, you know, if, if a user sees like the first part of your batch, but not the second three parts of your, like the next part looks like three parts of your batch or whatever, then, uh, yeah, that, that could, that could potentially be awkward. But at the same time, I think one of the values of batching is that, uh, if the user hits refresh enough times, they’ll, they’ll, they’ll eventually see the right data, which is.

Yeah. Yeah. Eventually might be wrong time, but yeah, no, it’s like, you know, I, I think patching at this place, I have seen a few blog posts that don’t even talk about, you know.

Alternatives. Well, like, you know, the problem you can run to, which is then you just use part of the data. Right.

Maybe that’s a really big problem. Right. Well, I mean, I, I, I have one. It isn’t always, but you know, it could be. Yeah. But like, look, generally the, the stuff that I write that does batches, uh, does batching for, uh, at least very specific sets of data where it doesn’t matter if a, if, if a user were to see like a, like a partial thing.

But I also do have a post about, uh, how like batching modifications can be great until, you know, one of the batches fails and then you would need to undo everything you did before. And in that post, I go over using like the output clause with your modification query to like sort of save off the changes. So you can undo those changes.

Uh, if one of the batches fails and you want to like go back and unbatch things. So I am, I am, I am sympathetic to your, to your point that, you know, batching stuff that could cause incorrect data would be bad, but. And, you know, I usually don’t mess.

I usually don’t do. I usually don’t do. Listen, in an ideal world where we can think of transactions as our friends who are there to help us, you know, that entire problem goes away. Now, as we’ve alluded to, we’re often not an ideal world, but, you know, um, like one of the, like, uh, and this is a topic for another day.

Cause we definitely run out of time, but you know, when I was talking about that ETL story from before about how they’re batching even staging great. Yeah. And, you know, we ended up rerunning the entire application and one of the driving principles was, you know, like let’s use transactions in a way that’s helpful for us and for the data.

And like a lot of the. The years developers have, Oh, we’re going to fill up the transaction. You know, like we were updating like billions of rows and in some cases we would batch by partition, but that was it.

Like there wasn’t any cheating. Right. And that, you know, I don’t think there like was a case where the customer transaction like filled up.

Yeah. Well, I mean, we, we, not every, not every transaction log filling up scenario is the fault of a developer though. Like some people are just woefully unprepared for like the workload that their SQL Server is about to do.

They might have, you know, they could have, they could have like a hundred gig transaction log drive for like a 10 terabyte database. And that’s not going to go well under most circumstances. Or like, you know, they could be one of those foolish people who like sets, like sets like a max data size on their transaction log.

And that’s not the developers fault. Like, you know, like developers have to be free to work with data and the way that data is best, best worked with. And, you know, for whatever scenario the app of the application presents.

So I, I like, you know, not every, not every, you know, transaction log full error is like some developers idiot mistakes. Sometimes, you know, like there is, there are infrastructure issues or there are, you know, settings issues with, with the database that, that caused that. They are not like the developers shouldn’t have had to worry about.

You’re absolutely right. And that’s why you’re here. We had to, we had a little bit of work to do in that area too, because there were some scenarios, you know, for the detail thing, like customer had terabytes of data. Well, you know, like we would tell them, Oh, you know, you can have a transaction log max size of 20 gigabytes.

Yeah. So as part of our grand compromise, we like bumped up a little bit. Yeah. And, you know, like, it’s really, you know, not that big of a deal. Um, it reminds me of, you know, this wasn’t exactly a transaction log, but in a very important developer database, we would do an ETL and there is some comically small drive and it would, it would like always fill up.

Yeah. Cause the process to fail. Right.

We’re basically having like, um, daily failures. And it was like some 48 by drive. It just wasn’t big enough. Yeah. And I was, you know, trying to work with our it department who was very frugal. And they, uh, weren’t cooperating, you know, um, I, uh, hit my like five year tenure mark.

And I go, you hit five years, you, you, uh, get a gift, you know, would you like this bowl or this, like, um, this, this like leather portfolio thing. A bowl. I know a bowl.

And, uh, I just ignored it cause I didn’t want either. And I think my manager’s like, Oh, you know, you need to pick your gift. What do you want?

And my answer was, I, I, you know, I would, I would like to go to best buy, buy a hundred gigabyte hard drive, plug it in the server. And then that way, you know, we can, we can not have developer production go down every single day. And shortly after that, it finally cooperated and made the drive a little bigger.

Yeah. And the thing that the thing stopped failing. Yeah. So 41 gigs, you know, yeah. Maybe the things were unrelated, but you know, it’s, uh, something that can happen.

Um, yeah, no, totally. Uh, you know, uh, you know, infrastructure can be just as much to blame for, uh, things as developers can. And, you know, well, well, developers are certainly, uh, prone to doing strange things in the database.

Uh, there are, there, there are just as many, uh, you know, uh, either untrained DBAs or sysadmins posing as DBAs who, uh, can do just as many goofy and restrictive and, uh, harmful and offensive things to SQL Server. So, you know, it was, let’s not just, not just blame developers for this. Some developer out there is like, you know, I did everything right, but you know, this, this two gigabyte log transaction log is just not helping me.

You think there are people out there with, uh, counterfeit, uh, past summit speaker shirts? Uh, I mean, counterfeit in the sense that they probably, like they bought them from a goodwill store. Yeah.

I mean, maybe, yeah, sure. I mean, you know, not to be grim, but let’s face it. If you, if you were to die tomorrow, that shirt would end up in goodwill. Someone out there would be like, cool golf polo. I’m gonna have to.

Uh, update my will to make sure that doesn’t happen. Well, I don’t know who you’re going to give that to. I don’t think you have any friends who could fit in that. Are you saying, um, what’s that even mean?

I mean that, you know, you’re a, you’re a particularly slender man. Uh, not anymore. Uh, you know, you could, you could leave it to anime. Not anymore.

And the, the, uh, shirt still fits. All right. It’s stretchy. It’s a forgiving shirt. I think, I think most speakers have to be forgiving. Like when I was giving a company t-shirts at conferences, uh, like, like the, there would be like, you know, everything would be gone, but there’d be like a pile of extra smalls. All right.

All right. I don’t want us to get canceled on our first podcast. So I’m sharing some, uh, T SQL here. All right. Uh, so it’s possible to insert.

There is results of a story procedure into a table. I think everyone didn’t, doesn’t know about that. Um, it certainly has its perils.

Um, you, uh, can’t nest it, you know? So like, in other words, you, you couldn’t have the story procedure also do an insert into exact. So that just feels in there.

Um, in some cases you, uh, don’t have a choice. So for example, if you want to get the list of active trace flags, I think this is the correct way to do it. Yeah.

Right. Like there’s some DMV or anything. Right. You have to insert the DBCC’s trace stats results and do somewhere. Um, one thing to know is doing this will cause the entirety of that server procedure to be within a single transaction, which I always thought made sense. But by that, by that you mean the store procedure inside the exec, not the calling store procedure, the one inside the.

Yeah. Yeah. Yeah, exactly. Um, so this is something to keep in mind where, you know, you’re not like, you know, I suppose I should have had a, uh, another picture here to kind of illustrate it visually.

Um, you could just draw some arrows on this one. Um, you know, you, you’re effectively forcing. In this case, DBCC trace tests, which doesn’t matter to be in a, in a transaction.

So, you know, that could lead to. Unexpected behavior. So, you know, sometimes you have to use this like for. Trace flags or some third party procedure.

Um, sometimes I’ll use it for cases where. I need to use dynamic SQL to insert a new attempt table. I don’t want there to be like, you know, a million cash plans, like, like one for every session I did. Sure.

Um, I don’t, I don’t think that trap is always worth it. Um, but sometimes I’ll do it. Uh, do you think I need to explain that better? You want to explain that better? Which part?

Like, like, like why someone would. If someone has to load and do a local town to use a dynamic SQL. Why using incident to exact would be helpful for a plan caching point of view? Uh, are you, are you talking about like the, the, like the multiple cache temp tables thing?

Yeah. Yeah. Uh, that might, that might be a little bit much for this talk. We can, we can, we can, we can, we can do a whole thing about dynamic SQL and temp tables.

We can, we have, let’s not, let’s not, let’s not try to cram everything at once here. I mean, we, we, we still have some time left. All right.

Well, um, in any case, uh, sometimes you find yourself doing this and, you know, you should keep in mind that this will effectively force a transaction. Read my next.

Well, I, so for that, for that screenshot in particular, there, there, there are a couple of things that I would warn about and none of them is the table variable in this case. Uh, I will say though, that, uh, there are times when.

A table variable is in use like that. Where I will use exec with dynamic SQL so that I am not constrained by a serial execution plan for the select query inserting into the table variable, because sometimes those table variables get passed to store procedures as table value parameters.

And so like, I don’t want to like, like do the insert into a temp table, then do then go from the temp table to the, to a table variable to pass to the store procedure. Sometimes I’ll just hold onto the table variable, do the exec with whatever, and then, uh, do the exec with dynamic SQL and do the insert that way. So I get like, I get that awful looking, like, here’s your, here’s your select execution plan.

And here’s your parameter table scan plan. Uh, with this though, like, like the dangers that I would see for this is what if Microsoft decides to change DBCC trace status tomorrow. And all of a sudden you have to use like with table results to get back the tabular results instead of something else.

Or like, what if Microsoft adds a column to DBCC trace status and all of a sudden your, your insert starts failing. So I think one of the, like the two biggest perils aside from the transaction part for doing insert with exec is. If the store procedure definition underneath changes, but your table definition doesn’t change, all of a sudden you’ve got failures.

And, uh, yeah, that could be, it could be unfun for you. So with, with the, with the, with results that’s cause you can actually specify a subset of columns. Is that right?

No, no, like I’m saying with, well, with, oh, with results sets for a store procedure. Yeah. But who the hell is using those? All right. Well, I mean, I mean, like, I, I don’t think I have ever used a, I think I tried to use it once and it was a big failure. Well, then it was just like, eh, so.

Um, but to handle your snare where Microsoft improves DBCC trace status. I didn’t say improve. I said change. I’m, I’m, I’m, uh, I’m being nice about yourself.

I want to start my real estate. Oh, okay. Um, how would you write code that would protect yourself against that scenario? Uh, can you use the, the, the result set thing?

Like, does that do it? I’m sure you could, if you, uh, you could, as long as the store procedure, well, I mean, I guess would have to for this produces a single result set. Uh, you know, there’s some weird stuff about it that I remember from, I remember running into while I was using it that I don’t remember now.

Like, I just remember thinking, this is too weird for me. I don’t want to deal with it. Uh, that sounds familiar.

Yeah. And like, well, cause, but like, I was trying to use it with like, like some analysis store procedure to dump it into attempt table and like even returning a single result, the single result wasn’t predictable because the final results that was dynamic SQL. And depending on like version and addition and some like parameter things, you could get different columns back from the, like, I just remember it not working well for that, but I could be misremembering and going off on an unnecessary tangent.

To further add to my rehabilitation. Um, I do have to give Microsoft credit for reducing their breaking changes caused by version upgrades. Sure.

Cause I remember reading, I remember reading release notes where it’d be like, I don’t know what the last version was. Maybe it was SQL Server 2016 where they have like this very long list of like, here are all the things that, you know, changed that, that might bring. Right.

Right. You know, like re, like renaming DMV columns, all kinds of things. And I remember like, like one release, maybe 2017, that, that list was effectively like nothing. Yeah.

So it seemed like there was some kind of policy change where they stopped doing all those breaking changes kind of willy nilly. Right. They seem to get that under control. Um, now they keep deprecating big features, but you know, I, I think that is an area where they have improved.

So, so I consider every SQL Server feature deprecated until proven otherwise, because there, there are certain things that, uh, have been half implemented for a very long time that, uh, that are not deprecated, but just see no further development. Uh, you know, I think the, if I, if I, if I had to pick like the big three from that, it would be indexed views, partitioning and Hecaton. So like those features have all been around for, well, I mean, Hecaton about 10 years, partitioning forever, uh, index views forever.

And, uh, there has not been a lot of very active development and improvement on those features. And quite a while. So those features to me are, are deprecated.

You can still use them. They’re not, they’re not removed from the product, but they’re not undergoing any active development. So I don’t, I just, I consider every feature deprecated until something, until something happens with it. They’re, they’re in cold storage.

I’ll take a lot more time to unpack that on a different occasion. All right. That’s, I don’t, I never heard a philosophy of that before. Um, speaking of table variables, you remind me of something which I, which I don’t have in my notes here. That’s okay.

Table variables. If you have, if you have a scenario where you need to roll back a transaction, but you want to keep data from before it, you can use table variables. Yes. Tim tables will roll back.

I mean, yeah. Table variables and just like, like local variables too, I guess. Like, you know, I think those are the only ways to do that really. Um, so if you need to do some like fancy error logging or something, that could be a scenario.

I think it’s not very common. I have very, very used it myself. No, but it’s something that people always bring up when they talk about table variables as if it’s some saving grace. Oh man.

I don’t want to be one of those, you know, no, it’s a valid point. Like, no, it’s just a valid point. If you, if like, if you, if you were to use the contents of a table variable to drive some, uh, change, then it would be useful to have that survive an error so that you could undo that change. That’s totally valid.

I mean, it’s, it just makes me think of one of the first presentations I attended, which I remember asking the speaker a difficult question. He know the answer. Yeah.

And years later I realized, wait a minute. Like that guy wasn’t actually an expert. He was just some guy, you know, given the presentation as a SQL starter, which, you know, there’s nothing wrong with that. I mean, in fact, it’s, it’s great. We, you know, always need more.

I don’t know if there are SQL starters anymore, but you know, they’re called data Saturdays now. They’re a little bit less common than, you know, I mean, it’s a little bit less common than, you know, it’s a little bit more. Uh, SQL Saturdays were data Saturdays. Great data.

Um, but you know, not just, or not only SQL. I mean, in some cases, this is a little bit too, uh, pedestrian, I think. Um, where, you know, like, no, I’m not like you who wants to be like, you know, evergreen and useful to the community. Like, I like talking about unorthodox things or things you might not know about, or, you know, stuff like that.

Well, you know, there’s, there’s, there’s ever, so there’s evergreen, which is the stuff that people always need to know about. But people also need to know about the stuff that you’re the unorthodox stuff as well. They need to know about the surprises because, uh, you know, it doesn’t do you any good to produce or absorb evergreen content that leaves people in a naive state.

Like they, like there, there, there’s always going to be something in the, there always has to be something in the evergreen content that is unknown or surprising to someone at the end. Otherwise, you know, you’re just, otherwise you’re just reading the documentation. That’s, I’m really glad you said it because I think I have a few surprises in my final topic.

All right, let’s do it. Maybe, you know, it spends an extra time to you. So, um, so, you know, like, so the final thing I wanted to bring up is, you know, so the trans-doctoral song is your friend. You should work with it, but at the same time, there’s no need to log data that isn’t useful, that you don’t need to, you know?

Sure. Like why write the data twice when you can write once? Sure.

You know, back, back in the old days, people would talk about how important it was to get minimal logging. Um, you know, you remember, you remember the minimal logging. That white whale. Yeah. Um, now, you know, like, so like, what are some things that you can do?

Um. Yeah. You know, like one simple thing is if you’re writing temporary data, use a temp table. I’ve seen developers not use temp tables.

They, you know, just create a keyed, like, Yep. User table. And, you know, if that’s part of the availability group, all the data gets sent over to the secondary and then eventually deleted for no reason. Mm-hmm.

So, you know, like just use temp tables. Or if you’re lucky enough, you have an ETL application, you have a staging database. That’s staging database is simple recovery. Mm-hmm.

You know, use minimal logging. Or, you know, um, like, like avoid patterns where you’re inserting once, then updating like, like, like 20 times. Yep.

Yep. After it. Which I’m sure you’ve seen. Because, you know, it’s, you know, to, I don’t know how all this example works, but like, like, imagine you, you imagine you have to like write a report on everything you learned from this presentation. You know, you, you, if you, if you are right, or if you write your first draft, then print it on your printer, which you probably don’t have.

I do have a printer right here. Well, you know, you’re a professional, but. It’s got printer, scanner, and fax all along.

I think many. Oh, geez. Fax. I don’t have a phone line for that. It’s legit. I’m not. Um, you know, so then, like, if you, if you want to make ads to it, you wouldn’t make the ad then print it again. Right.

Then make an ad then print it again. And so on, like, you’re going to have this big stack of paper, which, uh, am I. Yeah. You guys get a little blurry. It’s all right. All right. Well, whatever. Um, no, but like, like, like this, it’s a good analogy because like, you wouldn’t print it out, like make the edits on the paper and then transcribe the edits to the document and then reprint it and follow that you would just.

Yeah. Like, like, like, like, admit them. It’s like re re proofread the document on the, on the computer and make your edits.

You know, or, you know, as some people say, you should, uh, you should get there right the first time. Right. It’s true.

Um, yeah. You know, I think you can argue that doing an insert then 20 updates is effectively printing it over and over again. Because every time you, you know, you update one column at a time. Mm-hmm.

No, I, I see, I see a lot of times when people do that because it would make like the insert logic very complicated. And like, you know, some of the, some of the results of like some of the updates have to happen as a result of what the values are after they get inserted or just some goofy thing where like, you know, there’s an insert into a temp table. And then there’s like, you know, 20 updates to the temp table to adjust certain things based on other stuff.

And it’s like, like, I, I, I understand why from a development point of view, that would lead to one very complex insert query that no one wants to write. And it’s simpler for them to like mentally grasp. Uh, well, your hand is very clear.

Yeah. That’s all right. There we go. We’ve made, we’ve, we’ve, we’ve achieved stasis. No more blurry witness protection, Joe. It’s back to back to convict Joe.

Uh, yeah. So like, I dunno, I, I, I get it from the point of view that complexity is hard and like writing one very big complex insert statement is prone to error. And, you know, uh, making a change to a complex thing is harder than making a change to like the 19th update that you do after the next.

Afterwards or something, but like, yeah, I mean like in general, you’re right. And too, like even to, even to a certain extent, I would rather do like 20 inserts than like one insert in 19 updates. Like I would, I would rather just continue to refine the data in new chunks rather than do 20 updates, because then I can validate each of those chunks a lot easier than I can.

20 updates. On the subject of non-updating updates, in other words, you’re updating a column value to be the same as before.

Some scenarios that won’t run anything in transaction law. Recently, though, I’ve had to do plenty of code fixes where I’m changing that code to fill drives with non-updating updates.

The reason I’m doing that is in production, we have our databases and availability group. My understanding is there’s no such thing as non-updating update for availability groups.

We’re having these processes where they’d run once an hour or whatever. If it wasn’t in AG, it would be fine. Because it was, we were always basically sending the entire table over every hour.

We actually saw big gains from a simple work clause where instead of sending the thing always to 1, make sure it’s not equal to 1 or not. It’s not an hour or whatever. That really helps.

Right. And there’s other reasons you can… I think if you have… Is it Snapshot?

Mm-hmm. If you have that on, you don’t get non-updating updates too? Or change tracking? Or some other things you can turn on that caused that too. Don’t remember that.

Which I don’t remember either. I mean, change data tracking and… Sorry, change tracking and change data capture might make that weird. I forget exactly which one and how.

I mean, it’s not important. You can look it up. You can look it up. You can blog about it. I don’t see a very accomplished New Zealand-based blogger.

I already wrote about it. But I don’t see the availability of your thing mentioned very often. And that is something to know.

Yeah, totally. So that’s one of those cases where it’s just a net benefit. The last, you know… The last one, I think, is even more niche.

Where we had an ECL. This was in the SageNateBase. You know, we’d have like a… Like, say, imagine a one terabyte SageNateBase.

And we had to create a bunch of times to transform the data. I actually had a very long-standing misconception where… I thought that if you’re selecting, you know, everything from one terabyte table, you would get lock escalation.

But you probably won’t. Because, you know, like, as the data is read, the locks are released. Right. Yeah, the locks don’t accumulate to the point where reads would escalate. So, man, I was just wrong about that for so many years.

But, you know, the point being that locks are written to the transaction log. So, we actually saw a noticeable improvement in, you know, in CPU used by the transaction log writer. Just by adding, like, you know, WITH (TABLOCK) to our staging tables.
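
To make that concrete, here’s a rough sketch of the hint placement, with hypothetical table names:

INSERT dbo.StagingTable WITH (TABLOCK)
    (Id, Payload)
SELECT
    s.Id,
    s.Payload
FROM dbo.SourceTable AS s;

One table-level lock gets taken instead of piles of row and page locks, and under the right recovery model the hint can also enable minimally logged inserts.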

Because we weren’t logging all those locks anymore. Yep. There have been a few times recently where even with temp tables… Well, actually, I shouldn’t say even with temp tables.

I should say there have been a few times recently with updates where just adding page lock hints has improved things noticeably. Because SQL Server was attempting to start with row locks. Because we got a nice index seek to find our data.

But the locks would accumulate very quickly. There would be a lot of row locks. And then there would be attempts at lock escalation that would fail because of competing locks.

But, yeah, I’ve noticed that even just for some things using page lock hints has improved stuff a bunch. Because SQL Server, like, you can… Like, row locks are individual locks, right?

So, you can have lots of rows on a page. And all those individual row locks are there. But if you have… If you use page lock hints, you can just lock a whole bunch of pages. And, like, you can have, you know, many rows on a page.

So, locking pages was a better granularity for things than locking individual rows. So, there definitely are, you know… During your development journey, you may run into cases where you need to reduce the amount of data written to the transaction log.
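
A minimal sketch of the page lock hint idea, against a hypothetical dbo.Tasks table:

UPDATE
    t
SET
    t.Status = 2
FROM dbo.Tasks AS t WITH (PAGLOCK)
WHERE t.Status = 1; /*Start with page locks instead of thousands of individual row locks*/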

In some cases, you can just do that. And it just makes things better. All right. So, I’m going to wrap things up here. All right.

Eric said he didn’t want to go over an hour. He’s a busy man. He has to fix his fax machine, I think. Yeah, yeah. So, you know, we don’t want to run too long. Mom picked up the phone again while I was on AOL and ruined my download. So, to quickly summarize, transactions are your friends.

They make it easier for you to do your job. Your code can fail at any time, at any place. Even very simple code, which you think is perfect.

The failure might have nothing to do with the code you wrote. In some cases, breaking up big transactions into small transactions is the right thing to do. It can be an improvement.

But if it’s possible for end users or other processes to see the data in between batches, you should think critically and make sure that that’s actually okay. I was thinking about scenarios where, you know, as I said before, your code can always fail.

So, is it okay if only half the batches occur? If you’re using insert into exec, remember that the exec part, the stored procedure, will be in its own transaction. If you need to log data that doesn’t get rolled back by a transaction, you can use table variables.
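
Here’s a minimal sketch of that table variable logging pattern; the table and error handling are made up for illustration:

DECLARE @error_log table
(
    ErrorMessage nvarchar(4000) NOT NULL
);

BEGIN TRAN;
BEGIN TRY
    UPDATE dbo.SomeTable /*hypothetical*/
    SET SomeColumn = 1;
    COMMIT;
END TRY
BEGIN CATCH
    /*Writes to table variables are not undone by ROLLBACK*/
    INSERT @error_log (ErrorMessage)
    VALUES (ERROR_MESSAGE());

    IF @@TRANCOUNT > 0
        ROLLBACK;
END CATCH;

/*This row survives the rollback, so it can be persisted or returned here*/
SELECT ErrorMessage FROM @error_log;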

And, you know, in some cases, you may need to reduce the amount of data written to the log for performance reasons, especially if you’re using availability groups. And there are various ways of doing that, you know: minimal logging, temp tables, all the stuff we just talked about.

Anything you would like to add? No, that’s a wonderful summary, Joe. You did better than ChatGPT.

Thank you. You’re welcome. All right. Well, thank you all for joining us and listening to me and Joe go on and on about transactions. I don’t know.

Are we going to record another one of these next week? Do you have another evergreen topic to talk about, or semi-green topic to talk about? I don’t want to be one of those guys who is like, yeah, you know, we’re going to do one like every week. And then, you know, we immediately start failing.

So, all right. Let’s keep it as a surprise. Surprise. What do you think? Right. Just like transaction errors and transaction logs filling up, these recordings will be a surprise to everyone, including us. We’re just going to ambush ourselves with that.

So, thank you for joining us. And remember to have those index rebuild jobs spayed and neutered. We’ll catch you in the next episode. And I think I have to hit, where is the record button here?

Where’s the stop button? Oh, there’s the stop button. We’ll see you next week.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Indexing SQL Server Queries For Performance: Fixing Blocking and Deadlocking

Okay, Look


Before I go on about how adding an index can fix your blocking and deadlocking problems, let me warn you about how indexes can cause you problems.

Without fail, whenever I suggest adding indexes to clients, someone gets real worried about write speeds.

Like, the server can be burning down, spurting blood everywhere, and cursing our mothers in ancient Aramaic, but no, let’s stop and worry about write speeds going up by a few milliseconds.

You have to add quite a few indexes (20), and be modifying quite a few rows (millions) before the number of indexes really starts to hurt “write speeds”.

I haven’t seen a server whose biggest problem was write speeds (outside of Managed Instance and SQLDB) since spinning disks were measured in RPMs. The real problem I see many servers face from having “too many” indexes is increased locking.

The more indexes you add, the more you have to lock when modifications happen, even when you’re changing the same number of rows as you were before. You also increase your chances of lock escalation attempts.

Having a handsome young professional with reasonable rates (me) review your indexing is a good idea.

But you can end up with locking and deadlocking problems when you err in the opposite direction of “too many” indexes, especially if your modification queries don’t have good supporting indexes to help them find the data they wanna change.

It Started With A Scan


I never spent a summer at camp Scans-Are-Bad, but scans can tell us something important about modification queries.

In a very general sense, if the operators in your modification queries are acquiring data from the tables they need to modify via a scan, they’ll start by locking pages. If they start by seeking to rows in the table they need to modify, they’ll start by locking rows.

Once lock escalation thresholds are hit, they may attempt to lock the whole table. If the optimizer thinks your query needs to do a lot of work, it may use a parallel plan, which may increase the likelihood of lock escalation attempts.
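
If you want to see whether lock escalation is happening on your tables, attempts and successes show up in sys.dm_db_index_operational_stats. Here’s a bare-bones sketch:

SELECT
    table_name = OBJECT_NAME(ios.object_id),
    ios.index_id,
    ios.row_lock_count,
    ios.page_lock_count,
    ios.index_lock_promotion_attempt_count, /*escalations attempted*/
    ios.index_lock_promotion_count /*escalations that succeeded*/
FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS ios
ORDER BY
    ios.index_lock_promotion_attempt_count DESC;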

Let’s say we have this query:

UPDATE
    p
SET
    p.Score += 1000
FROM dbo.Posts AS p
JOIN dbo.Users AS u
  ON u.Id = p.OwnerUserId
WHERE u.Reputation >= 800000
AND   p.Score < 1000;

And this is the query plan for it:

sql server query plan
we care a lot

We would care very much about the Posts table being scanned to acquire data, because the storage engine doesn’t have a very granular way to identify rows it’s going to modify.

We would care so very much, that we might add an index like this:

CREATE INDEX
    p
ON dbo.Posts
    (OwnerUserId, Score)
WITH
    (MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

To get an execution plan that looks like this:

sql server query plan
line up

Now, this isn’t awesome, because we’re updating the Score column, and we need the Score column in our index to find the rows we care about, but if the query goes from taking 5 seconds to taking a couple hundred milliseconds, we’re going in the right direction.

There is reasonably sound advice to avoid indexing columns that change a lot, but if you need to find rows to modify in columns that change a lot, you may need to follow the also quite reasonably sound advice to make your queries faster so there’s less chance for them to become entangled.

The longer modification queries run for, the longer they have to interfere with other queries.

It Started With A Tran(saction)


I often see transactions (ab)used by developers. Here’s a lazy example:

BEGIN TRAN;
    UPDATE 
        b
    SET 
        b.UserId = 138
    FROM dbo.Badges AS b
    WHERE b.Date >= '2010-12-25'
    AND   b.Date <  '2010-12-26';

    /*Go do some other stuff for a while, I hear ACID is good.*/
COMMIT; /*No error handling or anything, just screw it all.*/

The query plan for this update will look simple, harmless, never did nothin’ to nobody. But while we’re doing all that ~other stuff~ it’s holding onto locks.

sql server query plan
waterfall

This isn’t always the end of the world.

Right now, the only index is the clustered primary key. Some queries will be able to finish immediately, as long as the rows they need are located in the clustered primary key prior to the rows that are being locked.

For example, this query can finish immediately because the date value for its row is a touch before the pages we’re updating.

SELECT
    b.*
FROM dbo.Badges AS b
WHERE b.Id = 1305713;
sql server query plan
take the cannoli

But this query, and any queries that search for an Id value on a locked page, will be blocked.

SELECT
    b.*
FROM dbo.Badges AS b
WHERE b.Id = 1306701;

That Id value is just outside the range of dates we’re modifying, but because the storage engine is locking pages and not rows, it has to protect those pages with locks.

sql server query plan
step on
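
While that transaction is open, you can watch the page locks pile up from another session with something like this (a bare-bones look at sys.dm_tran_locks):

SELECT
    dtl.request_session_id,
    dtl.resource_type, /*OBJECT, PAGE, KEY, etc.*/
    dtl.request_mode, /*IX, X, S, and friends*/
    dtl.request_status, /*GRANT or WAIT*/
    dtl.resource_description
FROM sys.dm_tran_locks AS dtl
WHERE dtl.resource_database_id = DB_ID();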

With this index, both queries would be able to finish immediately, because the storage engine would know precisely which rows to go after, and a more granular locking arrangement (rows instead of pages) would be available.

CREATE INDEX
    woah_mama
ON dbo.Badges
    (Date)
WITH
    (MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

In this example, we’re not updating the Date column, so we don’t have to worry about the sky falling on write speeds, or write speeds crying wolf, or a thousand other cautionary tales about overreaction.

NOLOCK, And Other Drugs


Let’s say we’re idiots unaware of how bad uncommitted reads are, and we decide to use NOLOCK hints to avoid our select queries being blocked.

SELECT
    b.*
FROM dbo.Badges AS b WITH(NOLOCK)
WHERE b.Id = 1306700;

While that transaction is open, and we’re still doing all those other highly ACIDic things to our database, this query will return the following results:

sql server query plan
jailbird

We can see the in-flight, uncommitted, change from the update.

Maybe we return this data to an end user, who goes and makes a really important, life-altering decision based on it.

And then maybe something in that transaction fails for some reason, and everything needs to roll back.

That really important, life-altering decision is now totally invalidated, and the end user’s life doom-spirals into sure oblivion.

And it’s all your fault.

Thanks for reading!


How Database Vendors Can Make It Easy For New Users To Get Started On Their Platform

New And Improved


There are a lot of databases out there. All of them, of course, better than their competitors.

Faster, easier, cloudier, smarter, shardier. Blah blah blah. Sounds great! But how do you get new users to try your product and make any reasonable assessment of it?

Not everyone comes to the table with a bunch of data they can jam right in and get partying with.

Worse, they might be on a platform where exporting their data to use on your platform (because, let’s face it, no two products have the same backup format) is a real and complete pain.

The number of hurdles that users face just getting to the point where they can make an assessment of your product are pretty huge.

Let’s talk about how you can make that easier.

Working Data Sets


Stack Overflow provides public access to their data sets, both as XML dumps, and a web interface.

I’m not saying that their relational data makes sense for every system out there, but it’s a big data set that’s good for analysis.

If you own a database product, you could:

  • Grab the dumps and turn them into a format that makes sense for your platform
  • Have a hosted playground for new users to run queries against interactively

This helps potential new customers get comfortable with the inevitable proprietary syntax, and gauge query efficiency.

Microsoft, for all its silliness, gives SQL Server users a couple different sample databases to work off of. They even update them for new versions to show off all the new memes features they’ve added.

They even have a free developer edition of the product that you can install and run with pretty quickly. You don’t need this if your product is all cloud-based, but you get the idea.

Hands-down, the most annoying part of testing any database platform, is getting reasonable data to test against in there.

Portability


If you are an installer-based life form, and your database has a lot of settings that might matter for performance and reliability, or uses a specific OS, you should consider having a few different VM images available for download.

This lets you easily distribute a golden copy of an ideal environment for your product, with the OS, database, and data all packed together.

Oracle does this, and for the short time I had to experiment with some stuff on their platform, it was incredibly handy.

If you don’t want to go this route, because you don’t quite have Oracle money, being a fledgling database product, have a dedicated install and config page:

  • Recommended hardware
  • OS version
  • Database install steps
  • Any additional dependencies
  • Recommended database configurations
  • Where to get ample sample data to play with

While we’re talking about sample data, why not have a few different sizes of data? Not everyone wants to set up a 64 core, 2TB of RAM virtual machine just to mess with a petabyte set of time series data.

Have some small, medium, large, and extra large sets available for testing.

Sure, prospective clients might opt for small and medium, but the folks you want to evangelize your product are going to love you for having bigger data sets to show more complex problems and solutions.

If part of the sell for your product is how great data ingestion is, have data ready to ingest based on whatever format you excel at, even if it’s Excel files.

More likely it’s csv, parquet, json, or uh… something.

Visibility


A lot of folks are used to having more than a command line to interact with their database.

Postgres has pgAdmin, Oracle has SQL Developer, Microsoft has SQL Server Management Studio and Azure Data Studio, and there are many third party tools that can connect to a variety of platforms, too.

Writing large, complex queries in a CLI is a drag. It might be great for administration, and simple stuff, but correcting syntax errors in them is like threading needles blindfolded.

You may not want to build a whole bunch of tooling up front for developers to work in, but a lightweight browser-based tool with a “run” button can go a long way.

Take db<>fiddle as an example. You can’t do any database management with it, but you can pretty much fully interact with the database by sending queries in, the way a developer would write and test queries.

Nerdery


I love playing with other databases, but I do not love all the foreplay it takes just to get moving with one.

The more things are different about your platform — and those differences may be spectacular — the harder it is to lure away folks who are inexperienced with the stack you’ve chosen.

You might even have an amazing sales tech team who will come in and do a lot of the heavy lifting for prospective clients, but some companies out there want to do a test run before investing a lot of time and getting hourly check-ins from sales reps about how things are going and when the contract will get signed.

That also ignores one of the most powerful segments of any database community: the developers who will build content around your product, and go out in the world to blog, present, and develop training content.

Thanks for reading!


Join me at DataTune in Nashville, March 8-9 2024

Spring Training


This March, I’ll be presenting my full day training session The Foundations Of SQL Server Performance Tuning.

All attendees will get free access for life to my SQL Server performance tuning training. That’s about 25 hours of great content.

Get your tickets here for this event, taking place Friday, March 8th-9th 2024 at Belmont University – Massey Center 1900 Belmont Blvd, Nashville, TN 37212

Here’s what I’ll be presenting:

The Foundations Of SQL Server Performance Tuning

Session Abstract:

Whether you want to be the next great query tuning wizard, or you just need to learn how to start solving tough business problems at work, you need a solid understanding of not only what makes things fast, but also what makes them slow.

I work with consulting clients worldwide fixing complex SQL Server performance problems. I want to teach you how to do the same thing using the same troubleshooting tools and techniques I do.

I’m going to crack open my bag of tricks and show you exactly how I find which queries to tune, indexes to add, and changes to make. In this day long session, you’re going to learn about hardware, query rewrites that work, effective index design patterns, and more.

Before you get to the cutting edge, you need to have a good foundation. I’m going to teach you how to find and fix performance problems with confidence.

Event Details:

Get your tickets here for this event!


Why Partially Fixing Key Lookups Doesn’t Work In SQL Server

Why Partially Fixing Key Lookups Doesn’t Work In SQL Server


Video Summary

In this video, I dive into why adding columns to an index to fix every key lookup might not always be the best approach. I share insights from my recent blog post on how fixing predicates in key lookups can sometimes be sufficient to avoid performance issues but highlight that there are scenarios where it’s not practical or efficient. Using the Stack Overflow database as a case study, I demonstrate through query plans and statistics histograms why adding every column to an index might introduce unnecessary overhead, especially for frequently updated tables with large string columns. This video aims to provide a deeper understanding of SQL Server’s costing mechanisms and encourage thoughtful index design rather than knee-jerk reactions.

Full Transcript

Erik Darling here with Darling Data. Still muddling through the holidays, if you can believe that. Anyway, in today’s video, it will hopefully be a nice brief video, I want to talk about why you can’t fix every key lookup just by adding some columns to the index, to an index. And this is, I do have a blog post fairly recently, I don’t actually, I’m not even sure right now if it’s, actually no, it did publish, about how fixing predicates in key lookups can be good enough to avoid some of the really slow crap that can happen with key lookups. However, not all key lookups have predicates in them. And there are some cases, where you might really be, like, gung ho about fixing the key lookup completely by adding, like, a bunch of columns from the select list to the include list of your nonclustered index. And that can have a lot of downsides for a lot of reasons. Right? Because now you, now, every time you modify the base table, you have a, I mean, inserts and deletes, for sure. But, you know, updates, if the columns that you’re now adding to the include of this index get updated, now you have some additional considerations for when you, for when you update the table. All right? Take up more space on disk, more space in memory. If you are changing things frequently, then, you know, you have some additional transaction logging stuff to think about, some additional locking and blocking stuff to think about.

So, and, you know, and especially if you have columns in your table that are strings, particularly big strings, you know, as you approach the numerical max of nvarchar or varchar columns, either 4,000 or 8,000, or if you have max data types in your table, then all of a sudden you’re looking at potentially storing and having a lot of just really big indexes in your table. So, really big indexes if you start trying to fix key lookups that way. So, what I want to show you is the costing mechanism behind lookups and something that might even be surprising to you because it was certainly surprising to me when I learned about it or when I discovered it through vigorous testing, staring at query plans.

So, I’ve already got an index on my users table in the Stack Overflow database and I’ve updated the statistics to use a specific distribution. I mean, it’s not that I got it naturally once, but then like when I went to redo the demo, sometimes I would get different statistics and sometimes those different statistics would make the lookup versus non-lookup demo fail kind of miserably. So, in order to avoid that, I use a specific statistics histogram that I know gives me the outcomes that I want.

That’s this whole crazy thing here that makes for a really long scroll bar. This is the most interesting thought to ever go through my mind, which is nicely lined up with my ears. Anyway, so what I want to show you right now is how the reputations 23 and 51 are distributed in the users table.

So, if we run this and we look at the results, for reputation 23, there are 13,542 records. And for reputation 51, there are 13,116 records. Very close, right?

Almost suspiciously close in values. There’s reputation fraud going on there. Anyway, if I run this query to look for users with a reputation of 51, and remember 51 has 13,116 rows associated with it.

If I run this, we look at the query plan, and we quite naturally get a key lookup plan. Right here. 13,116.

Oh, look, that’s a good statistics histogram. We nailed that. That was correct. That was on the nose. And, of course, we have this missing index request, where SQL Server says, just go ahead and add every column to this index. All of them.

So, that includes, well, let’s make this a little bit easier for everyone to see all in one go here. If we look at what SQL Server wants in there, we have AboutMe. That’s a varchar(max).

We have Location. We have WebsiteUrl. We have DisplayName. These are all string columns. The Location, DisplayName, WebsiteUrl, those aren’t maxes, but, you know, it’s a lot of work just to avoid a key lookup.

And this query isn’t slow either, right? If we look at how long this thing takes, it’s 31 milliseconds. I don’t know if I need to maintain an index of that girth over a 31 millisecond query.

Maybe not the best thing in the world there. And if you have questions about why I have 1 equals select 1 at the end of these queries, you are free to visit my website.

Because the question that everyone asks every time they see a query of mine that uses that is, what’s the point of 1 equals select 1? And I have to say, what is the point of 1 equals select 1? What could it possibly be?

Could it be explained to you in a blog post on my site? Did you perhaps type, what’s the point of 1 equals select 1 into the wrong place? Did you mean to leave a comment or did you mean to type that into Google? Or Bing, DuckDuckGo, or whatever search engine wants to sponsor my next video.

That’ll be the one that I ask about. But when we run that same query for reputation equals 23, I’m going to run this two different ways.

I’m going to run this once. I apologize that it’s still highlighted. I know that looks a little hacky, but you’ve dealt with worse from me, so we’re just going to roll with it here. So this one, of course, is just a regular select and where.

And this query down here tells SQL Server that I want to use my nonclustered index instead. All right, so use the nonclustered index. Gosh darn it.

Now, if we look at the query plans for these, you know, this one takes 200 milliseconds about. I don’t know where my thing is. I mean, I don’t know where my cursor is.

Other things I’m acutely aware of. My books, my phone, where these lights are blinding me. Anyway, this query takes about 200 milliseconds.

The cardinality estimate on that is, again, spot on, right? Exactamundo. We’re not missing anything. The stats histogram that I created, beautiful. Doesn’t get much better than that.

Of course, because SQL Server’s query optimizer is so incredibly biased against random IO, it chooses to scan the clustered index rather than seek into our nonclustered index, dive right in, find a few rows, and do lookups to match them in the clustered index.

That’s this whole section here, right? We have our index seek, where I told SQL Server which index to use. And as usual for a key lookup, oh, that wasn’t what I wanted.

Let’s redraw this one. Let’s make sure we’re holding the control key. And let’s put a square here. And as usual with a key lookup, we have a nested loops join. So we take one row from over here, put it into the nested loops join, and then we go find it down here.

That’s what a key lookup is. It joins two indexes together. And unlike in the blog post that I had written recently about predicate lookups, if we look at the key lookup here, all we have is this big old output list.

And all we have in that green text up here that we already looked at is SQL Server asking for us to include every single column in the table in our nonclustered index.

Which again, we don’t really want to do. Now, I totally get it. Select star is a bad practice. You do select star queries, you’re asking for trouble, you’re a bad DBA or developer or whatever you fancy yourself, whatever title your company gave you so they can chronically underpay you.

That is what you are a bad version of. The thing is, when it comes to key lookups, they have the exact same cost regardless of if you’re getting all the columns or if you’re just getting one of the columns that is not part of your nonclustered index.

So coming back up here, this is the index that we created. The reputation column. And since the clustered primary key of the users table is on a column called ID, only that column is inherited by the nonclustered index.

No other column in the table is magically part of our nonclustered index definition. So the problem is that whether we get every column or we just get one column that’s not part of our nonclustered index, key lookup has the exact same per loop cost.

So if we come over to the query plans for these and we look at the key lookup, it has an estimated operator cost of 37.715 query bucks for the select star query and it has an estimated operator cost of 37.715 query bucks for the query that just asked for one additional column.

So technically speaking, to SQL Server, it makes no difference at all to the query optimizer whether you’re selecting one column that’s not the nonclustered index or you’re selecting all the columns that aren’t the nonclustered index.

Doesn’t care. Each loop through has the same fixed cost. So it doesn’t matter at all how many columns you’re selecting.

One column missing from your nonclustered index that’s required by the query accrues the exact same cost as 100 columns missing from your nonclustered index that are required by the query.
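
If you want to see that for yourself, a pair of queries along these lines will do it; the index hint name here is hypothetical, so swap in whatever your single-column Reputation index is called:

/*One column missing from the nonclustered index*/
SELECT
    u.DisplayName
FROM dbo.Users AS u WITH (INDEX = reputation_index)
WHERE u.Reputation = 23
AND   1 = (SELECT 1);

/*Every column missing from the nonclustered index*/
SELECT
    u.*
FROM dbo.Users AS u WITH (INDEX = reputation_index)
WHERE u.Reputation = 23
AND   1 = (SELECT 1);

Compare the key lookup’s estimated operator cost in both plans, and you should see the same fixed per-loop cost either way.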

And that is absolutely fascinating. So let’s talk for a quick moment here. It would be really stupid, tremendously, monumentally, outrageously stupid if you had every nonclustered index include every column in the table or try to account for every column that you might someday maybe perhaps select from a query to do a thing with maybe.

Per chance, they say. We can sort of figure out why by asking ourselves a couple questions. First, what is the clustered index?

What is a clustered index in SQL Server? It’s every column in your table logically ordered by your clustered index key column or columns.

So the whole table logically ordered by, for the Users table, would be the Id column, which for various reasons makes a pretty good clustered index. A nonclustered index is every column you may include, plus the clustered index key columns it inherits.

All right? And then the key is whatever columns you put in the key of the nonclustered index. All right? So we order all this stuff. We have the includes down here.

Let’s say, thinking about the Users table, let’s scroll back up a little bit and let’s just look at this select star query. So right now the clustered index is everything in here, every column in the table, logically ordered by this column.

If we were to follow the missing index recommendation that SQL Server wants, and we were to put a nonclustered index keyed on Reputation and include every other column in the table, we would have every column in the table logically ordered by Reputation. Which would just be, in everything but the name, another clustered index, because you can’t have two clustered indexes on a table. That nonclustered index would be almost like another clustered index, wouldn’t it?

It would be every column in the table ordered by some column. You don’t need two clustered indexes. You don’t. Simply don’t. If you needed two clustered indexes Microsoft would allow you to create more than one clustered index.

So please don’t do that. Please don’t follow the missing index recommendations that include every single column in the table.

It’s often unnecessary. Please spend some time evaluating the speed and efficacy of your queries before you go creating super wide indexes to cover various things, because you’re not going to be happy after you create like 10, 15, 20 of those suckers on your table and all of a sudden everything is blocking and deadlocking because every index is getting locked all the time.

Cool. All right. So it’s a snowy day here in New York City and I don’t know maybe I’ll go make some hot chocolate spike it with something extra spicy.

I hope you, what is it, enjoyed yourselves, I suppose. I hope you learned something. If you sat here for 15 minutes and didn’t learn anything, well.

Oh, maybe, maybe I’ll do better in the next video. I hope you enjoyed yourselves. If you like this video at all, in any way whatsoever, even if you have to rewatch it with the mute button on, or if you, I don’t know, have to have some nitrous before you watch it, if you like it in any way, shape, or form, feel free to give the thumbs up button down below a little taparoo.

If this is the sort of SQL Server content that you have any sort of appreciation for, I do like getting new subscribers, so that I can get more likes, and so that I can annoy more people every time I publish a video with, hey, Erik Darling did something, did you know?

So, yeah, that’s that. Thank you for watching.


Indexing SQL Server Queries For Performance: Fixing A Bad Missing Index Request

Making Change


SQL Server’s missing index requests (and, by extension, automatic index management) are about 70/30 when it comes to being useful, and useful is the low number.

The number of times I’ve seen missing indexes implemented to little or no effect, or worse, disastrous effect… is about 70% of all the missing index requests I’ve seen implemented.

If they’re all you have to go on, be prepared to drop or disable them after reviewing server and index usage metrics.

Here’s what you’re way better off doing:

  • Find your slowest queries
  • See if there’s a missing index request
  • Run them, and get the actual execution plan
  • Look at operator times in the execution plan
  • Ask yourself if the index would benefit the slowest parts

Or, you can hire me to do all that. I don’t mind. Even the Maytag Man has an alarm clock.

Poor Performer


Let’s start with a query, and just the base tables with no nonclustered indexes added. Each table still has a clustered primary key on its Id column.

Initially, I thought showing the query plan in Row Mode over Batch Mode would make issues more clear, but row mode operator times are a real disaster.

sql server query plan
this does not add up.

They’re supposed to be cumulative going from right to left, but here we go from 9 to 4 to 10 to 27 to 22 to 41 to 32. Forget that. Batch Mode it is.

Anyway, here’s the query.

SELECT
    u.Id,
    u.DisplayName,
    TopQuestionScore = 
        MAX(CASE WHEN p.PostTypeId = 1 THEN p.Score ELSE 0 END),
    TopAnswerScore = 
        MAX(CASE WHEN p.PostTypeId = 2 THEN p.Score ELSE 0 END),
    TopCommentScore = 
        MAX(c.Score),
    TotalPosts = 
        COUNT_BIG(DISTINCT p.Id),
    TotalComments = 
        COUNT_BIG(DISTINCT c.Id)
FROM dbo.Users AS u
JOIN dbo.Posts AS p
  ON u.Id = p.OwnerUserId
JOIN dbo.Comments AS c
  ON u.Id = c.UserId
WHERE EXISTS
(
    SELECT
        1/0
    FROM dbo.Votes AS v
    WHERE v.PostId = p.Id
    AND   v.VoteTypeId IN (1, 2, 3)
)
AND  u.Reputation > 10000
AND  p.Score > 10
AND  c.Score > 0
GROUP BY 
    u.Id,
    u.DisplayName
ORDER BY
    TotalPosts DESC;

The goal is to get… Well, pretty much what the column names describe. A good column name goes a long way.

If you had free and unfettered access to these tables, what row store indexes would your druthers lead you to?

I’m limiting your imagination to row store here, because that’s what the missing index requests are limited to.

Underwhelm


The optimizer has decided two indexes, on the same table, would really help us out. There are two very clunky ways to see them both.

You can always see the first one in green text at the top of your query plan, when a missing index request exists.

You can look in the plan XML:

<MissingIndexes>
  <MissingIndexGroup Impact="20.3075">
    <MissingIndex Database="[StackOverflow2013]" Schema="[dbo]" Table="[Comments]">
      <ColumnGroup Usage="INEQUALITY">
        <Column Name="[Score]" ColumnId="4" />
      </ColumnGroup>
      <ColumnGroup Usage="INCLUDE">
        <Column Name="[UserId]" ColumnId="6" />
      </ColumnGroup>
    </MissingIndex>
  </MissingIndexGroup>
  <MissingIndexGroup Impact="20.7636">
    <MissingIndex Database="[StackOverflow2013]" Schema="[dbo]" Table="[Comments]">
      <ColumnGroup Usage="EQUALITY">
        <Column Name="[UserId]" ColumnId="6" />
      </ColumnGroup>
      <ColumnGroup Usage="INEQUALITY">
        <Column Name="[Score]" ColumnId="4" />
      </ColumnGroup>
    </MissingIndex>
  </MissingIndexGroup>
</MissingIndexes>

Or you can expand 75,000 nodes in SSMS:

sql server query plan
sigh

If you prefer something human readable, this is what they would translate to, with a little bit of hot sauce courtesy of yours truly.

CREATE INDEX 
    c
ON dbo.Comments 
    (Score)
INCLUDE
    (UserId)
WITH
    (MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

CREATE INDEX 
    c2
ON dbo.Comments 
    (UserId, Score)
WITH
    (MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

Big Reveal


With all that in mind, let’s look at the query plan before adding the indexes. We’re going to skip ahead a little bit in the bullet points above, to the last two:

  • Look at operator times in the execution plan
  • Ask yourself if the index would benefit the slowest parts

Here’s the plan, which takes ~10 seconds in total. The arrow is pointing at where the optimizer thinks a new index will help the most.

sql server query plan
wasted effort

Since the operators in this plan are mostly in Batch Mode, every operator is showing CPU time just for itself.

The exceptions are the Nested Loops join operator, which doesn’t currently have a Batch Mode implementation (despite Microsoft’s consistently shabby Cumulative Update notes saying they cause deadlocks), the scan and filter on the inner side of the Nested Loops join operator, and the Compute Scalar immediately following the Nested Loops join operator.

That entire portion of the plan is responsible for about half of the total execution time, but there’s no index recommendation there.

And look, I get it, missing index requests happen prior to query execution, while index matching is happening. The optimizer has no idea what might actually take a long time.

But if we’re looking at the only pre-execution metrics the optimizer has, you’d think the estimated costs alone would push it to ask for an index on the Posts table.

Perhaps missing index requests should be selected after query execution. After all, that’s when the engine knows how long everything actually took.

Generous Soul


Okay, so those two indexes on the Comments table up there? I added both of them.

The query plan changes, but it doesn’t get any faster.

sql server query plan

Once again, a missing index request is registered, but only one this time.

On the Votes table.

Not the Posts table.

CREATE INDEX 
    v
ON dbo.Votes 
    (VoteTypeId)
INCLUDE 
    (PostId)
WITH
    (MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

Okay SQL Server, you got me. I’ll add it.

Impatience


With that index in place, what sort of totally awesome, fast query plan do we get?

sql server query plan
we don’t.

Every time we add an index, this query gets one second slower. Part of the problem, of course, is that the optimizer really likes the idea of joining Posts to Votes first.

All of the query plans we’ve looked at have had a similar pattern, where Votes is on the outer side of a Nested Loops join, and Posts is on the inner side, correlated on the pre-existing clustered primary key on Posts.

But Posts has a much more important join to the Users table. If we were to make that more efficient, we could perhaps change the optimizer’s mind about join ordering.

And there’s no missing index request to tell us that. We have to use our damned eyes.

Maybe something like this.

CREATE INDEX
    p   
ON dbo.Posts
    (Score, OwnerUserId)
INCLUDE
    (PostTypeId)
WITH
    (MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

Let’s give that a shot.

“Self-Tuning Database”


When vendors tell you about their self-tuning database systems, they’re lying to you.

Maybe Oracle isn’t. I don’t know.

But I’m so confident in this new index that I’m going to get rid of all the indexes that SQL Server has suggested so far.

They were bad. They made our query slower, and I don’t want them interfering with my awesome index.

sql server query plan
for a dollar

Now the query is twice as fast, at 5 seconds (down from the original 10 seconds). The two operators that take up the majority of the query execution time now are the hashes: the Inner Join and the Aggregate.

They don’t spill, but they are likely ill-prepared for the number of rows that they have to deal with. One may infer that from the estimated vs. actual rows that each one sees.

HTDELETE


The primary wait type for the query is HTDELETE, which has limited documentation.

SQL Server 2014 now uses one shared hash table instead of per-thread copy. This provides the benefit of significantly lowering the amount of memory required to persist the hash table but, as you can imagine, the multiple threads depending on that single copy of the hash table must synchronize with each other before, for example, deallocating the hash table. To do so, those threads wait on the HTDELETE (Hash Table DELETE) wait type.

My friend Forrest has helpfully animated it here.

I tried many different indexing schemes and combinations trying to get the terrible underestimate from the Comments table to not cause this, but nothing quite seemed to do it.

In cases where you run into this, you may need to use a temp table to partially pre-aggregate results, and then join to the troublesome table(s) using that data instead.
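
As a sketch of what that rewrite might look like here (the shape of it, not a tested solution), you could pre-aggregate Comments per user first:

SELECT
    c.UserId,
    TopCommentScore = MAX(c.Score),
    TotalComments = COUNT_BIG(*)
INTO #c
FROM dbo.Comments AS c
WHERE c.Score > 0
GROUP BY
    c.UserId;

Then the main query joins Users to #c instead of to the raw Comments table, so the hashes get a much smaller, already-deduplicated row set to deal with.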

Thanks for reading!


Indexing SQL Server Queries For Performance: Fixing An Eager Index Spool

Bugger


Probably the most fascinating thing about Eager Index Spools to me is how often the optimizer will insert them into execution plans, often to a query’s detriment.

In a sane world, a non-loop join plan would be chosen, a missing index request would be registered that matches whatever would have been spooled into an index, and we’d all have an easier time.

While I understand that all of the optimizer stuff around spools in general was written back when storage hardware was crap, and 32bit software couldn’t see more than a few gigs of memory, I do find it odd that so little revision and correction has been applied.

Of course, there are use cases for everything. I was involved in a data warehouse tuning project where rewriting a query to corner the optimizer into using a nested loops join was necessary to build an Eager Index Spool. Maintaining a nonclustered index on the staging table made data loads horrible, but letting SQL Server build one at query runtime was a massive improvement over other options. All that had to be done was to rewrite a simple inner join to remove any direct equality predicates.

While the below queries don’t even come mildly close to reproducing the performance improvement I’m talking about above, it should give you some idea of how it was done.

/*How it started*/
SELECT
    p.Id,
    UpMod =
        SUM(CASE WHEN v.VoteTypeId = 2 THEN 1 ELSE 0 END),
    DownMod =
        SUM(CASE WHEN v.VoteTypeId = 3 THEN 1 ELSE 0 END),
    PostScore = 
        SUM(p.Score)
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
  ON v.PostId = p.Id
WHERE p.Score > 1000
GROUP BY
    p.Id;

/*How it's going*/
SELECT
    p.Id,
    UpMod =
        SUM(CASE WHEN v.VoteTypeId = 2 THEN 1 ELSE 0 END),
    DownMod =
        SUM(CASE WHEN v.VoteTypeId = 3 THEN 1 ELSE 0 END),
    PostScore = 
        SUM(p.Score)
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
  ON  v.PostId >= p.Id
  AND v.PostId <= p.Id
WHERE p.Score > 1000
GROUP BY
    p.Id;

With no equality predicate in the join clause of the second query, only a nested loops join is available. But again, this is the type of thing that you should really have to push the optimizer to do.

sql server query plan
spool me once

Of course, for the above queries, the second plan is a disaster, like most Eager Index Spool plans tend to be. The non-spool query with the hash join finishes in about 600ms, and the Eager Index Spool plan takes a full 1 minute and 37 seconds, with all of the time spent building the spool.

sql server query plan
spool me twice

So, like I’ve been saying, one should really have to go out of their way to have this type of plan chosen.

Matter Worse


Compounding the issue is that the optimizer will sometimes choose Eager Index Spool plans when they are entirely unnecessary, and indexes exist to fully support query requirements.

The below join doesn’t actually work, because it’s not how the tables are related, but it’s a good example of what I mean.

SELECT
    u.Id,
    u.DisplayName,
    p.*
INTO #p1
FROM dbo.Users AS u
OUTER APPLY
(
    SELECT
        Score = SUM(p.Score),
        AnswerCount = SUM(p.AnswerCount)
    FROM dbo.Posts AS p
    WHERE p.Id = u.Id
) AS p;

The Id column in both the Users table and Posts table is the clustered primary key. There’s no sensible reason for an index to be created at runtime, here.

Of course, the Posts table relates to the Users table via a column called OwnerUserId, but whatever.

The point is the resulting query plan.

sql server query plan
crappy

If we tell the optimizer that it’s being a dunce, we get a better, much faster, Eager Index Spool-free query plan.

SELECT
    u.Id,
    u.DisplayName,
    p.*
INTO #p2
FROM dbo.Users AS u
OUTER APPLY
(
    SELECT
        Score = SUM(p.Score),
        AnswerCount = SUM(p.AnswerCount)
    FROM dbo.Posts AS p WITH (FORCESEEK) /*I am different*/
    WHERE p.Id = u.Id
) AS p;

Sometimes this is the only way to solve spool problems.

sql server query plan
muscular

Option One: Adding An Index


In most cases, Eager Index Spools are just really irritating missing index requests.

Here’s an example of one. The query itself touches the Posts table three times. Once to find questions, a second time to find answers related to those questions, and a third time to make sure it’s the highest scoring question for the answer.

SELECT TOP (100)
    QuestionOwner = 
        (
            SELECT 
                u.DisplayName 
            FROM dbo.Users AS u 
            WHERE pq.OwnerUserId = u.Id
        ),
    QuestionScore = 
        pq.Score,
    QuestionTitle = 
        pq.Title,
    AnswerOwner = 
        (
            SELECT 
                u.DisplayName 
            FROM dbo.Users AS u 
            WHERE pa.OwnerUserId = u.Id
        ),
    AnswerScore = 
        pa.Score
FROM dbo.Posts AS pq
JOIN dbo.Posts AS pa
  ON pq.Id = pa.ParentId
WHERE pq.PostTypeId = 1
AND   pa.PostTypeId = 2
AND   pa.Score >
(
    SELECT
        MAX(ps.Score)
    FROM dbo.Posts AS ps
    WHERE ps.ParentId = pa.ParentId
    AND   ps.Id <> pa.Id
)
ORDER BY
    pa.Score DESC,
    pq.Score DESC;

Are there many different ways to write this query? Yes. Would they result in different query plans? Perhaps, perhaps not.

Right now, this query has this index available to it, along with the clustered primary key on Id.

CREATE INDEX
    p
ON dbo.Posts
    (PostTypeId, ParentId, OwnerUserId)
INCLUDE
    (Score)
WITH
    (MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

Because we don’t have an index that leads with ParentId, or that allows us to easily seek to ParentId in the MAX subquery (more on that later, though), the optimizer decides to build one for us.

sql server query plan
el yuck

We can see what index the spool is building by looking at the tool tip. In general, you can interpret the seek predicate as what should be the key column(s), and what should be included by what’s in the output list.

There is sometimes some overlap here, but that’s okay. Just ignore any output columns that are already in the seek predicate. And of course, we can generally ignore any clustered index key column(s), since the nonclustered index will inherit those anyway.

sql server query plan
get it for less

Adding this index will get rid of the Eager Index Spool:

CREATE INDEX
    p2
ON dbo.Posts
    (ParentId, Score)
WITH
    (MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

In this case, I’ve chosen to add the Score column to the key of the index to allow for an ordered aggregation (the MAX function) to take place without a Sort operator.

sql server query plan
spool’s out for summer

Option Two: Over Communicating


Let’s take a step back. We currently have this index, that leads with PostTypeId.

CREATE INDEX
    p
ON dbo.Posts
    (PostTypeId, ParentId, OwnerUserId)
INCLUDE
    (Score)
WITH
    (MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

The section of the query that generates the Eager Index Spool is this one:

(
    SELECT
        MAX(ps.Score)
    FROM dbo.Posts AS ps
    WHERE ps.ParentId = pa.ParentId
    AND   ps.Id <> pa.Id
)

What we know, that the optimizer doesn’t know, is that only rows with a PostTypeId of 2 are answers. We don’t need to compare answers to any other kind of post, because we don’t care about them here.

If we change the subquery to limit comparing answers to other answers, it would also allow us to take advantage of the existing index by locating the right type of Post, and give seekable access to the ParentId column.

(
    SELECT
        MAX(ps.Score)
    FROM dbo.Posts AS ps
    WHERE ps.ParentId = pa.ParentId
    AND   ps.PostTypeId = 2
    AND   ps.Id <> pa.Id
)

That changes the full query to this:

SELECT TOP (100)
    QuestionOwner = 
        (
            SELECT 
                u.DisplayName 
            FROM dbo.Users AS u 
            WHERE pq.OwnerUserId = u.Id
        ),
    QuestionScore = 
        pq.Score,
    QuestionTitle = 
        pq.Title,
    AnswerOwner = 
        (
            SELECT 
                u.DisplayName 
            FROM dbo.Users AS u 
            WHERE pa.OwnerUserId = u.Id
        ),
    AnswerScore = 
        pa.Score
FROM dbo.Posts AS pq
JOIN dbo.Posts AS pa
  ON pq.Id = pa.ParentId
WHERE pq.PostTypeId = 1
AND   pa.PostTypeId = 2
AND   pa.Score >
(
    SELECT
        MAX(ps.Score)
    FROM dbo.Posts AS ps
    WHERE ps.ParentId = pa.ParentId
    AND   ps.PostTypeId = 2 /* I am new and different and you should pay attention to me */
    AND   ps.Id <> pa.Id
)
ORDER BY
    pa.Score DESC,
    pq.Score DESC;

Which changes the section of the query plan that we’re concerned with to this:

sql server query plan
understanding

Sometimes the optimizer just needs a little bit more information from you.

Thanks for reading!


A Little More About Isolation Levels In SQL Server

A Little More About Isolation Levels In SQL Server



In this video, I talk about some of the misguided expectations that Read Committed has associated with it, and query patterns to watch out for under Read Committed Snapshot Isolation.

If you’d like to see my full day performance tuning session, here are upcoming dates:

Video Summary

In this video, I dive into the often-overlooked world of SQL Server isolation levels, particularly focusing on read committed snapshot isolation (RCSI) and its alternatives. With a humorous twist, I highlight common misconceptions about isolation levels and demonstrate how they can lead to issues like race conditions under different scenarios. By walking through practical examples and real-world queries, I aim to clarify the nuances of these settings and help you make informed decisions for your database environments. Whether you’re dealing with complex query patterns or just looking to avoid potential headaches, this video offers valuable insights into optimizing SQL Server performance without overcomplicating your setup.

Full Transcript

Erik Darling here with Darling Data. Today I want to talk about isolation levels a little bit, because the most annoying things about isolation levels keep coming up, things that I really loathe having to repeat or try to tell people about over and over again. And we’re going to do that today, because there’s nothing like a good Friday talk about isolation levels. So first, no isolation level is perfect for everything all the time. I wish that one was, because then we could all just use that and we could stop having these conversations. The things that I have to talk people out of are, one, using NOLOCK hints, because that’s where you get bad data potentially under concurrency. And the other one is the switch between read committed, the default pessimistic isolation level in SQL Server, and read committed snapshot isolation, which is an optimistic isolation level in SQL Server. The reason why that’s the usual choice is because, well, you can use snapshot isolation, but then every query would have to ask for snapshot isolation in some way. Like, if it’s an application, when it connects to SQL Server; if it’s a stored procedure, you have to add it to the code, stuff like that. And unless you have a lot of control, and a lot of time and patience to figure out which queries you want to use an optimistic isolation level, that’s kind of a tougher one to talk folks into. That also assumes that you have control over those things, right? If you have a third party application, you might not, and your only hope might be to use read committed snapshot isolation, which kicks in for every read query that comes in and hits the database. So that’s the most common choice. The thing is, every time I talk about switching over, someone will go do some due diligence, and they’ll read some blog posts, and they come away thinking that there are bugs in SQL Server, with read committed race conditions and things like that. And that’s really not the case. The case is that there are certain query patterns you have to watch out for where you might hit race conditions.

Now, I’m not saying that read committed snapshot isolation is perfect, because again, it’s not. But for servers that I look at, where there is a lot of bad blocking between readers and writers, and a lot of deadlocking between readers and writers, read committed snapshot isolation is perhaps the safest way to solve all of those problems in one go. You’ll still have write queries block one another, but the read queries fighting with write queries and write queries fighting with read queries goes away completely. So it is a great isolation level for most SQL Server workloads. And most mature database platforms out of the box use some form of multi-version concurrency control by default, right? Even Azure SQL DB uses it by default, because Microsoft probably figured, hey, we’re going to release this product, we’re going to manage it. We don’t want people complaining about blocking all the time, because then they have to go solve blocking problems. And that’s not fun, because that’s query tuning, index tuning, stuff like that. So read committed, the pessimistic isolation level, has a lot of problems that one might consider to be race conditions, if one were to be really concerned about application and query concurrency. So here’s the first example, and I’m going to figure out which way I have to turn. There we go. That should get us in the frame.

Or get all the text in the frame, rather. So if you have a query that just finished reading what used to be row C, but it’s now a ghost record because it was deleted. And your query just finished reading it and has now moved on to row D, that row C will still show up in your query results because your query already grabbed it. It doesn’t disappear from the results. All right. And these slides are all from my foundations of SQL Server performance tuning class. I delivered it at PASS. I’m delivering it a couple more times: DataTune in Nashville, and up in Boston in May for the New England SQL Server user group. If you’re in either of those areas, it would be a pleasure to see you.

The second reason why read committed isn't really all that promising of an isolation level: let's pretend, again, that we just finished reading row C, but then it gets updated to H. Now we have two row H's, and our query will show one letter C and two H's in the results. Right? That's not great either. That would seem like a race condition. That would seem like a bug. But that's the way read committed, the non-snapshot, pessimistic isolation level, works. It takes very brief read locks on things, but data can change on either side of those locks whenever it wants, because those locks don't hold on for very long. There's no lock escalation with read queries, at least without hints or whatever. But all of this stuff is open to change as soon as those locks get released.
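
If you want to try to see that double-read for yourself, here's a conceptual, two-session sketch. It's heavily hedged: the dbo.Letters table is made up, it assumes an index keyed on the letter column, it's timing-dependent, and the table needs enough rows that session one's ordered scan is still in flight when session two runs.

/* Session 1: an ordered scan under locking read committed */
SELECT
    l.letter
FROM dbo.Letters AS l
ORDER BY
    l.letter;

/* Session 2, while session 1 is still scanning: move a row the
   scan already read ("C") ahead of its current position, so the
   scan can encounter the same row a second time, now as "H" */
UPDATE
    l
SET
    l.letter = N'H'
FROM dbo.Letters AS l
WHERE l.letter = N'C';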

Another reason why it's not great is a similar situation. Let's say we're reading row E, and row F gets updated to be another value, C. So now we have two C's over here, but all our query will see is one C and no F. Right? And that is, again, something that could very much be interpreted as a race condition in your queries, if you are really concerned about concurrency. Now, this is stuff that read committed snapshot isolation fixes. But we have to talk about some other stuff first.

The first thing we have to talk about are some query patterns under read committed, the pessimistic isolation level, that can also cause things that look like race conditions, but are really just, again, the lack of promises that read committed makes about what data is going to be returned. So if you do something like this in your stored procedure or your query, whatever it is: if you set some variable equal to something based on a select, the locks from that select, unless you add locking hints and a transaction, don't hold on once that query is done.

Actually, even once it finds that row, data in the table can change all over the place. So if you were to take whatever this gets set to and use it to insert a row somewhere else, or use it to find data that you're going to update, that data could be completely irrelevant by the time your query gets to it.

If you're just running it in isolation, everything's going to look great every time. But under concurrency, the data in there could change really quickly. Another pattern that could have a similar effect, a race condition with big air quotes on it, is if you dump data into a temp table and then use that temp table to go update things, because whatever's in that temp table may be invalid by the time you go to do that update. Both patterns look something like the sketch below.
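
Here's a minimal sketch of both patterns. The dbo.DinnerPlans table matches the demo coming up, but the column names here are my guesses, not the video's actual schema.

/* Variable assignment: the shared lock from this select is gone
   as soon as the row is read, not at the end of the batch */
DECLARE @seat integer;

SELECT TOP (1)
    @seat = dp.seat_number
FROM dbo.DinnerPlans AS dp
WHERE dp.seat_is_free = 1;

/* Nothing stops another session from taking @seat in between */
UPDATE
    dp
SET
    dp.seat_is_free = 0
FROM dbo.DinnerPlans AS dp
WHERE dp.seat_number = @seat;

/* Temp table version: the contents of #free_seats can be stale
   by the time the update runs */
SELECT
    dp.seat_number
INTO #free_seats
FROM dbo.DinnerPlans AS dp
WHERE dp.seat_is_free = 1;

UPDATE
    dp
SET
    dp.seat_is_free = 0
FROM dbo.DinnerPlans AS dp
JOIN #free_seats AS fs
  ON fs.seat_number = dp.seat_number;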

So these are things that a lot of folks don't think about when it comes to matters of concurrency. And a lot of it comes down to a misunderstanding: people's ideas about the promises that read committed, the pessimistic isolation level, makes are way overblown. A lot of people think that read committed, the pessimistic isolation level, behaves like snapshot isolation, where your query takes a picture of the data and it's perfect, or like serializable, where nothing can change while your query's reading the data, right?

Everyone thinks that what it's returning is this magnificent piece of data, but all it really promises is blocking and deadlocking with modification queries, and that the data it read was committed at the time the read happened.

So remember, just like in the slides, modifications can happen all around it. The only thing that's guaranteed is that if you hit a lock, your query will wait for that lock to release before reading that data. That's the only real guarantee that read committed, the pessimistic isolation level, makes.

Now, the next thing I've got to talk about is query patterns that might exhibit what seems like a race condition under an optimistic isolation level. A lot of the time, though, these are things that have some potential to happen under a pessimistic isolation level too. They're just a little bit more rare there.

They're actually pretty rare even under an optimistic isolation level, because you have to write really dumb queries for stuff like this to happen. So I'm in my database, crap. At least I hope I am.

Home is where the crap is. And I've got this table called dinner plans. And I'm going to populate that table with, well, when I wrote this demo originally, there were a bunch of people who I thought I was going to have dinner with at PASS.

It turned out the only person I had dinner with was Kendra. But that was nice anyway. Actually, it was like a few dinners with Kendra. She's like the only person who would hang out with me.

Whatever. Not bad company, though. And so right now in that table, our dinner plans table, this is what things look like. You got a list of people.

And for some reason, seat number one is free. But the other five seats in the table are all taken. So we're going to make sure that RCSI is off for our crap database. I have these queries set up in the other two windows over here.
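
For reference, checking and flipping that setting presumably looks something like this; the database name matches the demo, and the rest is standard syntax:

SELECT
    d.name,
    d.is_read_committed_snapshot_on
FROM sys.databases AS d
WHERE d.name = N'crap';

ALTER DATABASE crap
SET READ_COMMITTED_SNAPSHOT OFF;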

But we’re going to look at just real quick what the query does. So this is the kind of sort of dumb query pattern that would exhibit a race condition under RCSI. And it might not necessarily hit under a pessimistic isolation level, but could still happen if things got weird enough.

So what we're doing is updating a table alias, our table dinner plans, which is aliased as DP. If you are not a very mature person and you want to make jokes, go ahead. But it's dinner plans, nothing more.

So we’re updating our dinner plans table. And for some reason, rather than just doing a regular update, we’re doing this exists check through the base table to look for an ID where the seat is free. So this is where things get interesting because we have two references to the table.

We have one for the update and one for the select. Only the reference for the update will ever have the type of exclusive locks on it that will prevent a read query under a pessimistic isolation level from executing and getting data. The reference in the select portion, that can read whatever.
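
Here's a hedged reconstruction of that query; the column names are my guesses from the video, and the 1/0 inside the EXISTS is never actually evaluated:

UPDATE
    dp
SET
    dp.occupant = N'Erik',
    dp.seat_is_free = 0
FROM dbo.DinnerPlans AS dp
WHERE EXISTS
(
    SELECT
        1/0
    FROM dbo.DinnerPlans AS dp2
    WHERE dp2.id = dp.id
    AND   dp2.seat_is_free = 1
);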

So if you have a no lock hint in there, you're screwed. And if, in that inner query, any of the stuff we talked about happens, where data changes around the reads that read committed, the pessimistic isolation level, is doing, you could still hit what feels like a race condition. All right.

So let’s go look at what happens when we do this. So I’m going to say begin tran. I’m going to run that. And we’ve output this. And now I’m going to come over here and I’m going to run this.

And this is going to get blocked. Right. This query is now blocked because we have this update in a transaction, updating dinner plans. And this query wants to update dinner plans and read from dinner plans.

We come back over to this first window and we commit this. This query will come back and return no results. All right.

Because that other query blocked it, and by the time it committed, it had already updated that row to take the free seat. This query did not find a free seat when it went to run. So let's commit this now so that we don't have anything weird going on.

Let’s make sure this is fully committed. Now let’s change the crap database to turn read committed snapshot isolation on. All right.
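
The command here is presumably along these lines; the video doesn't show the exact options used, and WITH ROLLBACK IMMEDIATE is just one way to keep the ALTER from waiting on open transactions:

ALTER DATABASE crap
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE;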

This takes a second to run. That’s okay. It’s worth it. Okay. So now that’s turned on. Good. If we repeat that same demo, right, we’re going to run this. Oh, you know what?

I didn't reset things. Let me fix that. I totally forgot to reset the table between runs, which is why that didn't work. Ah, the joys of remembering stuff.

All right. So now let’s run that. And see, this one finds this. All right. And now let’s run this. Now this is still going to get blocked because that update is still happening.

On the other window, right? This transaction is still not committed. But because of the way an optimistic isolation level works, when that query did its update, the last known good version of the row got stashed in a version store. If you're not using accelerated database recovery, it gets sent to tempdb; if you are using accelerated database recovery, it gets sent to a persistent version store local to the user database you're in.
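
If you want to peek at where those versions land, these two DMVs cover the tempdb-based version store and, when accelerated database recovery is on, the per-database persistent version store:

/* tempdb-based version store usage, by database */
SELECT
    vsu.database_id,
    vsu.reserved_page_count,
    vsu.reserved_space_kb
FROM sys.dm_tran_version_store_space_usage AS vsu;

/* persistent version store, with accelerated database recovery */
SELECT
    pvs.database_id,
    pvs.persistent_version_store_size_kb
FROM sys.dm_tran_persistent_version_store_stats AS pvs;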

And now, because this query is going to read a versioned copy of the row that the other query is updating: this one found the seat, right, and when we commit the update and come over here, what did we find?

We found a seat. We definitely found a seat, because this thing got updated to the reverse Eric down here. All right.

So now if we commit this, we’ll have a little bit of an awkward situation. Because forward Eric will think that he got a seat at the dinner table, but backwards Eric will have the golden ticket and say, I get to sit there.

You don't get to sit there. And then forward Eric and reverse Eric will, like, matter-and-antimatter fight, and, I don't know, some sort of universe death will probably happen. So these are the kinds of query patterns that can cause things that, again, look and feel like race conditions under RCSI, that you wouldn't necessarily hit under read committed, the pessimistic isolation level.

But, again, the promises that read committed, the pessimistic isolation level, makes are really flimsy. So, under most circumstances, for most query workloads where people aren't writing completely idiotic queries, or if you have no-lock hints everywhere anyway, you're probably better off using an optimistic isolation level, because there's far less room for error than there is when you're using no-lock hints.

And there’s far less pain than if your read queries are blocking and deadlocking with your write queries all the time. Right?

So, like, most queries in most databases function better under an optimistic isolation level. If you have queries that don't, queries that need to read the most up-to-date version of data, just keep in mind that those queries are going to be subject to blocking and deadlocking.

If you want to enable read committed snapshot isolation and you want certain queries to not use row versioning, there's a perfectly good READCOMMITTEDLOCK hint you can add to those queries, or other locking hints that would make sense for them.
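
For example, something like this opts a single query back into locking read committed while the database stays on RCSI; the column names are, again, guesses:

SELECT
    dp.seat_number,
    dp.occupant
FROM dbo.DinnerPlans AS dp WITH (READCOMMITTEDLOCK)
WHERE dp.seat_is_free = 1;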

But a direct update, like the sketch below, wouldn't have the problems that we were looking at. The problem really is that the subquery doing the select reads a versioned row it wasn't supposed to see, because that seat should have already been taken.
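
Here's what I mean, as a minimal sketch with guessed column names. The update reads and locks the same table reference it writes to, so the versioned-row race goes away:

UPDATE TOP (1)
    dp
SET
    dp.occupant = N'Erik',
    dp.seat_is_free = 0
FROM dbo.DinnerPlans AS dp
WHERE dp.seat_is_free = 1;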

But, you know, again, it takes pretty high concurrency for you to find these problems. And it takes pretty stupid looking queries for you to find these problems. Right?

So, in this case, you know, that self-join, completely unnecessary. If you write modification queries that do things like that, you kind of deserve what you get. That’s not a smart way to write queries.

But there are times when you would have to write a query sort of like that, like a different query pattern. Like in this one, it’s particularly stupid because it’s just one table that we want to update and there’s no reason to do a subquery to touch another table.

But if there were a different table involved, where we had to update from some reservation list or a guest list, you know, a list of reservations where you could possibly go to different restaurants or whatever.

Those are circumstances where having a subquery would be necessary. But I just want to remind everyone that those subqueries under read committed could also read some weird data.

Data could change before or after the read, because the table that only shows up in the select, the one not getting updated, isn't going to have any sort of exclusive locks taken on it, wherever that select is.
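
Something like this hypothetical sketch, where dbo.GuestList is entirely made up: the subquery's reads take no exclusive locks, so the guest data can change between the read and the write, and a no lock hint in there only makes it worse.

UPDATE
    dp
SET
    dp.occupant =
    (
        /* no exclusive locks here; add NOLOCK and it gets worse */
        SELECT TOP (1)
            gl.guest_name
        FROM dbo.GuestList AS gl
        WHERE gl.wants_dinner = 1
        ORDER BY
            gl.signup_date
    )
FROM dbo.DinnerPlans AS dp
WHERE dp.seat_is_free = 1;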

So if you join tables together to do updates and there's a no lock hint on the table that's not getting updated, or you're just using read committed, the pessimistic isolation level, you can still see weird stuff that'll look and feel like race conditions. Well, specifically under read committed, the pessimistic isolation level; serializable and repeatable read offer way more guarantees.

They also offer way more blocking, so you have that to look forward to. Anyway, this was a little bit longer than I expected, and like I said, it's Friday and I've got some stuff I've got to go do.

So I’m going to go do that. I’m going to get this started uploading as usual. I hope you enjoyed yourselves.

I hope you learned something. I hope that if you are not currently using an optimistic isolation level and you have a lot of problems with locking and blocking, you'll consider using one. And I hope that if you are slathering your queries with no lock hints because of some bizarre cargo cult saying that's a best practice, you'll consider using an optimistic isolation level and removing those.

If you enjoyed the video, pretty please give it a like. If you enjoy this sort of technical SQL Server content, feel free to subscribe to my channel. I'm always happy to have new folks coming in and learning stuff.

And I don’t know. Gosh, I think that’s it. I can’t think of a single other thing to say. Happy Friday. Hope everyone has a great weekend.

Thank you for watching and I’ll see you in the next video. Have a good one.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Join me at DataTune in Nashville, March 8-9 2024

Spring Training


This March, I’ll be presenting my full day training session The Foundations Of SQL Server Performance Tuning.

All attendees will get free access for life to my SQL Server performance tuning training. That’s about 25 hours of great content.

Get your tickets here for this event, taking place Friday and Saturday, March 8th-9th, 2024 at Belmont University – Massey Center, 1900 Belmont Blvd, Nashville, TN 37212

Here’s what I’ll be presenting:

The Foundations Of SQL Server Performance Tuning

Session Abstract:

Whether you want to be the next great query tuning wizard, or you just need to learn how to start solving tough business problems at work, you need a solid understanding of not only what makes things fast, but also what makes them slow.

I work with consulting clients worldwide fixing complex SQL Server performance problems. I want to teach you how to do the same thing using the same troubleshooting tools and techniques I do.

I’m going to crack open my bag of tricks and show you exactly how I find which queries to tune, indexes to add, and changes to make. In this day long session, you’re going to learn about hardware, query rewrites that work, effective index design patterns, and more.

Before you get to the cutting edge, you need to have a good foundation. I’m going to teach you how to find and fix performance problems with confidence.

Event Details:

Get your tickets here for this event!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Indexing SQL Server Queries For Performance: Fixing A Non-SARGable Predicate

Nausea


Okay, deep breath. Deep, deep breath.

Would you believe I still have to fix queries with a bunch of functions in joins and where clauses? Like every day? All day?

Where things get tough is when there’s some third party vendor app where code can’t be changed, but customers have become so fed up with performance that they’re willing to make other changes to help things along.

This isn’t a big fireworks demo. I could have spent a lot more time finding a worse scenario, and I’ve hinted it to exacerbate the issue a bit.

Sometimes my hardware is too good for bad demos, probably because it’s not in the cloud.

That whole “cloud” thing has some real problems.

Setup


Anyway, let’s say we have this index (because we do, I just created it).

CREATE INDEX
    p
ON dbo.Posts
    (CommunityOwnedDate, Score)
WITH
    (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
GO

Let’s further presume that we have this stored procedure.

This stored procedure could have been a query.

CREATE OR ALTER PROCEDURE
    dbo.FixNonSargable
(
    @CommunityOwnedDate datetime,
    @Score integer
)
AS
BEGIN
    SET NOCOUNT, XACT_ABORT ON;

    SELECT
        p.*
    FROM dbo.Posts AS p
    WHERE COALESCE(p.CommunityOwnedDate, '17530101') >= @CommunityOwnedDate
    AND   p.Score >= @Score
    ORDER BY
        p.Id DESC
    OPTION
    (
        USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140'), /*No batch mode*/
        MAXDOP 1 /*No parallelism*/
    );
END;
GO

When executed with such great gusto, it’s a bit slow.

EXEC dbo.FixNonSargable
    @CommunityOwnedDate = '20130101',
    @Score = 10;

At ~1.4 seconds, we’re very upset with performance. Throw the whole thing in the trash. Install DuckDB.

[Image: SQL Server query plan]

Index Reversal


If we change the key column order of our index, so the column with a seekable predicate can go first, we can get faster query execution.

CREATE INDEX
    p
ON dbo.Posts
    (Score, CommunityOwnedDate)
WITH
    (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE, DROP_EXISTING = ON);
GO

Now you don’t have to wait 6-never months for your vendor to fix their garbage code.

[Image: SQL Server query plan, captioned "i could be happy"]

In this case, changing the order of key columns was a more beneficial arrangement for this particular query.
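
And for completeness: if the code could be changed, one common SARGable rewrite of that COALESCE predicate looks like this sketch. It's logically equivalent, but it lets the optimizer consider a seek on CommunityOwnedDate.

/* The same filter, spelled out so the column is bare:
   NULL dates count as 1753-01-01, just like the COALESCE */
DECLARE
    @CommunityOwnedDate datetime = '20130101',
    @Score integer = 10;

SELECT
    p.*
FROM dbo.Posts AS p
WHERE
(
       p.CommunityOwnedDate >= @CommunityOwnedDate
    OR (p.CommunityOwnedDate IS NULL
        AND '17530101' >= @CommunityOwnedDate)
)
AND   p.Score >= @Score
ORDER BY
    p.Id DESC;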

Results may vary. Not query results! I mean like, if you go do this in your database. On “your” hardware. In the “cloud”.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.