All About SQL Server Stored Procedures: Data Types
Video Summary
In this video, I delve into the intricacies of data types in SQL Server stored procedures, a topic that often causes headaches for many database administrators and developers. I share my experiences from real-world scenarios where improper data type handling has led to performance issues and other complications. By discussing common pitfalls such as mismatched date and datetime types, and exploring more nuanced problems like max string types and ORM data typing, I aim to help you avoid these traps in your own projects. Additionally, I cover how certain data types can lead to unexpected results and the importance of using temp tables for better query performance when dealing with dynamic inputs. Whether you’re a seasoned DBA or just starting out, this video offers valuable insights into optimizing stored procedures and maintaining clean, efficient code.
Full Transcript
Erik Darling here with Darling Data. In this video we’re going to continue our rhapsody on stored procedures, and in this one we’re going to talk a bit about data types. Data types. That’s what we’re going to talk about in this one. Because I see a lot of people screw a lot of stuff up with these, and as the person who comes in well after these screw-ups have been in place, and occurring, for a very long time, who has to clean them up: it’s a terribly unpleasant task. Sometimes it is a little bit easier, because sometimes there will just be a problem with whatever parameter was being passed to the stored procedure. Other times there’s a problem with the underlying table data types. And at this point the tables have grown quite large, and sure, you have options like creating computed columns that do data type conversions for you, but you still really do need to index those computed columns. So even if you get the sort of lock-free addition of a non-persisted computed column to your table — one that does your TRY_CAST, TRY_CONVERT, TRY_PARSE, whatever, to convert your column to a more correct data type — you still incur a size-of-data index creation operation, and on Standard Edition a definitely-locking one, to create an index that helps your query use that computed column in an efficient way. So we’re going to talk about some of this stuff today.
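As a sketch of that workaround — with entirely hypothetical table and column names — an indexed, non-persisted computed column looks something like this:

```sql
/* Hypothetical example: OrderDateText is a badly typed string column.
   The computed column converts it; the index makes it searchable.
   Adding the column itself is quick because it is not persisted,
   but the index build is still a size-of-data operation. */
ALTER TABLE dbo.SomeTable
    ADD OrderDateTyped AS TRY_CAST(OrderDateText AS datetime2);

CREATE INDEX ix_SomeTable_OrderDateTyped
    ON dbo.SomeTable (OrderDateTyped);
```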
And some of the ways that you can get around the more basic, common problems with incorrectly typed stuff in your stored procedures. Before we do that, we should talk about how cool I am, how you should give me four bucks a month, and how you can click a link in the video description to do that. If you think I’m cooler than four bucks a month, you can give me more than four bucks a month, but I think four bucks a month is like my entry-level cool purchase. And that all goes towards supporting the channel in general, making sure that Bats here stays full of delicious Pez, and that I can pop a Pez and have the energy to carry on creating this content.
If you disagree with Pez, or if you just hate Bats, you can do other stuff. You can like, you can comment, you can subscribe. And if you want to ask reasonable questions that I will answer on an Office Hours episode, the link to do that is down in the video description as well. And if you are out there struggling with an ill-performing SQL Server and you need help at a reasonable rate, well, I’ve got nothing but reasonable rates. You can hire me to do any of this stuff — which is a pretty reasonable set of stuff — again, at a reasonable rate. So we’ve got all this going for us.
Now, if you would like some training content about SQL Server performance tuning, I’ve got beginner, intermediate and expert level stuff. Really some noodle baking activities here. And you can get all of mine for about 150 USD for the rest of your life. Long live you. Fully assembled link for that is, of course, in the video description as well.
If you would like to see, well, I mean, I will be there in person. I’m not presenting anything unless someone gets sick or dies or doesn’t show up. Then I’ll fill in a session. But if you would like to come hang out and have me give you a bag of lunch, you can come to SQL Saturday, New York City 2025, taking place on May the 10th. With a performance tuning pre-con by Andreas Vorta on November. I’m sorry, not November. May the 9th.
May the 9th. I don’t know where November came from. November is not involved with this. So we have that going for us. But with that out of the way, let’s talk a little bit about data types and store procedures. Now, I see a lot of people screw up a lot of stuff.
Mismatching date, datetime, and datetime2 stuff is a big one. That’s going to be the first thing that we talk about, because there are actually some weird bugs, and some of them depend on — well, the one that I’m going to show does depend on compatibility level, but whatever.
Of course, mismatching string data types is another: varchar and nvarchar. I just have a hard time with the “var” part, even though I’m pretty harsh on the “char” there. Character, variable character, whatever. I don’t want to sound Southern. No offense to the Southern folks.
I just don’t want to appropriate your accent in an unfortunate manner. But then, like, you know, you just see absolute boneheaded stuff like this, where, you know, people have two tables. And they’re like, oh, eventually I’m going to join these two tables together on this column and this column.
But for some reason, even though both columns will have commensurate data in them, one of them is like a varchar 255. The other is an integer. And you’re like, how’d that happen?
And lots of shrugs. And they’re like, is that a big deal? And you’re like, huh? Yeah, suppose it is.
Since we’re talking about it, suppose it is a big deal. Anyway, let’s move on. So this demo comes from a question that got asked a little while back on dba.stackexchange.com, something about why a datetime value doesn’t return even though it matches the predicate.
There’s a lot of words in there. Kind of a noodle-baker, that one. And it ended up with the fellow who answered it, Martin Smith — brilliant fellow — creating a feedback item.
If I remember, the links will be in the video description. If I don’t, sorry. But if you run through this demo, it’s a very strange one.
And you might want to read the question and the bug report, because they do add some more detail in there. But the general gist of this is that we declare a variable called @dt, which is a datetime, and we set that equal to something. Then we declare a variable called @dt2, which is a datetime2, and we set that equal to the datetime variable. Then we insert the datetime local variable into a table variable with a primary key on it. And when we ask if dt equals @dt2, right, coming from the table, or if we say WHERE CAST(dt AS datetime) — which it already is a datetime — equals @dt2, then we get some strange results back, right?
And if you go and you read through, right, like that doesn’t make any sense. But what’s really weird is if you go and you read through the feedback item, there is a comment in there by Sir Pablo Blanco where he says if you take the index out entirely, right? So let’s just delete where it says primary key.
Look what happens now. Now neither one returns any rows, right? With the primary key thing in there, we did return a row from one of them. But without the primary key, we lose that.
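From memory, the shape of the repro looks roughly like this — the literal value is my own stand-in, and the exact results depend on compatibility level and on whether the primary key is present, so treat this as a sketch rather than the verbatim demo from the question:

```sql
/* Sketch of the datetime vs. datetime2 comparison oddity.
   The date literal is a made-up example, not the original repro value. */
DECLARE @dt  datetime  = '2023-06-10 10:53:48.460';
DECLARE @dt2 datetime2 = @dt;

DECLARE @t TABLE (dt datetime PRIMARY KEY);
INSERT @t (dt) VALUES (@dt);

/* These two queries look like they should agree, but whether either
   (or neither) returns the row can change with the primary key
   and the database compatibility level. */
SELECT * FROM @t WHERE dt = @dt2;
SELECT * FROM @t WHERE CAST(dt AS datetime) = @dt2;
```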
So I think Microsoft does potentially have some work to do here, because that is a strange result. Be really careful with the date and datetime stuff. But there are generally a few data types that will always make me nervous when I see them, whether as a parameter, a variable, or a column data type.
That will be max strings. That goes for any variety of string that is a max data type. Except for when you’re using it for dynamic SQL, then I am a-okay with it.
XML: always nerve-wracking. And JSON is apparently a data type in Azure SQL DB now, which is really just a synonym for varchar(max) anyway. But let’s not say too much.
But XML has been making me nervous for years. Also annoying me, because now I’m like, great, I’m going to have to write more XQuery. Now I’m going to have to go look at all the XQuery I’ve ever written to make sure I’m doing it right.
And of course, sql_variant is another scary one to see, because you just don’t know what you’re going to get. I just expect something bad to happen whenever I see any of this stuff show up in SQL Server Management Studio. Another thing that very commonly gets screwed up is ORM data typing.
I wish I knew what this pattern looked like in Entity Framework. But when people don’t strongly type whatever arguments they’re feeding to their Entity Framework queries, those can end up as varchar(8000), or nvarchar(4000), or maxes.
Or sometimes what will happen is Entity Framework will, at runtime, infer the nature of the string. Usually it will default to Unicode, because Unicode is safer, in the sense that you’re not going to lose fidelity if you use it. If someone actually has a Unicode character in there and you don’t use a Unicode string or parameter type, you will end up with question marks or blank spaces or empty boxes or something. So it will usually infer that any string is going to be a Unicode thing.
But then it will infer the length of the parameter based on the length of whatever you pass in. So if you had the same query running without strongly typed parameters and you passed in a variety of different arguments, you would end up with a variety of different entries in the plan cache and Query Store, with all different parameter lengths in them.
So if you said WHERE some_name = 'Erik', you’d end up with an nvarchar(4); Kendra would be nvarchar(6); Tom would be nvarchar(3); Al would be nvarchar(2). And if you were trying to find Q Lazzarus, you would end up with an nvarchar(1). And even if the plan shape was the same, there would be different plan cache and Query Store entries for that same query across all of them. So that’s not very fun at all.
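To make that concrete, here’s a sketch using sp_executesql directly — the dbo.Users/DisplayName names are borrowed from the Stack Overflow sample database the video uses elsewhere, but the point is just that each declared parameter length caches separately:

```sql
/* Same query text, different declared parameter lengths:
   each one gets its own plan cache / Query Store entry. */
EXEC sys.sp_executesql
    N'SELECT u.Id FROM dbo.Users AS u WHERE u.DisplayName = @name;',
    N'@name nvarchar(4)', @name = N'Erik';

EXEC sys.sp_executesql
    N'SELECT u.Id FROM dbo.Users AS u WHERE u.DisplayName = @name;',
    N'@name nvarchar(6)', @name = N'Kendra';

/* Count the cached entries for that query text: */
SELECT  COUNT_BIG(*) AS cached_copies
FROM    sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE   st.text LIKE N'%DisplayName = @name%';
```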
Now, another thing that I will see people do quite a bit, when they’re trying to have a “you can search anything with this stored procedure” setup, is something like this, where they never know what the input is going to be. And in some cases — like within the context of Stack Overflow — the Body column is nvarchar(max). Even though I think Stack Overflow internally sets a character limit on it, like 30,000 or 50,000 or something like that, this is what you get: it’s still declared in the table as nvarchar(max).
So what I see a lot of people do is just have their universal search string, stick some parentheses around it, and then use that to search across a variety of columns with a variety of data types.
Some of them, like these three, might be strings. Then some of them might be dates, like the ones that say date in the name. And some of them might be numbers, like the ones that say ID in the name.
And this is just a whole host of problems that you don’t want to deal with. You are, of course, in these cases much, much better off having actual search parameters for each one — not doing this, just having a parameter for each one of these.
What you want to avoid, unless you are okay with a recompile hint, is doing something like this with it, right? This has lots of problems in SQL Server.
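A minimal sketch of that per-column version, with the recompile-hint caveat baked in — the procedure and parameter names are hypothetical, loosely modeled on the Stack Overflow Posts table:

```sql
CREATE OR ALTER PROCEDURE dbo.SearchPosts
    @Title        nvarchar(250) = NULL,
    @CreationDate datetime      = NULL,
    @OwnerUserId  integer       = NULL
AS
BEGIN
    SELECT p.Id, p.Title, p.CreationDate
    FROM   dbo.Posts AS p
    WHERE (p.Title        = @Title        OR @Title        IS NULL)
    AND   (p.CreationDate = @CreationDate OR @CreationDate IS NULL)
    AND   (p.OwnerUserId  = @OwnerUserId  OR @OwnerUserId  IS NULL)
    /* Without this, the catch-all shape gets one cached plan for
       whatever combination of parameters runs first, which will be
       wrong for most other combinations. */
    OPTION (RECOMPILE);
END;
```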
SQL Server 2025 is sort of maybe supposed to give you some help with this. We’ll see how it goes. I haven’t seen how that all works yet, so I can’t say anything. But if you want to write like, you know, this kind of search procedure in a way that doesn’t suck, I have a whole video presentation.
It’s a talk that I do on defeating parameter sniffing with dynamic SQL, which gives you all sorts of ways to write these queries so that they perform well across a variety of executions.
So that’s nice. Another big anti-pattern that I see is people passing in lists of things, right? Usually comma-separated lists.
Now, table-valued parameters can be a good option to get around this, but table-valued parameters, unfortunately, are backed by table variables and have their own sort of weird set of issues that you might see there.
So you might need a temp table even if you’re already using a stored procedure, even if you’re already passing a table-valued parameter to it so that you can get statistics generated on your temp table and get like better performance when you join that thing off to other things.
If you’re not using your table-valued parameter in a way where performance is critical or where, you know, you get okay performance anyway, don’t worry about it. But if you’re ever using table-valued parameters and you’re finding the query performance behavior a little weird and hard to predict, you might want to start dumping the contents of your table-valued parameter into a temp table.
So a big anti-pattern is to do something like this, though, right? Let’s say you have your search string, and let’s say it’s a CSV list of IDs. And you do something like this, where you say: ah, well, I’m just going to say WHERE p.Id is IN the result when I split this out. And this goes for any kind of string dissection method.
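As a sketch, the anti-pattern looks something like this (hypothetical procedure name, Stack Overflow-style Posts table):

```sql
CREATE OR ALTER PROCEDURE dbo.PostsByIdList
    @ids varchar(8000)  /* e.g. a made-up list like '38549,38783,41349' */
AS
BEGIN
    /* SQL Server has no statistics on the output of the split,
       so cardinality estimates for this IN clause are guesses. */
    SELECT p.Id, p.Score
    FROM   dbo.Posts AS p
    WHERE  p.Id IN
           (SELECT CONVERT(integer, ss.value)
            FROM   STRING_SPLIT(@ids, ',') AS ss);
END;
```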
The most evil person in the world who contributes to the Entity Framework stuff at Microsoft just tried to fix a problem with IN clauses like this, using JSON and JSON string splitting. And I’ve had a bunch of my clients be like, oh, EF Core, it’s so great! But then they get to this JSON thing, and it’s: why do these queries all take three minutes now?
Why do these queries suddenly take 15 minutes to run? Well, it’s because of the JSON parsing. SQL Server doesn’t have statistics on the contents of that string. Even if there’s just one thing in the string, it doesn’t matter. SQL Server has no idea what’s in it. It does not take every value out and do cardinality estimation for each one. It’s a complete black box.
So don’t do this. This is a bad idea — that’s why it’s up here as the anti-pattern, right? A much better idea, like I was saying, if you’re having problems with table-valued parameters, would be to do something like this, where you create a temp table with the correct data type for whatever data you’re putting into it. You can even index that temp table in a useful way, and you can do things to protect your query from the contents of your temp table having problems.
So what I end up doing a lot is something like this. Now, obviously, SELECT DISTINCT on big results with lots of columns in them would be a nasty performance hit in some cases. But on something like this, where I’m just pulling a list of values out of a string, I’m not worried about DISTINCT. This is fine.
But what I want to do is make sure that I’m respecting the uniqueness of the primary key with DISTINCT, and that I am not doing anything that might result in an error. So let’s say there’s something malformed in the string, something gets messed up, and you end up with the letter A as one of the CSV values. That would obviously blow up if you tried to insert the letter A into an integer column. So I always do TRY_CAST. And I use TRY_CAST instead of TRY_CONVERT here, because TRY_CONVERT is compatibility level sensitive, whereas TRY_CAST you can use anywhere.
And then I also stick a WHERE clause on there. Now, granted, this is another one of those things: if I saw TRY_CAST wrapped around a column of a table in a real query, we’d obviously have to talk about that a little bit. But in this case, just like the DISTINCT, because I’m only pulling a list of values out of a CSV string with the built-in STRING_SPLIT function, this isn’t going to be a big performance hit.
Even if you had a lot of stuff in there, this wouldn’t be that big a deal. But once all that data is in the temp table, with the correct data type and a good, useful index on it, stuff like this starts performing way, way better.
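Putting the pieces just described together — temp table, correct data type, DISTINCT, TRY_CAST, and the extra WHERE clause — a sketch of the pattern (hypothetical names again) looks like:

```sql
CREATE OR ALTER PROCEDURE dbo.PostsByIdListBetter
    @ids varchar(8000)
AS
BEGIN
    /* Correct data type plus a primary key gives the optimizer
       statistics and a useful index for the join. */
    CREATE TABLE #ids (id integer PRIMARY KEY);

    INSERT #ids (id)
    SELECT DISTINCT                      /* respect the PK's uniqueness */
           TRY_CAST(ss.value AS integer)
    FROM   STRING_SPLIT(@ids, ',') AS ss
    WHERE  TRY_CAST(ss.value AS integer) IS NOT NULL;  /* drop junk like 'A' */

    SELECT p.Id, p.Score
    FROM   dbo.Posts AS p
    JOIN   #ids AS i
      ON   i.id = p.Id;
END;
```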
So this is a much better pattern for the CSV thing. Another really unfortunate anti-pattern: what’s important here is that if the base table column data type is nvarchar(max), you’re going to see this behavior no matter what, because SQL Server runs into the same problem whether a parameter or variable data type is max, or whether the base column data type is max.
And that problem is that it can’t always push those predicates down far enough into the query plan. Even if you have a good index, you can only partially seek to some of the values; you’re still going to see a late Filter operator in the query plan. But if I run both of these, even with a recompile hint on them — where SQL Server should be able to say, hey, you know what? I think something’s weird here. I think this nvarchar(max) isn’t necessary, because we don’t actually have a max value being passed into this — the query plans for both of these are just about identical, right, across the board.
They both have these Filter operators in them, right? So we pull all the rows out of the clustered index, all 17 million of them, and then we filter them down over here. And even with the recompile hint, where we see the literal value, SQL Server has to apply that filter after the fact. And that’s no good at all. Without the recompile hint, you would see the same thing.
The recompile hint doesn’t help here. What you want to do instead is make sure that you’re not using a max. If you could use nvarchar(4000) here, it would be fine, because it’s not a max. But what you want to make sure of is that you are, as much and as often as you can, using the exact right data type for both of these.
Now, this isn’t going to have a profound performance difference, because we don’t have a good supporting index on the Posts table to search by Title. But notice that for this one, we at least don’t have that Filter happening afterwards. If we look at the parameterized one, we can at least apply the predicate while we’re reading through data pages; we don’t have to pull out all the data pages and then filter them later.
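A sketch of the comparison, assuming the Stack Overflow Posts table, where Title is declared nvarchar(250):

```sql
DECLARE @bad  nvarchar(max) = N'SQL Server';
DECLARE @good nvarchar(250) = N'SQL Server';

/* max-typed: the predicate shows up as a late Filter operator,
   even with the recompile hint */
SELECT COUNT_BIG(*) FROM dbo.Posts WHERE Title = @bad
OPTION (RECOMPILE);

/* matching type: the predicate can be applied while pages are read */
SELECT COUNT_BIG(*) FROM dbo.Posts WHERE Title = @good
OPTION (RECOMPILE);
```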
If we had an index on the Title column, we could do some other stuff. And, kind of coming back to some of the stupidity of the Stack Overflow database, we could also do some stuff with PostTypeId, since both questions and answers are in the Posts table. There aren’t separate PostQuestions and PostAnswers tables to keep them apart; there’s a PostTypeId column, and only a PostTypeId of 1 will have a non-null Title. So if you’re searching on Title, it would probably be a good idea to have Title available in the index, but you might also want something like a filtered index with PostTypeId = 1 — or have that in your query — so that you’re only looking at questions, because answers can’t have titles.
But that’s a little beyond the scope of what we’re talking about; you might have a similar situation with somewhat denormalized results in there. So when it comes to data types and stored procedures, these are definitely things you want to keep an eye out for, whether they are completely mismatched data types or just accidental max string data types in there.
Fixing small things can have a lot of big effects, right? SQL Server can suddenly do a whole bunch of stuff better. I would be especially careful about accidental max data types if you are hoping for good performance out of columnstore or batch mode query plans, because that sort of deep data problem can really screw up which operators can get or use batch mode effectively. I’ve seen a lot of problems with that in queries I’ve been tuning lately, where some column that didn’t need to be a max data type is a max data type, and with batch mode on rowstore you still get a row mode hash something-or-other for joins to that table, and it messes everything up.
Anyway, thank you for watching. I hope you enjoyed yourselves, I hope you learned a little something about data types, and I will see you in the next video, where we will talk about whatever topic number 12 is, because this is clearly topic number 11. So we still have some things on our minds, don’t we? All right. Cool. Thank you for watching.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.