Learn T-SQL With Erik: A.C.I.D.

Learn T-SQL With Erik: A.C.I.D.


Video Summary

In this video, I delve into the concept of ACID properties in SQL Server databases, specifically focusing on how they apply to T-SQL operations. Starting from the basics, I explain each of the four ACID properties—Atomicity, Consistency, Isolation, and Durability—and provide clear examples to help you understand their practical implications. While the video doesn’t include demos, it does offer a thought-provoking example that illustrates how isolation can be misunderstood in SQL Server. By exploring scenarios where queries might not behave as expected due to read-committed isolation, I aim to highlight the importance of understanding and managing transactional integrity. Additionally, I touch on durability, explaining its significance and the potential caveats when dealing with features like delayed durability or in-memory databases. This video is part of my comprehensive T-SQL course, which will soon be expanding with advanced material after summer. If you’re looking to enhance your SQL Server skills, now might be a great time to join at the presale price before it doubles in value!

Full Transcript

Erik Darling here with Darling Data, and we’re going to continue on with the teaser content from my T-SQL course. All 23 hours of the beginner stuff out there published, ready for you to start consuming. The advanced stuff will be dropping after the summer. The course is still on the presale price of $250, and it will literally just absolutely double in value once the advanced material pumps out. So I would strongly suggest that you get in on this now while it’s at the cheaper price because I realize how price and cost sensitive a lot of people are. So in this video, we’re going to talk about acidity because acidity in the context of databases and more specifically in SQL Server is not quite the level of acidity is not quite the level of acidity that you might receive if you read a highly academic Wikipedia article that outlines all of the effects of the ACID, which is Atomicity, Consistency, Isolation, and Durability. So we’re just going to cover those. There’s no demos in here. It’s just sort of a one kind of example query to give you some food for thought on that.

So let’s look at this one on exactly how isolated a query executing might be. So Atomicity means each transaction either happens or it doesn’t. SQL Server by default runs in auto commit mode. So if you want to select, it’s sort of like in the background SQL Server on its own without you typing anything. It’s sort of like saying, you know, begin transaction, run the select commit. Same with inserts, updates, and deletes. It’s auto committing everything. So you don’t have to declare an explicit transaction for every piece of code you write. You can, of course, declare a specific and explicit transaction. But when you do that, like the, you know, assuming that you do it for a reason that is not just putting one query into an explicit transaction, there are multiple queries in a transaction. It is for the purpose of having those queries succeed or fail as a single unit. We do not want to leave those queries in a strange, lingering, stranded state.

But let’s say that you’re running an insert query and you’re trying to insert two rows. The first row goes fine, but the second row causes a, I don’t know, say primary key or foreign key violation, you know, constraint violation, something along those lines. That just means that, like the first row that even though that one was okay, isn’t going to still be in the table. That statement will roll back and the successfully inserted row will get removed.

Zooming out a little though, all atomicity means in a database is that each transaction will succeed or fail as a unit. It does not guarantee that other queries can’t interact with the underlying data while they’re running. Right? Like, you know, while you are inserting your two rows, someone else might insert a different two rows and someone else might delete a row somewhere else in the table.

And someone else might update a row somewhere else in the table. And someone might run a no lock query and select your two half inserted rows, right? A row and a half is in there or something.

Consistency for SQL Server is especially disappointing. Because the only thing that the C gets you is a guarantee that your constraints are obeyed. By constraints, I mean primary key, foreign key, unique, not null, and check.

Default constraints, I mean, well, I suppose they will be obeyed. They hardly qualify here for, I forget if I covered it in the teaser material for inserts. But if not, I covered in the full material.

So you should go buy that. But, you know, just don’t expect too much out of the consistency thing, at least in the context of SQL Server. However, isolation is the toughest one because most people are completely confused about what isolation means, like what isolation levels guarantee and at what point they provide the various guarantees that they make.

So let’s say that you have a code pattern that looks like this. You begin a transaction. You declare a local variable.

We’re just going to call this next user. And when you go to assign this local variable of value, which is going to be the ID column from the users table, you have to access, in the context of this query, you have to access the users table twice. If you’re, like, we don’t have any walking hints on here.

We don’t have any isolation level hints on here. So just assume that we’re using the default locking read committed isolation level. That doesn’t really guarantee much.

We’re going to talk way more about that in some of the other videos. But just understand that, like, you know, when you read, when you say select from, that’s one read of the users table. And when you say select from here, this is another read of the users table.

These two reads are not going to show data from the same point in time. Read committed is not a point in time view of data. So you might read this data from the users table in the outer context or the inner context at a different time from the inner context or the outer context.

And you still, if you’re scanning through the data, you have to read along those pages. All sorts of stuff can happen to your data as you’re reading along those pages. Again, it’s not a snapshot of the data.

It’s just you’re just taking little read locks as you go along and do stuff. As you’re taking those little read locks, all sorts of stuff can happen all around you. So, like, you know, just if we were to look at the query plan for this, just hit control and L.

We have one scan of the users table here. That’s going to happen at one point. This is going to run like the, like you have another scan down here.

But this isn’t going to run at precisely the same time because we have to get all this data, aggregate it, right? Scan the table, all two point something million rows, aggregate it, gather the streams, aggregate it again, distribute the streams again, and then go into a nested loops join so that every row that comes out of this goes in here and then reads the data.

And since that loop join is just going like this over and over again, these are all at different points in time too. Again, there’s no snapshot happening. So if the goal here is to find the top, like the most recent accessed user in the users table and then go assign them some award, right?

Say, hey, you’re the most recent user. You win a million dollars or reputations. You can go buy a cool stack overflow windbreaker or canoe or Zippo or something.

You know, like you don’t have a guarantee that that person actually was, like when this update ran, the person who should get it. Because all sorts of stuff can happen all around you. Someone else could log in right after you ran this query and this, whatever value you assigned to this would no longer be valid.

So this is where you would have to think about increasing the isolation of this query. Perhaps you would need serializable. Perhaps you would need updlock, right?

Perhaps you would need to do something to protect this data when you read from it both times in order to make sure that this date, like nothing else sneaks in and does something to this data, changes the validity of the value that you assign to that next user when you go and assign some stuff here.

Durability is an easy one, right? Durability, this one is very simple. What this means is that once a transaction commits, you won’t lose it.

There are some caveats around features like delayed durability, but in, you know, I think the in-memory stuff kind of makes it might make this a little weird too. But in most cases, unless you experience data corruption, committed transactions will be safe.

So just like talking, like thinking about the ACID stuff in the context of SQL Server, you know, A is most likely the strongest guarantee. C is a very iffy proposition.

I, you know, as you think about like, you know, depending on the isolation level you’re using, the I might not mean a damn thing, especially if you’re using read uncommitted or nolock. I means nothing.

I might not, I might as well not even be there. I might as well just be ACID. And durability is, you know, up there with A as far as strength goes, though there are some caveats to durability depending on database settings and, you know, the reliability of your underlying storage hardware.

So anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. In the next video, we’re going to talk a little bit more about isolation levels and stuff like that. So hold on to your bucket hats for that.

Anyway, thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SQL Server Performance Office Hours Episode 26

SQL Server Performance Office Hours Episode 26



To ask your questions, head over here.

Some of my reporting queries finish a little faster with NOLCOK hints even when nothing is going on in the database. Any ide why?
I get the argument that new features sell better than fixed up old features, so how come highly upvoted User Voice items for new features have sat around for years with no action?
There are a lot of applications out there that do horrible things to SQL Server. Why don’t software vendors hire people like you to fix things?
What is batch mode not a good fit for?
Is it always possible to tune queries to save money in the cloud with smaller hardware?

Video Summary

In this video, I delve into some intriguing questions submitted by viewers during my Office Hours session. We explored topics ranging from why certain queries perform better with no lock hints to the challenges of getting new features implemented in SQL Server. I also touched on the dynamics between software vendors and database performance experts, offering insights on how to optimize cloud costs through query tuning and hardware reduction strategies. Whether you’re a seasoned DBA or just starting out, there’s plenty here to help you navigate the complexities of SQL Server optimization.

Full Transcript

Erik Darling here with Darling Data, and we have another exciting rendezvous with Office Hours. My favorite. That’s where I get to be a real man of the people and answer some people questions that you submit. If you want to submit your own question, you can go to that link, which is down in the video description. If you feel so inclined and you like the material that I produce on this YouTube channel, you can go to that link, which is down in the video description. If you feel so inclined and you like the material that I produce on this YouTube channel, you can also sign up for a membership, you can also sign up for a membership, and you can support me with as few as $4 every month. If you need SQL Server Consulting help, I am out there in the world, working with clients, helping people, tuning queries, doing all sorts of stuff. And as always, my rates are reasonable. If you want to get in on my performance tuning training, you can get all 24 hours of it for about $150 buckaroos that is at that link with that discount code. That is for the everything bundle that covers my performance tuning training. If you would like to buy my new T-SQL course, that is also available there. All 23 hours of beginner content is done and recorded. This course will double in value from $250 up to $500 after the advanced material comes out after the summer months.

So save yourself some money now and buy that while you can. Redgate is being a bunch of real sweethearts and dragging me out of my house to varying degrees this summer. New York City, August 18th and 19th. Dallas, Texas, September 15th to 16th. And Utrecht in the Netherlands, October 1st and 2nd. This is all part of the Pass on Tour events that they are putting on, little mini Pass events. And this is all leading up to Pass Data Community Summit in Seattle, September 17th to 21st. But it is time to go answer some questions now. It is time to have a bit of question asking and answering fun. All right. So first up, here is an interesting one. Some of my reporting queries finish a little faster with no, let’s just call that a no lock hint.

Even when nothing is going on in the database. Any eyed why? Ah, sane eyed. So when you use no lock hints, aside from the fact that you are requesting the read uncommitted isolation level, which will allow your queries to ignore locks taken by modification queries that are executing concurrently, you also allow the SQL Server storage engine to access data a little bit differently. It can use what’s called an allocation order scan using IAM pages rather than using index pages to read through stuff. And that can sometimes be faster.

So, um, like that, that is most likely why I have an old video about it where there was a really dramatic difference between a query with no lock hints and a query without it with nothing else running on the server. Uh, part of it was reading, uh, like from a cold cache that was like reading from disk into memory. Um, you might look at the difference between those and you might see the, uh, the ordered property of whatever index scan operators being false.

Now keep in mind, this is for index scans. Index seeks don’t really get that behavior because you’re seeking along a B tree, but scanning a big index will, will definitely see that. All right. Next up here we have, oh boy, getting, getting personal here. I get the argument that new features sell better than fixed up old features.

So how come highly upvoted user voice items for new features have sat around for years with no action? Well, apart from the fact that, um, Microsoft seems to have the attention span of a squirrel that got into a bag of meth. Um, you know, uh, a lot of what, like, like you have a, like every, like every release, they have like a certain amount of budget that they can spend like people’s time and like salary on to develop stuff.

And, you know, some, some narratives get pushed a lot harder than others, you know, like, um, turning pink because of some weird light thing here. But like, like, you know, just like, if you look at like a lot of, a lot of recent releases, you know, like some narratives got pushed really hard that ended up just being like nowhere’sville. You know, like, um, you had big data clusters, which are a complete flop, right?

They’re, they’re completely deprecated now. Uh, ledger tables, no one cared, right? Like blockchain stuff. Um, you know, it’s like, there’s just a lot of things that, they get poked out there that, you know, are driven by these sort of like checkbox items to, you know, make sure that you have the new shiny thing in your database that people that like executives and C-levels currently care about.

And not a lot of that stuff aligns with what users care about. You know, it’s like, uh, like from what I know talks about, like, you know, improving partitioning come up, uh, like every release cycle and everyone’s just like, well, I mean, whatever. Uh, you know, the argument is that like, you know, it’s like, it’s like this stupid circular argument.

It’s like, like, well, the feature kind of sucks. So no one, like, yeah, like we could fix it. And like, yeah, but no one uses it. So it wouldn’t really have that much impact.

It’s like, well, no one uses it cause it sucks. Right. It’s like, you made it painful to use. Of course, no one uses it. Right. Like, of course it doesn’t have wide adoption. You made it miserable. Right. So, you know, uh, don’t expect that to change. Uh, I guess that’s the bottom line.

Unless you are like Walmart or something or SAP and you, uh, swing a big purse at Microsoft and you really want to push for stuff. You probably, you, you have the worst voice in the room, the quietest voice in the room. Um, and you’re like, like that, that user voice size is just a black hole for ideas anyway.

So like, unless you’re reporting a serious bug, uh, good luck. Good luck out there. There are a lot of applications out there that do horrible things to SQL Server.

Why don’t software vendors hire people like you to fix things? You know, that’s a great question. Um, I have actually worked with a number of software vendors. I am always happy to do so.

Uh, and it is always nice to improve products as a whole. Uh, I have noticed that this, this trend has actually peaked up a bit, uh, since like software as a service has become more and more ingrained in what a lot of companies are offering. You know, when it was like, you know, we sell you a piece of software and you install it on a SQL Server that you’re responsible for.

You couldn’t trust anyone to install anything and do anything right. Like you had no idea what version or a, like you could, you could put out a best practices guide, right? And you could say like, here’s a setup checklist.

Here’s the stuff you should do. Like here, like the bare minimum requirements here, like, you know, hardware requirements, depending on database size, stuff like that. There is no guarantee that anyone’s going to follow any of that. So, uh, the software as a service thing has, um, has, has gotten me hired by more software vendors because now they’re responsible in house for like infrastructure, the code, the indexes, everything.

And so I have been doing more work like that, but, uh, I would love to do additional work like that. So if you know of a software vendor who you think, uh, should, should hire perhaps an Erik Darling in the world who has, who has very reasonable rates, uh, to improve their software offering, you, you, you feel free to send them. Well, I don’t, I don’t have a resume.

You can send them to my website and that’s, that’s probably as good as it’s going to get as far as, uh, resumes go. Uh, let’s see. I seem to have lost my Zoomit cursor there. We’re going to try that again.

Uh, so here’s a, here’s an interesting question. What is batch mode not good for? Well, I think, you know, uh, I assume this question is because I’ve been, you know, uh, doing some videos lately about stuff that batch mode is good for writing batch mode friendly queries time. You know, when like, you know, like how to do stuff, how to think like batch mode, stuff like that.

So, uh, batch mode is typically like, you know, good for like your heavy lifters, right? It’s like your, your, your bigger queries that have to like go through a lot of stuff. If you just have to like find a few rows and like send a few rows along and, you know, stuff like that.

If it’s like, think like, like, you know, OLTP ish workloads, uh, batch mode isn’t generally a great fit for stuff that you do in there. Like typically where batch mode comes into, into play are, you know, when you have to do like big scans of tables and you have to aggregate a lot of rows down, uh, or, you know, you’re like, you know, you have like, you have to like hash join a lot of rows or hash aggregate a lot of rows, stuff like that. If a lot of your query plans are like single threaded, like little loopy join things anyway, that’s, that’s really not a good fit for batch mode.

Obviously modification query is not a great fit for batch mode, things like that. But, you know, um, you know, it, it’s, I, I am happy that Microsoft has stopped locking some of the intelligent query processing stuff behind batch mode because that was, that was kind of a drag. Uh, but you know, I, I, I do wish that like batch mode on rowstore was a little bit more lenient with where it would kick in.

But anyway, uh, let’s go on to the next one here. Cause this is, this is, this is a kind of a fun thought bubble question, isn’t it? Uh, is it always possible to tune queries to save money in the cloud, but like with smaller hearts?

So I mean, I assume that means like by reducing hardware. So I would say generally, yes, but it really does depend on where you’re starting. If you’re already not on, uh, if like, if the hardware that you’re on is like sufficient for what you’re doing and, you know, like you, you’re thinking to yourself, I need to add hardware to keep up with this.

Then yeah, absolutely. Um, but if you’re, if you’re at a point where like, you know, you have like terabytes and terabytes of data and, you know, and like the, like maybe like, you know, high double digits, low single digits memory. And, you know, depending on kind of like some, you know, user concurrency parallelism stuff, like a small number of CPUs, it’s, it gets harder to, uh, you know, tune things so that, uh, so that you can reduce hardware for that.

You can certainly tune things to make better use of the hardware you currently have, which can give you a better idea of what you might need to increase stuff to. So like, like one way to think about it is, you know, like if, if you’re on like a smaller server and you know, you need to go to bigger hardware, it could certainly pay off to do a lot of query and index tuning. So that, uh, you can make a more, like you can make a more like good conservative choice about the next hardware step up.

So like, instead of like quadrupling hardware, maybe only like double or like, you know, 1.5 X your hardware. So you don’t have to go like, you know, like nuclear and get like, you know, like four terabytes of memory or something. Right.

Like not that, you know, a lot of, a lot of, a lot of cloud instances don’t even go up that high, like managed instance piece of crap. But, uh, anyway, um, like, you know, like it is, it is always possible, but really it depends. Like if you, if you have like, you know, like if you just went like to the absolute biggest server you, you could possibly find, uh, and like, you know, no one’s really doing the, no one, no one’s doing, or no one has historically done like meaningful query and index tuning.

Then yes, absolutely. You know, you can, you can bring a lot of stuff down that way. But, uh, you know, a lot of the stuff that I see is sort of painfully like under, under provisioned, uh, from a hardware perspective as a starting place.

And that just makes it a lot, that this makes it a lot harder to do stuff that can make enough of an impact to say, oh, now you can reduce hardware further. It’s like, cause you’re already spending like, you know, some tiny little instance size already, you know, and you’re, you’re expecting, you know, this miraculous 24 seven mission critical 30,000 batch request plus reporting workload to, to all of a sudden fly on that thing. It’s, it’s a little bit tougher of a proposition.

Um, so, you know, like, but there are things you can do, obviously, you know, like, you know, like aside from like the generic index and query tuning, um, you know, I like one of the, one of the biggest things that I, I push is, uh, compressing indexes, which makes them smaller on disc and in the buffer pool. So that you make better use of, you know, those expensive, like, you know, uh, IOPSy discs that you, uh, pay, pay too much for. And so that, and you also make better use of the memory on the server.

Um, so, you know, there’s of course, you know, times when absolutely you can, you can like hit home runs with this, but there are also times when, uh, you know, you like, sometimes you, sometimes you put in all the effort and like, you just allow the server at that size to tread water a little bit longer. So, you know, always no, but generally, you know, you can, you can, you can do meaningful things to, uh, to improve the situation almost regardless of what, what the starting point is. But, uh, there, there are some, certainly some outlier starting points that I’ve seen that, uh, you know, like you could, you, you could tune, you could tune to your heart’s content and, uh, you just wouldn’t move in.

You wouldn’t move the needle enough to, um, like to, to allow, like allow for a downgrade of hardware. Anyway, that’s about good for this one. We have answered five questions, me and the voices.

Uh, thank you for watching. I hope you enjoyed yourselves. I hope you learned something and I will see you in the next video. All right. Thank you.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

A Little About Row & Range Window Functions vs Batch Mode in SQL Server

A Little About Row & Range Window Functions vs Batch Mode in SQL Server


Video Summary

In this video, I delve into the nuances of window functions and how they interact with rows and range modes in SQL Server. I explore the differences between using `ROWS UNBOUNDED PRECEDING` versus `RANGE UNBOUNDED PRECEDING`, explaining why the former was historically slower due to on-disk spooling, while the latter could be more efficient. By comparing these methods through practical examples and query plans, I demonstrate how modern SQL Server versions and features like batch mode can significantly improve performance for window function queries. Whether you’re working with large datasets or just looking to optimize your T-SQL code, this video offers valuable insights into leveraging these powerful tools effectively.

Full Transcript

Hey, it’s Erik Darling, Darling Data, your number one source for SQL Server content on the internet. I hope. If you can go anywhere else, I mean, I can’t say what you’ll get will be correct or high quality. So, careful out there. Anyway, today’s video, we’re going to talk about rows and range versus batch mode. Now, this all sounds terribly foreign to you. You may not work with window functions very much. But back to the end, I’m going to talk about rows and range versus batch mode. Now, if this all sounds terribly foreign to you, you may not work with window functions very much. Back in the, let’s say, pre-batch mode days of window functions, SQL Server has two different ways of sort of giving you a partition clause or giving you a window of things to look at. You can look and specify a certain set of row elements to be in there, or you can specify a range of row elements to be in there. Now, back in the bad old days, you had to be very careful about specifying the rows. Because rows was a lot faster. If you use range, you got this terrible on disk window spool. There was very, very, there was not very, very, sometimes very, very slow. Let’s just say it was much slower than range.

But now, with batch mode, all that kind of goes away. So we’re going to look at examples of that in this video here today. But before we do, if you would like to sign up for a channel membership to support the best SQL Server content on the internet, there is a link in the video description below. And if you want to ask me questions privately that I will answer publicly on my Office Hours episodes, there is a link right up there for you to do that exact thing with. Otherwise, you can just do the usual YouTube stuff and keep me happy in different ways. If you need SQL Server consulting help, I am here for you, baby. I got you no matter what you need. You know, within reason. I mean, I don’t want to like carry a server somewhere for you. You got to do what you got to do. I’ll rent a dolly if I have to. And as always, my rates are reasonable.

My performance tuning training is about $150 for life. If you go to that link and use that discount code, that’s the everything bundle. And if you would like to pick up my new T-SQL course, all of the beginner content is recorded and published and available. It is currently on sale for the pre-sale price of $250, which will double in value to $500 once the advanced material is out about there in the world. And some other fun summer stuff coming up for me. Redgate is taking me on tour with the PASS mini events. That’s going to be New York City, Dallas, and Utrecht.

And of course, all that leads up to PASS Data Community Summit, November 17th to the 21st. So I look forward to seeing you at as many of those events as humanly possible. But with that out of the way, let’s talk about this window function stuff. Now, what I’m talking about when I say like rows versus range is specifically this part of the window function, where you say rows unbounded proceeding.

Now, I don’t technically have to say and current row because this is it’s by default when you do that. So if you just say rows unbounded proceeding, then you will just get like a running sum of all the prior rows up to the current row. Now, I’ve got this query hinted.

Well, let’s make sure. Query plans are turned on. I’ve got this query hinted to you as optimizer compatibility level 140. The reason for this is because my current database, which is on SQL Server 2022, is in database compatibility level 160. Being on developer edition, which is an enterprise equivalent SKU, and being on SQL Server 2022 in database compatibility level 160 means that I get the batch mode on rowstore feature automatically without having to do things, do any strange things there.

If my database compatibility level 140, I would have to use this hint to say 160 in order to show it off. So we’re just doing this in reverse this time. But when I run this query and we look at the rows unbounded proceeding, we are going to get back some results.

And they take a second because there’s a lot of rows. There’s about a million rows that come out of this. But this all takes just about a second to run.

You can see over here, just one second. It makes life easy. And, you know, things in here are not too terrible. If we run that same query, but we use the range unbounded proceeding, we’ve changed this from rows to range, then things are going to look a little bit different.

And if we look at the query plan, you can maybe felt that that took a little bit longer. That took twice as long at two seconds. Now, part of the reason why does require us running both of these at the same time.

I’ve turned statistics.io on for this because it’s a slightly easier way of seeing where that other second of time went. So now, if we look at the query plans next to each other, we will see just about that same pattern here, right, where it’s one second versus two seconds. And so the range one is twice as slow.

And if we look at the statistics.io results, notice that there’s this work table. Let’s frame that up a little bit better. There we go. And for the rows query, there is a work table with a scan count of zero and a logical reads of zero.

But for the range, there is a work table with a scan count of 1058840 and a reads of 5430748. So in the first query, we use rows, that work table didn’t actually end up getting like it was there, but nothing really happened to it. It’s almost like with a many to many merge, when you actually have a many, many, many, and you end up like there’s a work table, like when you’ll see a lot of activity there.

But when SQL Server is like, I think this is a many to many merge, but then there’s not actually a many to many. The work table doesn’t actually see anything. Now, let’s contrast this situation with database compatibility level 160, where we get the batch mode on rowstore feature to kick in.

I’m going to run this one, right? We’re going to take a look at how long this takes. And this, I mean, like we still have to display a million rows, so it may not feel quite as snappy as you’d hope, but it still runs very quickly.

We’re at just under a second in total for this. So we improved from the non-batch mode version by about, I don’t know, a couple hundred milliseconds or something. That’s not really the thing that is my big deal for this query.

The bigger deal for these queries is that the range unbounded proceeding version does not suffer and take twice as long when things run in batch mode. Now, this one takes just about the same amount of time as the previous one. So to talk a little bit about why, the main difference between these, and what I want to do is just get the estimated plan for this.

And if you look at what happens in here, we have the clustered index scan, we have a repartition streams, we have a sort, we have a segment and another segment, and then a window spool. The window spool over there, let’s clear these out so it’s just one thing in the middle that we care about. The window spool is where that work table occurs.

When this is on disk with range and using like a row mode execution, then this is a bad time, right? This is where that extra second in that work table lives. When we run these in batch mode, right?

And if I run these together, we’re going to get a different query plan. It’s a lot more compact and it has different operators in it. So if we run both of these, they’re both going to take around 800 milliseconds to run. Again, there’s like a million rows that come out.

So things are, you know, not quite perfect. But if you look at these, we still have the clustered index scan, and then we have a sort to put data in order. But then we have this window aggregate.

And this window aggregate doesn’t really have like the on disk thing for it when you, like, depending on if you use the default, which is range, or if you like go out of your way to specify rows. And if we look at the messages tab, and we look at like the work table for both of these, we’re going to see a scan count of zero in here and a logical reads of zero from the work table in here. So neither one of these did any of that like on disk window spool work, because in the query plan, we have the window aggregate rather than the window spool operator.

So this is like, you know, this is like a sign that your window function query ran in batch mode. You see the window aggregate and a lack of like the usual, like, you know, like, you know, segment sequence project, stuff like that. And then the window spool operator, which is where that hidden work table occurred in the other queries.

So if you are using, you know, a SKU of SQL Server and a database compatibility level where you can get batch mode, or if you are allowed to play some tricks on your queries, which I’ve explained in many other videos about how to get batch mode to happen for your queries without that, then you have a little bit less to fear with window functions if you need the range behavior, because the range behavior is slightly different from the rows behavior. So if you’re getting batch mode, you can be slightly less afraid of your queries, your window function queries that may require the range behavior over the row behavior.

If you are using, if you are getting batch mode execution, because you no longer have to deal with that pesky on disk window spool. So some, some, some light at the end of the tunnel out there for the, the big data analysis folks. Anyway, thank you for watching.

I hope you enjoyed yourselves. I hope you learned something and I will see you in the next video where we’ll talk about something else interesting. I promise. All right. Thank you.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: A Neat Thing with MERGE and OUTPUT

Learn T-SQL With Erik: A Neat Thing with MERGE and OUTPUT


Video Summary

In this video, I dive into an intriguing aspect of T-SQL using the `MERGE` statement and its `OUTPUT` clause, demonstrating something you can’t achieve with a regular `INSERT`. I walk through creating sample data in SQL Server Management Studio 21 (version 21.something.6.3) to illustrate how we can modify our target table by adding 10 to the ID column, incrementing dates by one day, and replacing values in another column—all while retaining visibility into what exactly was inserted. This feature is particularly powerful for complex processes where you need detailed insights into the changes made during a merge operation, allowing you to output this information to a different table for historical tracking or auditing purposes. I also highlight why trying to replicate this functionality with a standard `INSERT` statement results in errors, emphasizing the unique benefits of using `MERGE`.

Full Transcript

Hey! Hello, friends! My friends, my only friends, Erik Darling here with Darling Data. And we are going to do some more super fun T-SQL learning today. We’re going to talk about kind of a neat thing that you can do with merge and the output clause that you cannot do with a regular insert in the output clause. So I hope that is is majestically entertaining for you as you take time out of your busy day and schedule to watch this video. All right. So let’s get over to SQL Server Management Studio 21 version, 21.something.6.3. There’s been 15 updates in the last week or so. So I don’t know exactly which version I’m on. But what I want to show you here, so first let’s create some sample data to work with. So we’re going to create some sample data to work with. So we’re going to create some sample data to work with. So we’re going to create some sample data to work with. So we’re going to create a table called the target and a table called the source. Don’t ask me where I got these clever naming conventions from because it took a whole lot of, you know, really, like I needed a higher muse. There was a lot of creative work effort that went into those. But we’re going to insert 20 rows into this table called the source. And the data that we inserted into here is going to look, oh, that’s a good interesting keystroke, is going to look like this, right? So it’s the numbers one through 20.

So we’re going to look at the same tree in the ID column, a bunch of dates in the sum date column, and this sort of half of a Christmas tree or half of a triangle, however you prefer to call it, in the sum thing column, right? So we just have this nice sort of shape in there, right? Pythagoras would be proud. What’s that? It’s a triangle. Nope, it’s a bloop shape. All right. Cool. So let’s make sure that there is no data in the target table just in case. And what I’m going to do in this this merge query is something that you cannot normally do with an insert query, which I’ll show you next, is we are going to say, we’re going to merge, and we’re going to say, using merge into the target table. I do not have a serializable hint here because we are only doing an insert. Serializable is only necessary if you have multiple actions assigned to your merge statement.

So like, insert update, like the upsert pattern is the most common, but deletes would qualify as well. And we’re going to say, using the source, but we’re going to say, on one equals zero. So essentially, there would be no, there’s no match here, right? We’re making the implicit assumption that no rows will match between these two. And then we’re just going to say, when not matched by target, then insert into id, sum date, something.

And we’re going to say, and we’re going to use the values clause here, because that’s usually what you do with the merge and the insert. But we’re going to tweak what goes into the table a little bit. So we’re going to add 10 to the id column. We’re going to add a day to the sum date column. And we’re going to replace the bloop shape of As with a bloop shape of Bs.

All right. And then down here, we’re going to have our output clause. And in the output clause, we’re going to, I mean, A, there’s something you can do with merge that you cannot do with standard modification queries. And that is, you can have this action column. And this action column will tell you what came out of the merge. So if you write like a different type of merge that maybe has multiple actions, this action column will tell you if it was an insert, an update, or a delete, which is very helpful.

One thing that annoys me about output is that you can’t do the, like, column equals expression syntax. You have to say as. You have to say this as this, this as this, this as this. I guess it would be confusing for some people. But then I’m going to alias, just so we can see where the different columns start.

This is going to be where columns from the inserted table begin here. And then this is going to be, this is the other thing that you can’t do with a normal insert query is look at the source values in it, right? So we’re going to have a set of columns here called source values.

And this is where they start, this s dot star. And if we look at what happens when I run this whole thing, we are going to get back the output clause. And we are going to be able to, this is the really helpful thing, is that you can see what, you can see like the, like the source values, which you cannot normally see, right?

So this is where things started. It was the numbers 1 through 20. Everything was 20, 25, 06, 25, right?

And then this bloop shape was all A’s going down. But the values that we actually inserted look like this, right? Well, these are different now.

We can see that the IDs go from 11 to 30 rather than 1 to 20. Some date is all 626 rather than 625. And our bloop shape has taken on the form of all B’s rather than all A’s. So this has a lot of interesting uses, I think, that not a lot of people sort of get into, especially because as you write more and more complicated processes that use merge, you might need to have visibility into this stuff.

And you might even want to output this stuff to a different table to sort of retain a history of things as well. But notice that if we try to do this same thing with a normal insert and we write our output clause, this thing has a little squiggly under it. And even though this little squiggly thing should come from here, SQL Server is not able to, I guess, look forward enough to output values from this, right?

We’re not able to look in here and output any values from that. So this will just give us an error. SQL Server will say, no, we can’t do that.

The column prefix TS does not match with a table name or alias name, even though it’s just, it’s right here. But SQL Server is not able to get into that. So this is just, you know, something that annoyingly, very annoyingly, you can do with merge, but you can’t do with a normal, like, modification.

We’re using insert as an example here, but this is something you can’t do with a normal, like, insert, update, or delete either. You can’t see that source data. You can only see the change data, which is, well, I mean, I guess for updates, that’s, you know, you can see the before and after, but deletes, you know.

Anyway, looking at the source rows and there can be very useful for some query patterns. Anyway, just a neat thing that you can do with output and merge, I think, anyway. Thank you for watching.

I hope you enjoyed yourselves. I hope you learned something. And I will see you over in the next video where we will continue to express and enjoy ourselves in the T-SQL query language. Oh, that was a repair.

Structured T, Transact SQL query language. All right. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: Some Merge Tips

Learn T-SQL With Erik: Some Merge Tips


Video Summary

In this video, I delve into some essential tips and tricks for working with SQL Server’s `MERGE` statement, focusing on making your code more efficient and less prone to errors. I start by emphasizing the importance of using the `SERIALIZABLE` hint when performing multiple actions in a merge statement to prevent concurrency issues. Then, I demonstrate how simplifying the matched clause can save you from complex null checks, showing a cleaner way to update records with concise SQL. Lastly, I discuss the `USING` clause and explain why it’s more akin to a `FROM` clause than a traditional join, highlighting its importance in ensuring that your merge logic behaves as expected. By the end of this video, you’ll have a better understanding of how to write effective and maintainable `MERGE` statements.

Full Transcript

Hey, it’s T-SQL time with Erik Darling. All right. Usual, usual song and dance here. All 23 hours of the beginner content from my T-SQL course, Learn T-SQL with Erik is available now. The price is $250 and it will be advancing to double in value.

It will go up to $500. So you can, you can, you can short me if you want. When the advanced material drops after the summer months have concluded.

So in this video, we’re going to talk a little bit about some, like a couple of things you should know about merge. Like I’m not going to sit here and be like, this is how you write a merge statement. I’m going to just show you some stuff that can kind of make working with merge somewhat less painful and maybe explain why you might get weird results with merge sometimes.

So let’s, let’s do that. Let’s have that, let’s have that kind of fun today. So the first thing, when you are, when you’re writing a merge statement and you are, you intend that merge statement to have multiple actions.

So like update and insert, like the upsert, like form of merge is probably the most common. The first thing is that you absolutely need this serializable hint here to prevent strange things from happening. When you run that, there are all sorts of strange concurrency phenomena that may occur if you do not use this.

So this is the very first thing here. The second thing I want to show you in this, in this section is how to make the matched section for, to perform the update portion of the merge a little bit easier to write. So when, what I see in a lot of matched clauses is not this, right?

What I see is a lot of stuff like where T dot name is not equal to S dot name and T dot name is no, or S dot name is no. And T dot user ID is not equal to S dot user ID or T dot user ID or S dot. Like it just goes on forever with these, like not equal to or no, like foreverness things, because you might have no’s and you can’t do the not equal to no’s.

And the whole thing just turns into a nightmare. This is a much more clean and concise way of writing this. You say select like the, the target columns, except select the S dot columns.

Uh, you could potentially, well, well, you would, you would have to have made after reverse some other stuff if, if you did, if you reversed it. But, uh, if you do this, this will save you all the null checking because except handles nulls, uh, graciously for you. So that is the, that is the main thing here.

The other thing that I want to talk about is the using clause. Now, the using clause for a lot of people feels like a join, uh, because there is an on clause and that that’s, that’s, that’s reasonable. But you do have to be, uh, you, what you should be aware of is that using is somewhat more like a from clause than a join clause.

Um, and what I mean by that is if you were to write a query like this and you were to say using badges, uh, badges stage as S on like S dot ID equals T dot ID. Uh, like, like any, any, anyone, anyone from the, from this who didn’t like match this exactly would go to the, when not matched by target and would go to the insert portion. Uh, which is probably not what you intend.

So when you’re writing your using clause, a lot of the times what you want to do, and this, this might seem sort of similar to when we talked about like, um, pivot and unpivot. How, when you write the pivot query, uh, you kind of want, like, if you use a derived table expression, uh, you can, you control better the columns that SQL Server will attempt to do, attempt to do the like implicit grouping by thing. Uh, so when you’re, when you, when you write a query where you only want to get certain stuff from a table to use for your merge, what you want to do is wrap that up into something like this.

So you’re only getting, uh, like you’re what the data source that you’re using is a select from the staging table where user ID equals two, two, six, five, six. So this will limit it to just that portion of the data rather, and you won’t end up with like weird, uh, bugs and potential other things going on when you hit them when not matched by portion. So, um, just a couple of things that might help you write somewhat better merge statements in there.

Uh, one, if you are performing multiple actions, you must use the serializable, uh, uh, uh, uh, uh, uh, uh, locking hint, uh, on the, uh, target table. Two, if, uh, you are using, uh, if, if, if in your matched clause, you have to write an excessive amount of null checking, it is a lot easier to just say, and exists, select columns that you care about, except select the other columns that you care about to do your update. And, uh, when you are writing the using clause, if there is any additional sort of filtering or anything else that you, you want to do here, um, confine it to a derived table so that you actually, you start with the correct data source and you don’t have weird things, uh, flying around your, uh, matched and not matched clauses in your merge statement.

So, uh, that’s about it here. Um, you know, there, there, if you want to read a lot about merge, um, I would highly suggest looking at Michael J. Swartz blog. Uh, he says lots of fantastic things about merges and upserts, uh, because he uses them a lot.

Uh, I don’t know if he says any of this stuff directly, but, um, if, if he, if he doesn’t, I’m sure he says other very, very smart things about it. So, anyway, thank you for watching. I hope you enjoyed yourselves.

I hope you learned something and I will see you in the next video, uh, where we will talk. I believe, uh, we’re going to talk a little bit about output next. So that’ll be, that’ll be great fun for all of us, won’t it? All right.

Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: Neat Update Tricks

Learn T-SQL With Erik: Neat Update Tricks


Video Summary

In this video, I dive into some advanced T-SQL techniques focused on updates, showcasing how to write and validate update queries effectively. I emphasize two key rules: always start by writing your updates as a select statement to ensure the intended results before committing any changes, and highlight from the bottom up to minimize errors. Additionally, I demonstrate a unique trick involving updating local variables or parameters within an update query, which can be incredibly useful for tracking changes without using OUTPUT clauses. This video is part of my comprehensive T-SQL course, offering insights that might not be commonly discussed in everyday practice but are invaluable for mastering the language. The 23 hours of beginner content are now available at a presale price of $250 and will double to $500 after the summer release, giving you plenty of time to start learning and improving your T-SQL skills.

Full Transcript

Erik Darling, Darling Data, same as it ever was, same as it ever will be. Consulting without end. So this is going to be some more preview material from my T-SQL course. We’re going to talk about some neat stuff you can do with updates that are nothing to do with anything being in order like they were with deletes. All 23 hours of the beginner content is out there and available for you to start learning anything, getting better at stuff, specifically T-SQL with. I don’t know if it’s going to make you better at anything else. Who knows? Maybe, maybe it will. I don’t know. Give you some superpowers. Uh, but this is still available at the course presale price of 250 bucks, which will double in value, uh, to $500 when the beginner material publishes after the summer, uh, when I have had time to recuperate some brain cells. So let’s talk about updates. Now, uh, when it comes to modification queries, I have, I have two rules that are, I am pretty steadfast about, uh, now if inserts, inserts a bit less because inserts involve writing a select anyway, when you are writing the portion of the query, that’s going to get data to insert into a table. But for updates and deletes specifically, two big rules that I have are one, you should always write them as a select first. Always write them as a select first. So you can validate that the results are, what you, what you intend that to me, that makes a lot of sense. Sure. You can say begin transaction and then have like a rollback and a commit at the end. But you know, then you’re also like performing some modification, hoping that you remember to highlight, begin transaction, uh, exploring data after you have modified it within your transaction, scrolling through stuff, checking things out. Oh, is that right? That look okay to you? I’m not sure. Uh, and then hoping that you have modified it within your transaction.

That you remember to commit or rollback the transaction. And even worse, I know you, I see you out there. Uh, all your other queries have no lock hints on them. And so while you are in the midst of your transaction, uh, perhaps not causing any blocking problems because all your other queries have no lock hints on them. You may very well just be letting data, letting users read a whole bunch of like, you know, uncommitted data. Uh, well, you figure out if everything is right or not. So the, the first rule is to always write your updates and deletes as selects. The second rule is to, uh, always highlight your modification queries from the bottom up. And the reason I say that is because if we were to, uh, we were to highlight this update from the bottom down, we, we, we might, we might, we might flake on something. We might, we might only get to here. We might get distracted and look away.

And you know, this would be, if we run this, we would in fact just delete, uh, or rather we would update all the dates in the table to add, uh, add one day to them. Granted, this would be fairly easy to fix because we would just have to say data day minus one to, to adjust it back. But depending on how long that takes and some other, uh, you know, local factors that might not be an enjoyable experience for you. So the reason I say to always highlight from the bottom up is because let’s say we do the same thing and we get distracted and we only get to here. We get this very, very helpful error message. Incorrect syntax near dot.

Okay. Uh, no incorrect syntax before dot though. Nothing else is wrong with this query. There’s just incorrect syntax near a dot. Okay. Okay. Anyway, always highlight your queries, your, your updates and deletes from the bottom up. Uh, I suppose that would also go for inserts too. Cause if your select query has some form of where clause on it, that might be important as well. So modification queries, write them as selects first highlight from the bottom up.

All right. Now I want to show you a cool update trick. And this is something that not a lot of people, uh, well, I mean maybe ever know exists, but that’s the value of these amazing courses is you get to learn about stuff that you might never, uh, practically hear about or use in your life. So in this, uh, badges update table that I have created, um, we have, uh, a row for you for user ID 41, but no rows for user ID six, right?

This isn’t like there, something exists for this one, but nothing exists for this one. Right. And this is going to play into the next thing I show you. So one thing that you can do with updates is you can not only update columns, but you can update, uh, either local variables, declared local variables, like I have up there or parameters. And if you’re going to like, you, sometimes this can be very useful, uh, if you don’t want to use output to like, like if you’re affecting a single row, uh, and you don’t want to use output for this, you like, cause you can’t like output into like variables like this, but you can update these local variables or parameters, uh, to values based on what you updated in the table.

So in this case, we are going to update old user ID to the user ID column. We are going to update new user ID equals user ID equals six. So we’re setting this to six and then we’re setting this to six. And then we are going to set, did it update to true? Uh, right now, these three things up here are declared, uh, old user ID and new user ID have are assigned null marks and did it update is assigned a false value. So what we’re going to do after this is we’re just going to make sure that everything sort of happened the way that we thought it would. We’re going to select, uh, our values from these local variables to see what they ended up as.

And we’re also going to look at the, uh, badges update table to ensure that our row changed from 41 to six. So now highlighting from the bottom up, we are going to, uh, get all of, all of this query that we care to run in one go, and we’re going to execute it. And when we check in on our local variables, they have all been assigned the correct values.

Old user ID was 41, new user ID is six and did it update is true. And then when we look in the badges update table, we will see that we did indeed change user ID 41 to user ID six. Cool. We’re all good there.

Uh, and we did roll that transaction back because we don’t want that to stay in there because it’ll ruin the next thing I’m going to show you. Okay. When there are no rows, then, uh, this will not get changed and these will remain null. So now we’re going to do this sort of in reverse and we’re going to say, Hey, let’s try to update user, where user ID six, we’ll do the same thing in here.

And we’ll try to change user ID to 41. So we’ve got a new one, but of course now highlighting from the bottom up again, if we run this, we will see that, uh, we got back nulls because no VALU, no values were assigned or either overwritten or overwrote the null values. the null marks here and did it update remains a zero because nothing came back to change it from being false here right when we attempted to set it to true here we didn’t get a row in order to actually make that assignment change and of course when we look in the badges update table we still have user id 41 so we were we were not able to switch 6 to 41 because there is no row for 6 so some neat stuff that you can do with updates that is not just you know changing some data doing a join CTE blah blah blah hope you enjoyed yourselves hope you learned something and i’ll see you over in the next video where i believe we’re going to talk a little bit about some merge stuff there are a couple things that you should know about merge aside from uh the fact that you know maybe for a lot of cases you should be avoiding merge of course all right anyway thank you for watching

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SQL Server Performance Office Hours Episode 25

SQL Server Performance Office Hours Episode 25



To ask your questions, head over here.

I know using MAX for columns has downsides but what about 4000/8000?
What is the hardest subject to teach people about ?
Have you ever had a consulting engagement not go well? What happened?
I get really frustrated while tuning queries because a technique that worked one time won’t work another time. Does that happen to you? HOw do you deal with it?
Do you ever want to get out of tech/working with databases?

Video Summary

In this video, I embarked on an office hours adventure where we tackled some interesting and practical SQL Server questions. We discussed the downsides of using large string columns in tables, delved into teaching isolation levels—a notoriously difficult topic—explored consulting challenges when clients aren’t ready to implement solutions, and shared insights on why tuning techniques that work one time might not work another. It was a great session where we aimed to provide value through real-world examples and practical advice. I hope you found it as informative and enjoyable as I did!

Full Transcript

Oh, look who it is. How you doing? Erik Darling here with Darling Data, and we are going to embark on an office hours adventure for this video. I hope that you find it enjoyable. So if you want to ask questions that I answer on these things, there’s a link here that allows you to do that. It’s a wonderful deal. You get to submit a question for free. I get to answer it for free. And, uh, I don’t know, I guess that’s, that’s about the end of it. Uh, if you want to support this channel, you can also, you can sign up for a membership for as low as $4 a month. Uh, you can do that down in the old video description. Uh, otherwise, uh, you can just sit there and, uh, get it all for as much as it costs you to, I don’t know, have the internet, pay for electricity, all that other stuff that kind of goes, goes along with, uh, watching things online. Uh, if you need SQL Server consulting help, I have, of course, in the best in the world at all of these things outside of New Zealand, of course, the lawyers make me say that every time. Uh, and as always, my rates are reasonable. If you would like to get my performance tuning training, uh, there’s all 24 hours of it available to you via the everything bundle, which is everything that I’ve done about performance tuning. Uh, and you can get that for about 150 USD and that’ll last you for the rest of your life.

So, you know, it’s kind of a good one-time purchase there. Again, as we get into these summer months, when you, uh, want to just lock yourself in a room with an air conditioner and not deal with anything. Well, what, what better way to do that than get some SQL Server performance training while you’re at it? Who needs music and movies and whatnot? Joy. Get better at databases. All right. Uh, my new T-SQL course, uh, all 23 hours of the beginner content. The best event is out there and available. It is currently $250 on the pre-sale price. It will double in value to $500 when the advanced material is done, uh, over after the summer. Uh, I will also be leaving the house a lot this summer. I will be leaving my, my pleasant air condition and life. and I will be traveling to the faraway land of New York City, August 18th and 19th, for the Pass on Tour series of events.

Also going to Dallas, which hopefully is air conditioned, even though it’s all September in Texas. Yeah, that’s a thing. September 15th and 16th, the Hamlet of Utrecht in the Netherlands, October 1st and 2nd, and of course, Pass Data Community Summit in Seattle, November 17th to 21st.

All good things coming up over the summer and fall months. But with that out of the way, let’s do some office hours in here. All right.

We have some questions. We have some very important questions to answer. I know using Macs for columns has downsides, but what about 4,000, 8,000? I assume you mean in VARCAR 4,000 and VARCAR 8,000.

Yeah, of course there are downsides. I mean, you know, your developers get to be lazy and perhaps not deal with truncation errors when they insert their dirty data sources into your beautiful database.

But, you know, you also run into some stuff too where, you know, you can’t have those columns in the key of an index, which might be important at some point.

You know, when you select data out of that table, if you, you know, need, if there’s any memory grant, it will be inflated by those much, much larger string columns.

So, I can’t really, you know, and this isn’t me being like a disk cheapskate, being like, disk is expensive. You should always use the right data type because disk space, blah, blah, blah, blah, blah.

This is like practical performance stuff. Like, like, don’t do it. Still, still a bad idea. Don’t, don’t, don’t hose yourself with overly long string columns because they can come back to bite you in many ways.

All right. Oh, here’s an interesting question. What is the hardest subject to teach people about? Oh, without a doubt, it is isolation levels. That is the toughest material to teach people about because almost no one is approaching it mentally from the, the right perspective.

And, uh, they often come with a lot of preconceived notions, uh, about isolation levels that require the, require unteaching before you can actually, apply any new teaching.

Um, there is a lot of really, really bad blog content out there about, um, especially role versioning isolation levels that should be thrown in the dumpster and set, set, set ablaze.

But, uh, you know, that’ll be the day. Um, all right. Uh, have you ever had a consulting engagement not go well? What happened?

Um, so yeah, of course not, not, not all of them are as successful as, as some of, some of the others. Um, I think what ultimately makes a consulting engagement unsuccessful is when someone needs help, but is not ready to get help.

Um, it’s, it’s sort of like, you know, you, you go in there and you can give them the analysis and you can give them the stuff to do when you can show them like, you know, Hey, if we change this query to do this, this will get better.

If we change this index to do this, like you can, like, you can like, like show them like, like real proof that this stuff happens, but like there might not, like they might not follow through on anything. Right.

Like they just might not end up doing anything. And, uh, you know, like they’ll, they’ll still like hit you up later and be like, Hey, you know, we’re still having problems. And you go look and just like, nothing’s been done.

And like, you know, it’s, it’s frustrating for you because like, you know, like you, you, you, you’re, you’re, you’re giving these people everything, all the tools that they need to solve problems and be successful and like get things done. But there’s just no follow through.

There’s no willingness to actually make changes. And when you ask them why it’s just, Oh, we haven’t gotten to it yet. Oh, we were afraid about this. Oh, you know, there’s a list of excuses.

So, uh, you know, those, those are, those are the, the toughest sort of things to deal with is, um, you know, like people whose servers are just in bad, bad shape, but, uh, there’s just no one willing to sort of take responsibility for it.

You know, actually like go through and do stuff. And, um, you know, when, when you, you’re like, well, is there anything you want me to do? And they’re like, well, I don’t know.

We’ll see. Like, you know, it’s just hems and haws and like the bucket gets kicked. So, yeah, those are, those are probably the worst ones. Uh, all right, let’s see here. I get really frustrated while tuning queries because a technique that worked one time won’t work another time.

Does that happen to you? How do you deal with it? Well, yeah, of course it happens to me. You know, uh, you, you build up this bag of tricks and knowledge and stuff over the years.

And, you know, something that works beautifully, uh, to tune up one query, uh, has no effect on another query, or maybe has an opposite effect, an opposing effect on a query that you’re working with.

Uh, I think, you know, the, the trick is that as you like, like every change is, is, is feedback, right?

So don’t like, like, yes, it can be frustrating, but like, ultimately it’s kind of like what you make of it. So, you know, like if you make, if you try something and it doesn’t work, that’s a data point, right?

That’s you figuring out, Oh, like when I do this, like, you know, this didn’t get the right change or like, this didn’t change the plan, the way I thought it would like, like, what do I need to do next to try and like get past that? And like, as you go through that process more and more, it’ll happen less and less.

So as you sort of like, like sort of like, you know, like expose yourself more to times when something doesn’t work, you start to understand when it doesn’t work and why it doesn’t work. And you start to sort of like, like, like not try it out when you recognize that those circumstances are cropping up again.

So like, like, you know, like, like it’ll happen to you less. It’ll still happen. It still happens to me where I’m like, well, you know, like, you know, I don’t know what some examples, um, you know, doing like, uh, you know, like, like there are a few things where like, you know, I like things that I, I’m like, Oh, well this, this, this would probably be a meaningful change.

Like changing like the left join with a null check to not exists, or sometimes doing like a, like a, like a introducing a top into something to get a row goal, or sometimes using like cross apply, uh, when I’m generating a row number, rather than like joining to, uh, like a whole, like derived join, um, you know, introducing batch mode, like stuff that, you know, like usually has, uh, like a fairly quick and obvious, uh, performance when there’s a feedback mechanism.

Sometimes you try it and you’re like, Oh, well that, that, that didn’t, that didn’t go the way I thought this time. Uh, you know, you kind of go back to the drawing board. Uh, but you know, again, the, the more you do that kind of work, the more you can sometimes recognize like the situation that like, like made it unsuccessful and kind of skip over that.

Or I don’t know, maybe just try it, but be a little bit more prepared for failure. I don’t know. That’s another, that’s another good, uh, thing to master is, uh, being prepared for something to not work out.

Right. Cause sometimes things don’t work out. Speaking of not working out, there we go. Wow. Do you ever want to get out of tech and working with databases? Uh, yeah, of course.

Um, you know, I mean, as, as much as I enjoy it, it’s, uh, it’s hard to imagine doing this at like 70, 80 or something. Like that’d be kind of wild.

Um, but like if, if I, if I got out of tech and working with databases, I think, um, what, what I would, what I would love, even, even though I realized it’s not a money printing machine, I would love to own a gym, uh, just, you know, like, like real basic barbell equipment, not a single Nautilus thing, no leg press, no leg extensions, no, none of like the goofy stuff that I hate.

Uh, like when I go to a gym, uh, just like, you know, barbells, squat racks, benches, um, you know, limited set of dumbbells, uh, and just no, no real nonsense is, is, is, would be, would be my take on it.

So, maybe someday, who knows? Anyway, uh, that is five questions. One, two, three, four, five. Uh, we did it. We made it through. We didn’t swear. We didn’t curse.

Congratulations. Uh, we did a great job on that. Uh, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. I will see you, uh, in tomorrow’s video, which I think will be something to see equally, but who knows?

All right. All right. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: Ordered Deletes and Table Expressions

Learn T-SQL With Erik: Ordered Deletes and Table Expressions


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: Identity Annoyances

Learn T-SQL With Erik: Identity Annoyances


Video Summary

In this video, I delve into some common annoyances related to the `IDENTITY` property when working with temporary tables in T-SQL. Specifically, I discuss how using an identity column can hinder parallel inserts and provide alternative methods for generating auto-incrementing IDs that maintain query performance. I also highlight the benefits of using a `ROW_NUMBER()` function instead of relying on the `IDENTITY` attribute, especially for larger insert operations into temporary tables. If you’re working with heavy data movement queries or need to optimize your T-SQL scripts, this video offers valuable insights and practical solutions. Whether you’re just starting out with SQL Server or looking to refine your advanced query tuning skills, there’s something here for everyone.

Full Transcript

Erik Darling, your friend from Darling Data. I don’t know, like at this point, I don’t know, I either need to change my name or the company name and spice up these intros a little bit. Something new to say. This is going to be another video of material from my new T-SQL course.

This video is going to be, I don’t know, hopefully a rather short one to just go over some annoyances with the identity property when you are trying to insert data into a table. And not for like single row inserts, but let’s say, you know, you’re creating a temp table that you want to work off of and, you know, you need some incrementing number in there.

Some stuff that I run into, well, like performance tuning queries to do that. All 23 hours of the beginner content is live and published. The course is at the pre-sale price of $250. It will double in value.

After the summer when the advanced material comes out. So I do, I would urge you to buy now and save yourself $250. Unless you are, unless you’re just like, if you’re psyched on giving me an extra money, cool.

But I appreciate that. But, you know, if you want to save yourself some money, get in there, get in there sooner rather than later. So let’s talk about these identity annoyances. Now we don’t have a ton to cover here. It’s just some example queries that I want to show you.

So one thing that I lean on a lot when I am tuning queries is, you know, like, like temp tables are a wonderful mechanism for making queries go faster. You can materialize a result set. You can, you know, create indexes across, you know, a combined set of columns. Like if you have to join two tables together, you can put those two, you can put those rows into a temp table.

And that would allow you to create an index across columns that might come from different tables, which ordinarily you couldn’t do. And would make maybe your performance tuning or just general indexing job a little bit tougher to make things go faster. So there are very, very good reasons to put data into a temp table.

Something that I see a lot of people do is add an identity property to a temp table. It’s never entirely clear to me why, but I see this quite a bit. And, you know, like later in queries, it does end up getting used a little bit for stuff, which is, you know, of course, curious to me.

But there are better ways of giving yourself that same outcome without hampering the performance of your queries. So just for example, let’s say that we create a temp table with the identity property, and then we try to insert into that temp table. Now I’m using the tab lock hint here because one thing that is very useful generally is to like when you’re inserting into a temp table and the goal is to get a fully parallel insert is to use the tab lock hint with the insert.

It doesn’t always work. Sometimes select into does work better. But in our case, if we are if this is our big attempt here, then, you know, one thing that will prevent the insert from being in the parallel zone is going to be the identity property.

If we look at this query plan, what we would hope for is that this gather streams operator would be way over here. Right. And that the table insert would happen within the parallel zone, like over here.

But we don’t get that. Right. The this this query does scan the post table in parallel, does aggregate the data in parallel. And but then we we exit the parallel zone early and we repartition streams.

Like I said, the identity property is something that will prevent you from being able to do a fully parallel insert. Right. And there’s another sort of silly way of doing this.

And that’s when you say select into like this. And I don’t know why this I need to change that keyboard shortcut. But if you do something like this, you can you can also say ID equals identity integer one one.

And you can create an identity on the fly. The thing is, this is this is one of the few things that will like stifle the insert from being parallel. We’re going to see the exact same estimated plan here where the gather streams ends early, the in the table insert is outside of the parallel zone.

So this is not what we want to see again. Again, this compute scale are right here is going to be where that identity starts getting assigned. And this is what causes this is that requires the parallel zone to end where it does.

You can do this yourself with a row number with by using row number. Remember, row number starts at one and just counts up until you run out of rows. So this is often a much faster way of giving yourself what is essentially an auto incremented ID column, but with what with better performance along the way.

Note in this query plan, when we get the estimated execution plan, this is what we wanted to see in the first place. We have everything happening, everything that we care about happening within the parallel zone. So there is the clustered index scan.

There’s the aggregate. In this case, we need to sort some data in order to generate the row number. But, you know, you can index your way around that. I just haven’t gone that far with this simple demo. So then we use our window aggregate in order to produce the row number.

And then notice here, this is what we this is what we cared about last time. This is what we cared about in this video that wasn’t happening in the last couple of queries where the table insert is before the gather streams. So all of the rows that end up in this table would have been spread out across multiple threads.

And this can really, really help larger inserts into temp tables. Small, small row, small row count inserts is probably not going to make a very meaningful difference. And depending on like what your max stop is, sometimes it really helps to bump max stop up to make sure this goes quickly.

But like, you know, there are some like factors involved with figuring out if this is worth pursuing. But when it is, this can make a very, very big difference into how fast rows end up going into your temp table and getting loaded in from the was part of the insert. So just something to consider when you’re tuning queries, if the speed of your insert into a temp table is not what you want.

You might try the tab lock hint. Of course, if you’re going to use a tab lock hint, there are many things that you like. Like you should really just start with a insert into a heap temp table and then add in any indexes later that you want.

But this does have some ramifications around like plan caching and reuse and recompiles and stuff. So, you know, but if it’s the type of thing where, you know, it’s a process and you’re loading a lot of rows in and it’s not like, you know, some old crazy OLTP query that needs to run every, you know, three mil, three milliseconds or something. Then this kind of like larger effort tuning is worth it.

So something to think about when you’re tuning your queries, especially your big, big heavy lifting data movers. If you want to get the fully parallel insert much easily, much more easily. It can be a very, very neat sort of workaround to just use row number instead of an identity attribute.

Get yourself that auto incrementing thing. Create whatever indexes you need later, perhaps, and just use that. Anyway, thank you for watching.

I hope you enjoyed yourselves. I hope you learned something and I will see you in the next video where I think are we talking about? We might be talking about something sequency. I forget.

I don’t know. I don’t want to go look and spoil it for both of us at this point. Anyway, thanks for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: SELECT INTO

Learn T-SQL With Erik: SELECT INTO


Video Summary

In this video, I delve into some lesser-known aspects of T-SQL’s `SELECT INTO` statement, sharing practical tips and insights that can be particularly useful when working with table backups or creating temporary tables. I explore how to use `SELECT INTO` for schema-only copies, which is handy for quickly duplicating a table structure without the indexes and other metadata. Additionally, I discuss some of the quirks and limitations associated with using `SELECT INTO`, such as handling nullability issues and preserving identity columns. By walking through these examples, I aim to help you better understand how to leverage this powerful T-SQL feature in your database management tasks. Whether you’re a beginner looking to expand your T-SQL knowledge or an experienced DBA seeking new tricks, there’s plenty here to discover and apply.

Full Transcript

Erik Darling here with Darling Data. And we are going to talk a little bit more about T-SQL today. We’re going to do some stuff about Select Into. This is, of course, some teaser material from my T-SQL course, Learn T-SQL with Erik. That’s me. All 23 hours of the beginner content is out there in the world, floating around for you to watch and start to learn from. And this is, of course, still at the presale price of 250 US bucks. And that will be going up to 500 bucks when the course is fully published after the summer. And just to reiterate, this is good for life. So you will not have to renew or resubscribe this purchase at any point. Anyway, let’s talk about Select Into a little bit. Now, there are some fun things that you can do with Select Into if you want to create a backup of your table. For example, you can say Select Top Zero and this will just make a schema-only copy of your table. Of course, schema-only has some limitations. It’s not going to bring over indexes and stuff like that. So if you need, like, a schema-only copy of your table, you can just make a schema-only copy of your table.

You might need to script out. It might be easier to either script out the whole thing or just script out additional items after you’ve made the schema-only copy. But for most people, just the schema-only copy is enough. I actually use this technique a bit when I am writing my diagnostic queries because it’s fairly easy to, you know, say, like, select into a regular table from, like, a DMV that may not have, like, a create table statement for it. And then I can script out the created table from it and, you know, just kind of, like, validate some stuff to make sure everything came out all right.

Another thing that you can do in a similar vein is to, say, select into where some, with some where clause that, like, can’t ever possibly be true. So in this case, I’m using where 0 equals 1 to imitate the top, you know, the top 0 thing. You can also do that with a temp table, of course.

And, you know, like I said, when you do this, you have to add any indexes after the fact. Sometimes that’s okay. Sometimes, you know, you’ll want to create the table. You know, there are some interesting downsides that can occur when you do select into, and your select is not just, like, you know, select the columns from a table into something else.

An example of that is something like this. Now, there are, of course, other, like, instances where this can get weird and interesting. For example, if, you know, you’re using, you know, like, like, if you’re, like, combining columns in some way, if you are, you know, concatenating strings or, you know, you have some other expression in your select list, you do have to be careful that whatever the, however that expression is evaluated on the select into results in the correct data type in the final table that you’re putting that data in.

And, like, and this is worth checking because there can be some strange inferences that SQL Server makes when you do select into. So, if you, like, for some select into statements, it is worthwhile and it does make sense to use convert to make sure that you carefully control the data type of what ends up in whatever temporary or staging table you are creating via select into. I think one interesting thing about select into is around nullability.

So, if we run this and we say select into this table, and this doesn’t return any rows, which is fine, we don’t have to. But if we say select all this stuff into this table where account ID is not null, and then we try to create a, we try to create a clustered primary key on the account ID column, even though we’re, in here, we’re saying, like, you know, we’re, like, where account ID is not null, and, you know, we’re, you know, grouping by stuff.

Like, the group by is, like, the group by at least gets us, like, the, like, the unique part of it, but the column still inherits this nullability trait. But there is a way around that, and that is to use is null. This does not work with coalesce.

Coalesce does not give you the same behavior. But if you say is null account ID zero, then, of course, you can get rid of the nullability of this thing, and then you would be allowed to create the primary key on it. Now, this does have some funny limitations.

Well, not really limitations, but just something to be aware of, is that you have to be very sure that you are not, that you don’t have at least multiple nulls in here, right? If you have one null, then it might be okay, unless you have an account ID of zero, because, remember, you’re replacing nulls with a value. So, you might need to use a better canary value, depending on what you’re doing, like negative two billion something, right?

To make sure that at least it won’t, wouldn’t conflict with any naturally occurring values in here that would prevent it from being unique. And in this case, because we’re trying to create a primary key on it. So, that’s something to just think about and be aware of if you are going to use this.

Another thing that I think is neat with select into is that, so like when I said that, you know, a lot of the table, like metadata stuff is not carried over when you do select into. One thing that is carried over is the identity attribute. It does get reset, of course, but like the identity attribute will at least be present on whatever column.

So, if we, let’s just say we’re going to take the users table and we’re going to use the ID column. And in the users table, the ID column is the cluster primary key. It also has the identity attribute to it or identity property.

So, if we just make sure this table doesn’t exist first and then we select the ID column into a new table and we use SP help to show the table definition, you’ll see that the ID column does indeed retain the identity property here. Right. So, this is still an identity column in the new table.

It, you know, does, it is seeded back to one and it does increment by one. So, if you had different stuff for you, like it doesn’t pull over all the, like the current seed value and stuff from your other table. But there’s a way to get rid of that.

And that is if you, on your select into, just say something like select ID equals ID plus zero, then that turns into an expression and SQL Server no longer attempts to, I don’t know, maybe attempts to is the wrong word there. But SQL Server no longer preserves the identity attribute for this table. If we look at the results of SP help after running this thing where select, where ID is ID plus zero, we’ll have slightly different results down here where it says no identity column is defined.

So, there are some neat things that you can do with select into. And I chose this portion of the material because, you know, just talking about inserts with, you know, like values and selects and all the other stuff, you know, it, it, it, it, it’s neat, but this stuff is a little bit neater to me. So, just some fun T-SQL tricks and quirks around select into for you to think about and be aware of perhaps when you are writing queries that use this.

I hope you enjoyed yourselves. I hope you learned something. And I will see you over in the next video.

We’re going to, we’re going to talk a little bit more about the identity, the identity attribute or the identity property, however you prefer to, however you prefer to call it. And some of the effects that it can have on performance when we are doing inserts or select into, however you want to call it. Anyway, thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.