Learn T-SQL With Erik: Some Merge Tips

Learn T-SQL With Erik: Some Merge Tips


Video Summary

In this video, I delve into some essential tips and tricks for working with SQL Server’s `MERGE` statement, focusing on making your code more efficient and less prone to errors. I start by emphasizing the importance of using the `SERIALIZABLE` hint when performing multiple actions in a merge statement to prevent concurrency issues. Then, I demonstrate how simplifying the matched clause can save you from complex null checks, showing a cleaner way to update records with concise SQL. Lastly, I discuss the `USING` clause and explain why it’s more akin to a `FROM` clause than a traditional join, highlighting its importance in ensuring that your merge logic behaves as expected. By the end of this video, you’ll have a better understanding of how to write effective and maintainable `MERGE` statements.

Full Transcript

Hey, it’s T-SQL time with Erik Darling. All right. Usual, usual song and dance here. All 23 hours of the beginner content from my T-SQL course, Learn T-SQL with Erik is available now. The price is $250 and it will be advancing to double in value.

It will go up to $500. So you can, you can, you can short me if you want. When the advanced material drops after the summer months have concluded.

So in this video, we’re going to talk a little bit about some, like a couple of things you should know about merge. Like I’m not going to sit here and be like, this is how you write a merge statement. I’m going to just show you some stuff that can kind of make working with merge somewhat less painful and maybe explain why you might get weird results with merge sometimes.

So let’s, let’s do that. Let’s have that, let’s have that kind of fun today. So the first thing, when you are, when you’re writing a merge statement and you are, you intend that merge statement to have multiple actions.

So like update and insert, like the upsert, like form of merge is probably the most common. The first thing is that you absolutely need this serializable hint here to prevent strange things from happening. When you run that, there are all sorts of strange concurrency phenomena that may occur if you do not use this.

So this is the very first thing here. The second thing I want to show you in this, in this section is how to make the matched section for, to perform the update portion of the merge a little bit easier to write. So when, what I see in a lot of matched clauses is not this, right?

What I see is a lot of stuff like where T dot name is not equal to S dot name and T dot name is no, or S dot name is no. And T dot user ID is not equal to S dot user ID or T dot user ID or S dot. Like it just goes on forever with these, like not equal to or no, like foreverness things, because you might have no’s and you can’t do the not equal to no’s.

And the whole thing just turns into a nightmare. This is a much more clean and concise way of writing this. You say select like the, the target columns, except select the S dot columns.

Uh, you could potentially, well, well, you would, you would have to have made after reverse some other stuff if, if you did, if you reversed it. But, uh, if you do this, this will save you all the null checking because except handles nulls, uh, graciously for you. So that is the, that is the main thing here.

The other thing that I want to talk about is the using clause. Now, the using clause for a lot of people feels like a join, uh, because there is an on clause and that that’s, that’s, that’s reasonable. But you do have to be, uh, you, what you should be aware of is that using is somewhat more like a from clause than a join clause.

Um, and what I mean by that is if you were to write a query like this and you were to say using badges, uh, badges stage as S on like S dot ID equals T dot ID. Uh, like, like any, any, anyone, anyone from the, from this who didn’t like match this exactly would go to the, when not matched by target and would go to the insert portion. Uh, which is probably not what you intend.

So when you’re writing your using clause, a lot of the times what you want to do, and this, this might seem sort of similar to when we talked about like, um, pivot and unpivot. How, when you write the pivot query, uh, you kind of want, like, if you use a derived table expression, uh, you can, you control better the columns that SQL Server will attempt to do, attempt to do the like implicit grouping by thing. Uh, so when you’re, when you, when you write a query where you only want to get certain stuff from a table to use for your merge, what you want to do is wrap that up into something like this.

So you’re only getting, uh, like you’re what the data source that you’re using is a select from the staging table where user ID equals two, two, six, five, six. So this will limit it to just that portion of the data rather, and you won’t end up with like weird, uh, bugs and potential other things going on when you hit them when not matched by portion. So, um, just a couple of things that might help you write somewhat better merge statements in there.

Uh, one, if you are performing multiple actions, you must use the serializable, uh, uh, uh, uh, uh, uh, uh, locking hint, uh, on the, uh, target table. Two, if, uh, you are using, uh, if, if, if in your matched clause, you have to write an excessive amount of null checking, it is a lot easier to just say, and exists, select columns that you care about, except select the other columns that you care about to do your update. And, uh, when you are writing the using clause, if there is any additional sort of filtering or anything else that you, you want to do here, um, confine it to a derived table so that you actually, you start with the correct data source and you don’t have weird things, uh, flying around your, uh, matched and not matched clauses in your merge statement.

So, uh, that’s about it here. Um, you know, there, there, if you want to read a lot about merge, um, I would highly suggest looking at Michael J. Swartz blog. Uh, he says lots of fantastic things about merges and upserts, uh, because he uses them a lot.

Uh, I don’t know if he says any of this stuff directly, but, um, if, if he, if he doesn’t, I’m sure he says other very, very smart things about it. So, anyway, thank you for watching. I hope you enjoyed yourselves.

I hope you learned something and I will see you in the next video, uh, where we will talk. I believe, uh, we’re going to talk a little bit about output next. So that’ll be, that’ll be great fun for all of us, won’t it? All right.

Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: Neat Update Tricks

Learn T-SQL With Erik: Neat Update Tricks


Video Summary

In this video, I dive into some advanced T-SQL techniques focused on updates, showcasing how to write and validate update queries effectively. I emphasize two key rules: always start by writing your updates as a select statement to ensure the intended results before committing any changes, and highlight from the bottom up to minimize errors. Additionally, I demonstrate a unique trick involving updating local variables or parameters within an update query, which can be incredibly useful for tracking changes without using OUTPUT clauses. This video is part of my comprehensive T-SQL course, offering insights that might not be commonly discussed in everyday practice but are invaluable for mastering the language. The 23 hours of beginner content are now available at a presale price of $250 and will double to $500 after the summer release, giving you plenty of time to start learning and improving your T-SQL skills.

Full Transcript

Erik Darling, Darling Data, same as it ever was, same as it ever will be. Consulting without end. So this is going to be some more preview material from my T-SQL course. We’re going to talk about some neat stuff you can do with updates that are nothing to do with anything being in order like they were with deletes. All 23 hours of the beginner content is out there and available for you to start learning anything, getting better at stuff, specifically T-SQL with. I don’t know if it’s going to make you better at anything else. Who knows? Maybe, maybe it will. I don’t know. Give you some superpowers. Uh, but this is still available at the course presale price of 250 bucks, which will double in value, uh, to $500 when the beginner material publishes after the summer, uh, when I have had time to recuperate some brain cells. So let’s talk about updates. Now, uh, when it comes to modification queries, I have, I have two rules that are, I am pretty steadfast about, uh, now if inserts, inserts a bit less because inserts involve writing a select anyway, when you are writing the portion of the query, that’s going to get data to insert into a table. But for updates and deletes specifically, two big rules that I have are one, you should always write them as a select first. Always write them as a select first. So you can validate that the results are, what you, what you intend that to me, that makes a lot of sense. Sure. You can say begin transaction and then have like a rollback and a commit at the end. But you know, then you’re also like performing some modification, hoping that you remember to highlight, begin transaction, uh, exploring data after you have modified it within your transaction, scrolling through stuff, checking things out. Oh, is that right? That look okay to you? I’m not sure. Uh, and then hoping that you have modified it within your transaction.

That you remember to commit or rollback the transaction. And even worse, I know you, I see you out there. Uh, all your other queries have no lock hints on them. And so while you are in the midst of your transaction, uh, perhaps not causing any blocking problems because all your other queries have no lock hints on them. You may very well just be letting data, letting users read a whole bunch of like, you know, uncommitted data. Uh, well, you figure out if everything is right or not. So the, the first rule is to always write your updates and deletes as selects. The second rule is to, uh, always highlight your modification queries from the bottom up. And the reason I say that is because if we were to, uh, we were to highlight this update from the bottom down, we, we, we might, we might, we might flake on something. We might, we might only get to here. We might get distracted and look away.

And you know, this would be, if we run this, we would in fact just delete, uh, or rather we would update all the dates in the table to add, uh, add one day to them. Granted, this would be fairly easy to fix because we would just have to say data day minus one to, to adjust it back. But depending on how long that takes and some other, uh, you know, local factors that might not be an enjoyable experience for you. So the reason I say to always highlight from the bottom up is because let’s say we do the same thing and we get distracted and we only get to here. We get this very, very helpful error message. Incorrect syntax near dot.

Okay. Uh, no incorrect syntax before dot though. Nothing else is wrong with this query. There’s just incorrect syntax near a dot. Okay. Okay. Anyway, always highlight your queries, your, your updates and deletes from the bottom up. Uh, I suppose that would also go for inserts too. Cause if your select query has some form of where clause on it, that might be important as well. So modification queries, write them as selects first highlight from the bottom up.

All right. Now I want to show you a cool update trick. And this is something that not a lot of people, uh, well, I mean maybe ever know exists, but that’s the value of these amazing courses is you get to learn about stuff that you might never, uh, practically hear about or use in your life. So in this, uh, badges update table that I have created, um, we have, uh, a row for you for user ID 41, but no rows for user ID six, right?

This isn’t like there, something exists for this one, but nothing exists for this one. Right. And this is going to play into the next thing I show you. So one thing that you can do with updates is you can not only update columns, but you can update, uh, either local variables, declared local variables, like I have up there or parameters. And if you’re going to like, you, sometimes this can be very useful, uh, if you don’t want to use output to like, like if you’re affecting a single row, uh, and you don’t want to use output for this, you like, cause you can’t like output into like variables like this, but you can update these local variables or parameters, uh, to values based on what you updated in the table.

So in this case, we are going to update old user ID to the user ID column. We are going to update new user ID equals user ID equals six. So we’re setting this to six and then we’re setting this to six. And then we are going to set, did it update to true? Uh, right now, these three things up here are declared, uh, old user ID and new user ID have are assigned null marks and did it update is assigned a false value. So what we’re going to do after this is we’re just going to make sure that everything sort of happened the way that we thought it would. We’re going to select, uh, our values from these local variables to see what they ended up as.

And we’re also going to look at the, uh, badges update table to ensure that our row changed from 41 to six. So now highlighting from the bottom up, we are going to, uh, get all of, all of this query that we care to run in one go, and we’re going to execute it. And when we check in on our local variables, they have all been assigned the correct values.

Old user ID was 41, new user ID is six and did it update is true. And then when we look in the badges update table, we will see that we did indeed change user ID 41 to user ID six. Cool. We’re all good there.

Uh, and we did roll that transaction back because we don’t want that to stay in there because it’ll ruin the next thing I’m going to show you. Okay. When there are no rows, then, uh, this will not get changed and these will remain null. So now we’re going to do this sort of in reverse and we’re going to say, Hey, let’s try to update user, where user ID six, we’ll do the same thing in here.

And we’ll try to change user ID to 41. So we’ve got a new one, but of course now highlighting from the bottom up again, if we run this, we will see that, uh, we got back nulls because no VALU, no values were assigned or either overwritten or overwrote the null values. the null marks here and did it update remains a zero because nothing came back to change it from being false here right when we attempted to set it to true here we didn’t get a row in order to actually make that assignment change and of course when we look in the badges update table we still have user id 41 so we were we were not able to switch 6 to 41 because there is no row for 6 so some neat stuff that you can do with updates that is not just you know changing some data doing a join CTE blah blah blah hope you enjoyed yourselves hope you learned something and i’ll see you over in the next video where i believe we’re going to talk a little bit about some merge stuff there are a couple things that you should know about merge aside from uh the fact that you know maybe for a lot of cases you should be avoiding merge of course all right anyway thank you for watching

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SQL Server Performance Office Hours Episode 25

SQL Server Performance Office Hours Episode 25



To ask your questions, head over here.

I know using MAX for columns has downsides but what about 4000/8000?
What is the hardest subject to teach people about ?
Have you ever had a consulting engagement not go well? What happened?
I get really frustrated while tuning queries because a technique that worked one time won’t work another time. Does that happen to you? HOw do you deal with it?
Do you ever want to get out of tech/working with databases?

Video Summary

In this video, I embarked on an office hours adventure where we tackled some interesting and practical SQL Server questions. We discussed the downsides of using large string columns in tables, delved into teaching isolation levels—a notoriously difficult topic—explored consulting challenges when clients aren’t ready to implement solutions, and shared insights on why tuning techniques that work one time might not work another. It was a great session where we aimed to provide value through real-world examples and practical advice. I hope you found it as informative and enjoyable as I did!

Full Transcript

Oh, look who it is. How you doing? Erik Darling here with Darling Data, and we are going to embark on an office hours adventure for this video. I hope that you find it enjoyable. So if you want to ask questions that I answer on these things, there’s a link here that allows you to do that. It’s a wonderful deal. You get to submit a question for free. I get to answer it for free. And, uh, I don’t know, I guess that’s, that’s about the end of it. Uh, if you want to support this channel, you can also, you can sign up for a membership for as low as $4 a month. Uh, you can do that down in the old video description. Uh, otherwise, uh, you can just sit there and, uh, get it all for as much as it costs you to, I don’t know, have the internet, pay for electricity, all that other stuff that kind of goes, goes along with, uh, watching things online. Uh, if you need SQL Server consulting help, I have, of course, in the best in the world at all of these things outside of New Zealand, of course, the lawyers make me say that every time. Uh, and as always, my rates are reasonable. If you would like to get my performance tuning training, uh, there’s all 24 hours of it available to you via the everything bundle, which is everything that I’ve done about performance tuning. Uh, and you can get that for about 150 USD and that’ll last you for the rest of your life.

So, you know, it’s kind of a good one-time purchase there. Again, as we get into these summer months, when you, uh, want to just lock yourself in a room with an air conditioner and not deal with anything. Well, what, what better way to do that than get some SQL Server performance training while you’re at it? Who needs music and movies and whatnot? Joy. Get better at databases. All right. Uh, my new T-SQL course, uh, all 23 hours of the beginner content. The best event is out there and available. It is currently $250 on the pre-sale price. It will double in value to $500 when the advanced material is done, uh, over after the summer. Uh, I will also be leaving the house a lot this summer. I will be leaving my, my pleasant air condition and life. and I will be traveling to the faraway land of New York City, August 18th and 19th, for the Pass on Tour series of events.

Also going to Dallas, which hopefully is air conditioned, even though it’s all September in Texas. Yeah, that’s a thing. September 15th and 16th, the Hamlet of Utrecht in the Netherlands, October 1st and 2nd, and of course, Pass Data Community Summit in Seattle, November 17th to 21st.

All good things coming up over the summer and fall months. But with that out of the way, let’s do some office hours in here. All right.

We have some questions. We have some very important questions to answer. I know using Macs for columns has downsides, but what about 4,000, 8,000? I assume you mean in VARCAR 4,000 and VARCAR 8,000.

Yeah, of course there are downsides. I mean, you know, your developers get to be lazy and perhaps not deal with truncation errors when they insert their dirty data sources into your beautiful database.

But, you know, you also run into some stuff too where, you know, you can’t have those columns in the key of an index, which might be important at some point.

You know, when you select data out of that table, if you, you know, need, if there’s any memory grant, it will be inflated by those much, much larger string columns.

So, I can’t really, you know, and this isn’t me being like a disk cheapskate, being like, disk is expensive. You should always use the right data type because disk space, blah, blah, blah, blah, blah.

This is like practical performance stuff. Like, like, don’t do it. Still, still a bad idea. Don’t, don’t, don’t hose yourself with overly long string columns because they can come back to bite you in many ways.

All right. Oh, here’s an interesting question. What is the hardest subject to teach people about? Oh, without a doubt, it is isolation levels. That is the toughest material to teach people about because almost no one is approaching it mentally from the, the right perspective.

And, uh, they often come with a lot of preconceived notions, uh, about isolation levels that require the, require unteaching before you can actually, apply any new teaching.

Um, there is a lot of really, really bad blog content out there about, um, especially role versioning isolation levels that should be thrown in the dumpster and set, set, set ablaze.

But, uh, you know, that’ll be the day. Um, all right. Uh, have you ever had a consulting engagement not go well? What happened?

Um, so yeah, of course not, not, not all of them are as successful as, as some of, some of the others. Um, I think what ultimately makes a consulting engagement unsuccessful is when someone needs help, but is not ready to get help.

Um, it’s, it’s sort of like, you know, you, you go in there and you can give them the analysis and you can give them the stuff to do when you can show them like, you know, Hey, if we change this query to do this, this will get better.

If we change this index to do this, like you can, like, you can like, like show them like, like real proof that this stuff happens, but like there might not, like they might not follow through on anything. Right.

Like they just might not end up doing anything. And, uh, you know, like they’ll, they’ll still like hit you up later and be like, Hey, you know, we’re still having problems. And you go look and just like, nothing’s been done.

And like, you know, it’s, it’s frustrating for you because like, you know, like you, you, you, you’re, you’re, you’re giving these people everything, all the tools that they need to solve problems and be successful and like get things done. But there’s just no follow through.

There’s no willingness to actually make changes. And when you ask them why it’s just, Oh, we haven’t gotten to it yet. Oh, we were afraid about this. Oh, you know, there’s a list of excuses.

So, uh, you know, those, those are, those are the, the toughest sort of things to deal with is, um, you know, like people whose servers are just in bad, bad shape, but, uh, there’s just no one willing to sort of take responsibility for it.

You know, actually like go through and do stuff. And, um, you know, when, when you, you’re like, well, is there anything you want me to do? And they’re like, well, I don’t know.

We’ll see. Like, you know, it’s just hems and haws and like the bucket gets kicked. So, yeah, those are, those are probably the worst ones. Uh, all right, let’s see here. I get really frustrated while tuning queries because a technique that worked one time won’t work another time.

Does that happen to you? How do you deal with it? Well, yeah, of course it happens to me. You know, uh, you, you build up this bag of tricks and knowledge and stuff over the years.

And, you know, something that works beautifully, uh, to tune up one query, uh, has no effect on another query, or maybe has an opposite effect, an opposing effect on a query that you’re working with.

Uh, I think, you know, the, the trick is that as you like, like every change is, is, is feedback, right?

So don’t like, like, yes, it can be frustrating, but like, ultimately it’s kind of like what you make of it. So, you know, like if you make, if you try something and it doesn’t work, that’s a data point, right?

That’s you figuring out, Oh, like when I do this, like, you know, this didn’t get the right change or like, this didn’t change the plan, the way I thought it would like, like, what do I need to do next to try and like get past that? And like, as you go through that process more and more, it’ll happen less and less.

So as you sort of like, like sort of like, you know, like expose yourself more to times when something doesn’t work, you start to understand when it doesn’t work and why it doesn’t work. And you start to sort of like, like, like not try it out when you recognize that those circumstances are cropping up again.

So like, like, you know, like, like it’ll happen to you less. It’ll still happen. It still happens to me where I’m like, well, you know, like, you know, I don’t know what some examples, um, you know, doing like, uh, you know, like, like there are a few things where like, you know, I like things that I, I’m like, Oh, well this, this, this would probably be a meaningful change.

Like changing like the left join with a null check to not exists, or sometimes doing like a, like a, like a introducing a top into something to get a row goal, or sometimes using like cross apply, uh, when I’m generating a row number, rather than like joining to, uh, like a whole, like derived join, um, you know, introducing batch mode, like stuff that, you know, like usually has, uh, like a fairly quick and obvious, uh, performance when there’s a feedback mechanism.

Sometimes you try it and you’re like, Oh, well that, that, that didn’t, that didn’t go the way I thought this time. Uh, you know, you kind of go back to the drawing board. Uh, but you know, again, the, the more you do that kind of work, the more you can sometimes recognize like the situation that like, like made it unsuccessful and kind of skip over that.

Or I don’t know, maybe just try it, but be a little bit more prepared for failure. I don’t know. That’s another, that’s another good, uh, thing to master is, uh, being prepared for something to not work out.

Right. Cause sometimes things don’t work out. Speaking of not working out, there we go. Wow. Do you ever want to get out of tech and working with databases? Uh, yeah, of course.

Um, you know, I mean, as, as much as I enjoy it, it’s, uh, it’s hard to imagine doing this at like 70, 80 or something. Like that’d be kind of wild.

Um, but like if, if I, if I got out of tech and working with databases, I think, um, what, what I would, what I would love, even, even though I realized it’s not a money printing machine, I would love to own a gym, uh, just, you know, like, like real basic barbell equipment, not a single Nautilus thing, no leg press, no leg extensions, no, none of like the goofy stuff that I hate.

Uh, like when I go to a gym, uh, just like, you know, barbells, squat racks, benches, um, you know, limited set of dumbbells, uh, and just no, no real nonsense is, is, is, would be, would be my take on it.

So, maybe someday, who knows? Anyway, uh, that is five questions. One, two, three, four, five. Uh, we did it. We made it through. We didn’t swear. We didn’t curse.

Congratulations. Uh, we did a great job on that. Uh, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. I will see you, uh, in tomorrow’s video, which I think will be something to see equally, but who knows?

All right. All right. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: Ordered Deletes and Table Expressions

Learn T-SQL With Erik: Ordered Deletes and Table Expressions


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: Identity Annoyances

Learn T-SQL With Erik: Identity Annoyances


Video Summary

In this video, I delve into some common annoyances related to the `IDENTITY` property when working with temporary tables in T-SQL. Specifically, I discuss how using an identity column can hinder parallel inserts and provide alternative methods for generating auto-incrementing IDs that maintain query performance. I also highlight the benefits of using a `ROW_NUMBER()` function instead of relying on the `IDENTITY` attribute, especially for larger insert operations into temporary tables. If you’re working with heavy data movement queries or need to optimize your T-SQL scripts, this video offers valuable insights and practical solutions. Whether you’re just starting out with SQL Server or looking to refine your advanced query tuning skills, there’s something here for everyone.

Full Transcript

Erik Darling, your friend from Darling Data. I don’t know, like at this point, I don’t know, I either need to change my name or the company name and spice up these intros a little bit. Something new to say. This is going to be another video of material from my new T-SQL course.

This video is going to be, I don’t know, hopefully a rather short one to just go over some annoyances with the identity property when you are trying to insert data into a table. And not for like single row inserts, but let’s say, you know, you’re creating a temp table that you want to work off of and, you know, you need some incrementing number in there.

Some stuff that I run into, well, like performance tuning queries to do that. All 23 hours of the beginner content is live and published. The course is at the pre-sale price of $250. It will double in value.

After the summer when the advanced material comes out. So I do, I would urge you to buy now and save yourself $250. Unless you are, unless you’re just like, if you’re psyched on giving me an extra money, cool.

But I appreciate that. But, you know, if you want to save yourself some money, get in there, get in there sooner rather than later. So let’s talk about these identity annoyances. Now we don’t have a ton to cover here. It’s just some example queries that I want to show you.

So one thing that I lean on a lot when I am tuning queries is, you know, like, like temp tables are a wonderful mechanism for making queries go faster. You can materialize a result set. You can, you know, create indexes across, you know, a combined set of columns. Like if you have to join two tables together, you can put those two, you can put those rows into a temp table.

And that would allow you to create an index across columns that might come from different tables, which ordinarily you couldn’t do. And would make maybe your performance tuning or just general indexing job a little bit tougher to make things go faster. So there are very, very good reasons to put data into a temp table.

Something that I see a lot of people do is add an identity property to a temp table. It’s never entirely clear to me why, but I see this quite a bit. And, you know, like later in queries, it does end up getting used a little bit for stuff, which is, you know, of course, curious to me.

But there are better ways of giving yourself that same outcome without hampering the performance of your queries. So just for example, let’s say that we create a temp table with the identity property, and then we try to insert into that temp table. Now I’m using the tab lock hint here because one thing that is very useful generally is to like when you’re inserting into a temp table and the goal is to get a fully parallel insert is to use the tab lock hint with the insert.

It doesn’t always work. Sometimes select into does work better. But in our case, if we are if this is our big attempt here, then, you know, one thing that will prevent the insert from being in the parallel zone is going to be the identity property.

If we look at this query plan, what we would hope for is that this gather streams operator would be way over here. Right. And that the table insert would happen within the parallel zone, like over here.

But we don’t get that. Right. The this this query does scan the post table in parallel, does aggregate the data in parallel. And but then we we exit the parallel zone early and we repartition streams.

Like I said, the identity property is something that will prevent you from being able to do a fully parallel insert. Right. And there’s another sort of silly way of doing this.

And that’s when you say select into like this. And I don’t know why this I need to change that keyboard shortcut. But if you do something like this, you can you can also say ID equals identity integer one one.

And you can create an identity on the fly. The thing is, this is this is one of the few things that will like stifle the insert from being parallel. We’re going to see the exact same estimated plan here where the gather streams ends early, the in the table insert is outside of the parallel zone.

So this is not what we want to see again. Again, this compute scale are right here is going to be where that identity starts getting assigned. And this is what causes this is that requires the parallel zone to end where it does.

You can do this yourself with a row number with by using row number. Remember, row number starts at one and just counts up until you run out of rows. So this is often a much faster way of giving yourself what is essentially an auto incremented ID column, but with what with better performance along the way.

Note in this query plan, when we get the estimated execution plan, this is what we wanted to see in the first place. We have everything happening, everything that we care about happening within the parallel zone. So there is the clustered index scan.

There’s the aggregate. In this case, we need to sort some data in order to generate the row number. But, you know, you can index your way around that. I just haven’t gone that far with this simple demo. So then we use our window aggregate in order to produce the row number.

And then notice here, this is what we this is what we cared about last time. This is what we cared about in this video that wasn’t happening in the last couple of queries where the table insert is before the gather streams. So all of the rows that end up in this table would have been spread out across multiple threads.

And this can really, really help larger inserts into temp tables. Small, small row, small row count inserts is probably not going to make a very meaningful difference. And depending on like what your max stop is, sometimes it really helps to bump max stop up to make sure this goes quickly.

But like, you know, there are some like factors involved with figuring out if this is worth pursuing. But when it is, this can make a very, very big difference into how fast rows end up going into your temp table and getting loaded in from the was part of the insert. So just something to consider when you’re tuning queries, if the speed of your insert into a temp table is not what you want.

You might try the tab lock hint. Of course, if you’re going to use a tab lock hint, there are many things that you like. Like you should really just start with a insert into a heap temp table and then add in any indexes later that you want.

But this does have some ramifications around like plan caching and reuse and recompiles and stuff. So, you know, but if it’s the type of thing where, you know, it’s a process and you’re loading a lot of rows in and it’s not like, you know, some old crazy OLTP query that needs to run every, you know, three mil, three milliseconds or something. Then this kind of like larger effort tuning is worth it.

So something to think about when you’re tuning your queries, especially your big, big heavy lifting data movers. If you want to get the fully parallel insert much easily, much more easily. It can be a very, very neat sort of workaround to just use row number instead of an identity attribute.

Get yourself that auto incrementing thing. Create whatever indexes you need later, perhaps, and just use that. Anyway, thank you for watching.

I hope you enjoyed yourselves. I hope you learned something and I will see you in the next video where I think are we talking about? We might be talking about something sequency. I forget.

I don’t know. I don’t want to go look and spoil it for both of us at this point. Anyway, thanks for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: SELECT INTO

Learn T-SQL With Erik: SELECT INTO


Video Summary

In this video, I delve into some lesser-known aspects of T-SQL’s `SELECT INTO` statement, sharing practical tips and insights that can be particularly useful when working with table backups or creating temporary tables. I explore how to use `SELECT INTO` for schema-only copies, which is handy for quickly duplicating a table structure without the indexes and other metadata. Additionally, I discuss some of the quirks and limitations associated with using `SELECT INTO`, such as handling nullability issues and preserving identity columns. By walking through these examples, I aim to help you better understand how to leverage this powerful T-SQL feature in your database management tasks. Whether you’re a beginner looking to expand your T-SQL knowledge or an experienced DBA seeking new tricks, there’s plenty here to discover and apply.

Full Transcript

Erik Darling here with Darling Data. And we are going to talk a little bit more about T-SQL today. We’re going to do some stuff about Select Into. This is, of course, some teaser material from my T-SQL course, Learn T-SQL with Erik. That’s me. All 23 hours of the beginner content is out there in the world, floating around for you to watch and start to learn from. And this is, of course, still at the presale price of 250 US bucks. And that will be going up to 500 bucks when the course is fully published after the summer. And just to reiterate, this is good for life. So you will not have to renew or resubscribe this purchase at any point. Anyway, let’s talk about Select Into a little bit. Now, there are some fun things that you can do with Select Into if you want to create a backup of your table. For example, you can say Select Top Zero and this will just make a schema-only copy of your table. Of course, schema-only has some limitations. It’s not going to bring over indexes and stuff like that. So if you need, like, a schema-only copy of your table, you can just make a schema-only copy of your table.

You might need to script out. It might be easier to either script out the whole thing or just script out additional items after you’ve made the schema-only copy. But for most people, just the schema-only copy is enough. I actually use this technique a bit when I am writing my diagnostic queries because it’s fairly easy to, you know, say, like, select into a regular table from, like, a DMV that may not have, like, a create table statement for it. And then I can script out the created table from it and, you know, just kind of, like, validate some stuff to make sure everything came out all right.

Another thing that you can do in a similar vein is to, say, select into where some, with some where clause that, like, can’t ever possibly be true. So in this case, I’m using where 0 equals 1 to imitate the top, you know, the top 0 thing. You can also do that with a temp table, of course.

And, you know, like I said, when you do this, you have to add any indexes after the fact. Sometimes that’s okay. Sometimes, you know, you’ll want to create the table. You know, there are some interesting downsides that can occur when you do select into, and your select is not just, like, you know, select the columns from a table into something else.

An example of that is something like this. Now, there are, of course, other, like, instances where this can get weird and interesting. For example, if, you know, you’re using, you know, like, like, if you’re, like, combining columns in some way, if you are, you know, concatenating strings or, you know, you have some other expression in your select list, you do have to be careful that whatever the, however that expression is evaluated on the select into results in the correct data type in the final table that you’re putting that data in.

And, like, and this is worth checking because there can be some strange inferences that SQL Server makes when you do select into. So, if you, like, for some select into statements, it is worthwhile and it does make sense to use convert to make sure that you carefully control the data type of what ends up in whatever temporary or staging table you are creating via select into. I think one interesting thing about select into is around nullability.

So, if we run this and we say select into this table, and this doesn’t return any rows, which is fine, we don’t have to. But if we say select all this stuff into this table where account ID is not null, and then we try to create a, we try to create a clustered primary key on the account ID column, even though we’re, in here, we’re saying, like, you know, we’re, like, where account ID is not null, and, you know, we’re, you know, grouping by stuff.

Like, the group by is, like, the group by at least gets us, like, the, like, the unique part of it, but the column still inherits this nullability trait. But there is a way around that, and that is to use is null. This does not work with coalesce.

Coalesce does not give you the same behavior. But if you say is null account ID zero, then, of course, you can get rid of the nullability of this thing, and then you would be allowed to create the primary key on it. Now, this does have some funny limitations.

Well, not really limitations, but just something to be aware of, is that you have to be very sure that you are not, that you don’t have at least multiple nulls in here, right? If you have one null, then it might be okay, unless you have an account ID of zero, because, remember, you’re replacing nulls with a value. So, you might need to use a better canary value, depending on what you’re doing, like negative two billion something, right?

To make sure that at least it won’t, wouldn’t conflict with any naturally occurring values in here that would prevent it from being unique. And in this case, because we’re trying to create a primary key on it. So, that’s something to just think about and be aware of if you are going to use this.

Another thing that I think is neat with select into is that, so like when I said that, you know, a lot of the table, like metadata stuff is not carried over when you do select into. One thing that is carried over is the identity attribute. It does get reset, of course, but like the identity attribute will at least be present on whatever column.

So, if we, let’s just say we’re going to take the users table and we’re going to use the ID column. And in the users table, the ID column is the cluster primary key. It also has the identity attribute to it or identity property.

So, if we just make sure this table doesn’t exist first and then we select the ID column into a new table and we use SP help to show the table definition, you’ll see that the ID column does indeed retain the identity property here. Right. So, this is still an identity column in the new table.

It, you know, does, it is seeded back to one and it does increment by one. So, if you had different stuff for you, like it doesn’t pull over all the, like the current seed value and stuff from your other table. But there’s a way to get rid of that.

And that is if you, on your select into, just say something like select ID equals ID plus zero, then that turns into an expression and SQL Server no longer attempts to, I don’t know, maybe attempts to is the wrong word there. But SQL Server no longer preserves the identity attribute for this table. If we look at the results of SP help after running this thing where select, where ID is ID plus zero, we’ll have slightly different results down here where it says no identity column is defined.

So, there are some neat things that you can do with select into. And I chose this portion of the material because, you know, just talking about inserts with, you know, like values and selects and all the other stuff, you know, it, it, it, it, it’s neat, but this stuff is a little bit neater to me. So, just some fun T-SQL tricks and quirks around select into for you to think about and be aware of perhaps when you are writing queries that use this.

I hope you enjoyed yourselves. I hope you learned something. And I will see you over in the next video.

We’re going to, we’re going to talk a little bit more about the identity, the identity attribute or the identity property, however you prefer to, however you prefer to call it. And some of the effects that it can have on performance when we are doing inserts or select into, however you want to call it. Anyway, thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: Expression SARGability

Learn T-SQL With Erik: Expression SARGability


Video Summary

In this video, I delve into the concept of sargability and its limitations in table expressions like Common Table Expressions (CTEs) and derived tables. I explain that these constructs do not materialize their results, which is a common misconception among T-SQL developers. By examining execution plans and demonstrating with examples, I illustrate how SQL Server processes these queries without materializing the intermediate result sets, leading to potential performance issues. The video also covers the importance of using computed columns for indexing expressions within table expressions, as this allows for better query optimization and improved performance.

Full Transcript

Hey! It’s still me, Erik Darling, with Darling Data. You are still learning T-SQL with Erik. That’s me. This is, of course, the teaser material for my T-SQL course. It is currently $250. It will go up to $500 after the summer when the advanced content is done and recorded. All 23 hours of the beginner content is available, and you can start watching that now. And in this video, we are going to talk about sargability to a somewhat limited degree because there are many things to talk about with it, but of course, there’s only so much time we have when we’re doing a teaser. So, let’s get on with that. Where is SQL Server? I’m still not used to this new icon down here yet. That’s still very strange to me. I don’t quite know if it’s like some harnessing or a saddle or infinity or an hourglass. or a mobius strip. What are you going for with this logo? I don’t get it. It’s just weird blob. Anyway, the thing that I want to focus on in this video is that there is some delusion that remains amongst T-SQL queriers out there in the world that if you put some expression into a table expression, well, we’re going to use CTE here as a convenient vehicle for this example, but the same thing is true of derived tables and pretty much anything else that you just query all in one shot, that if you put some expression into one of these that you materialize that expression when you absolutely do not. There is no materialization of anything that you have to do all in one go.

So, let’s turn on execution plans up here. And we’re going to look at a query that says, select age underscore i. This is the CTE version, of course. You can tell because it starts with this foolish width. And it replaces all, remember the age column in the user’s table is all nulls. So, it replaces all the nulls with zero. And then we select a count out here where this column in this CTE equals zero. So, that is our where clause down here. Now, if we did this and we expected some materialization of the result, we might expect to see something in here like a spool or something else that noted that like a result, like something from our results was put into a temporary structure in the query.

But we do not have that. All we have is a clustered index scan and a stream aggregate which takes care of the count that we did. And if we hover over the clustered index scan, you will see that this predicate is applied where is null, yada, yada, yada. So, there is like this did not get materialized here. And it’s the same thing if we were to use a derived table, right?

So, if we say select count from and then in here we nest our expression and then we stick a where clause on it to say where this equals zero, we will see an identical query plan here. We’ll see where there is no like materialization of anything. There is a stream aggregate and there is a clustered index scan. And if we hover over this, we will see the same predicate applied here.

Now, this isn’t to say that you can’t, you can’t like fix this. It is to say though that table expressions cannot be indexed. You can put an index on the underlying table, but if you wrap your column in a function, you’re going to ruin how well that index can be used. So, like we don’t have an index here, which is whatever, because there’s not a point at creating one, right?

We wouldn’t be able to seek to anything in it anyway. If we put an index on the age column and we wrap that column in is null or coalesce or like left, right, replace, upper, lower, L trim, R trim, len, data len, any one of those things. SQL Server can no longer seek to rows or values in that index. So, don’t do that.

The way that you can tell if, like, you know, if there were like, if we were really going to go on board with like materializing result sets within derived tables or CTE, then it would be like an additional layer of being able to define indexes for that result set, right? Like that might be kind of fun or not. I mean, I think that would just be, that would be like an eager index fool, wouldn’t it?

I hate, we hate those things. But, you know, like Microsoft, we don’t have CTE or table expression, like derived table, table expression materialization. There’s not a hint for it.

There’s not a, at least as far as I know, there’s not a trace flag for it. So, we can’t do that. Like I said, we can put indexes on the underlying tables and the optimizer can pick those up and use them. But there’s no way to define an index on a table expression like this.

And likewise, there’s no way to materialize a table expression like this to like store the result of it somewhere. So, what you have to do if you want to fix something like this is you have to add a computed column that gives you that expression, which then would allow you to index that expression, right? And we can do this for free and we can do this for, well, not as free, but it is pretty easy for just a single column there.

And now, if we run either of these, we will be able to seek to those values because we created a computed column which mimics this expression and then we indexed that expression. But without that, we can’t do any of these things. And SQL Server does not lift a finger to help us one bit.

So, when you’re considering like, you know, query performance and things like that, a lot of the stuff that people will bury away into CTE to make their queries more readable or that they’ll, you know, like they’ll stick in there and think, oh, I have this magical like, you know, like result set that I can just refer to over and over again. It’s just absolute nonsense. So, don’t get stuck in that train of thought because you will end up writing some of the worst performing queries in your entire life.

Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something.

And I will see you in the next video, which I don’t, I forget. I think we might be getting into like modification queries next, which is exciting. It’s very exciting. Don’t discount the fun and excitement of modification queries.

After all, none of us would have up-to-date resumes if it were not for update, delete, truncate, drop table, all that other good stuff, right? Yeah. How else do you keep those things fresh?

Anyway, thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SQL Server Performance Office Hours Episode 24

SQL Server Performance Office Hours Episode 24



To ask your questions, head over here.

Hi Erik! Can you demonstrate or share topic about HA Setup with Two Database Node and one AG Listener?
You made a comment recently on LI about switching from int to bigint not being as easy as it sounds with compression tricks. What problems have you hit?
If you worked at Microsoft, what would you do with SQL Server to make it competitive with Postgres?
Do you ever offer discounts on consulting or just training?
Do you plan on offering Postgres training or content?

Video Summary

In this video, I dive into some interesting questions from our community and share my insights on various topics. First, I address the challenge of setting up a High Availability (HA) environment with two database nodes and one AG listener, explaining why it’s not as straightforward as it might seem. Then, we delve into the complexities of switching data types from `int` to `bigint`, particularly when dealing with compression techniques. I highlight how most real-world scenarios are more complicated than simple table setups, emphasizing the need for careful planning and potential dependency issues. Lastly, I discuss my hypothetical role at Microsoft if I were to help shape SQL Server’s future, focusing on modernizing T-SQL and enhancing query store features to better support monitoring and performance tuning. It’s a lively session packed with practical advice and candid thoughts!

Full Transcript

Erik Darling here with Darling Data. And of course, we have five questions to answer, so we are going to office hour. Or as the ancient Egyptians said, office horus. Alright, still working on that one. Anyway, if you like this channel and this content, you can sign up for a membership. Link in the video description. Link in bio, as they say. If you want to ask questions, privately, that I will answer publicly on these office hours episodes, you can go to this link, which is also in the video description, and you can ask me questions. If you need some help beyond what asking a question on this whole YouTube thing can possibly get you out of, maybe you’re just in a lot of trouble. I am still a SQL Server consultant. I do all these things at a very reasonable rate. And I am available for hiring. So, get it well done. It’s hot. If you would like to get my performance tuning training, all 24 hours of it are available to you for about 150 USD at that link with that discount code. If you go down to the link down in the video description, you will be very happy to know that it is all wound up in there for you. And I also have my new T-SQL course available, Learn T-SQL with Erik. All 23 hours of the beginner content is fully available.

to you. It is down at the pre-sale price of 250 bucks, which will go up to 500 bucks after the summer when the advanced content is recorded. Speaking of summer, boy, oh boy, we’re wearing a lot of shorts out there. I am going on tour with Redgate on all the Pass On Tour events, as well as Pass Data Community Summit. Pass On Tour is, of course, hitting New York City, Dallas, and Utrecht. And Pass Data Summit is, of course, hitting Seattle, November 17th to 21st. So, get in on those if you are in the nearby area. But let’s go answer some questions now and let’s get through all this stuff. All right. First question. Oh boy. Hi, Eric. Can you demonstrate or share topic about HA setup with two database node and one AG listener? Hell no.

I don’t ever talk about that stuff. I don’t care for AGs. I don’t care for boring setup tasks like this. I don’t care to do walkthroughs. There are other people who do. God bless them. They like this stuff. I am all query performance. That is not something I want to get into. All right. Next up. You made a comment recently on L.I., I believe that’s LinkedIn for the folks at home, about switching from int to big int not being as easy as it sounds with compression tricks.

What problems have you hit? Well, if you saw me make that comment, then you may have seen the video link that I put in there about my thoughts on it. The biggest problem is that, you know, every blog post that tells you about this one weird trick, it’s like a table with like just a clustered and not a clustered primary key, but a clustered index on it. That’s the only index. And it’s just like, oh, magic with row compression or page compression. I can just switch from int to big int without doing anything.

The problem is most of your tables aren’t going to have just one index on them. And even if you rebuild all your indexes with row or page compression, if like you like, you’re going to have all these weird object dependency issues that you have to sort out. So it’s like not only do you have to like, like either drop off a bunch of indexes that that that column is a part of, but like, you know, anyone’s remaining, you have to add, add the compression to, and then maybe you can do it all very easily online. But most people don’t have that simple of a setup. So it’s like, it’s near useless.

I really wish Microsoft would just make this a metadata only change where new pages assigned to the table would be able to use would be designated with the big int. And you could either like rebuild the whole thing later to align everything to a big int or just let new pages come in with with that thing as a big int. But, you know, instead, we got fabric, stupid ass fabric.

Here’s a neat question. If you worked at Microsoft, what would you do with SQL Server to make it competitive with Postgres? Well, you know, like the reason why Postgres is competitive is because it’s free.

Not a lot of people are like feature shopping Postgres and being like, wow, like this, this spec sheet just beats the pants off SQL Server. Or let’s, let’s go with this, like, you know, like, let’s go with this champion work, like racehorse. Most people aren’t go, aren’t, aren’t, don’t, don’t, don’t do that.

Most people are like, does it tick enough boxes that I can use it as a data as a relational database? Yeah. Oh, okay. Great. You know, cool.

Let’s, let’s, yeah. Great. It’s got tables. It can select data from them. It can insert, update and delete. Yeah. All right. Great. Like, what else do I need? Like, you know, SQL Server is expensive, right?

Standard edition, $2,000 a core and Microsoft still hamstrings you with crappy hardware limitations. Enterprise edition is seven grand a core. And, you know, God almighty.

Look, if you look at Microsoft’s pathetic offerings for like platform as a service SQL Server, what are these VMs? Like even with that new knob for like adjusting memory on its own, like you’re, like you still have, like you can still only make it in like one of the available SKUs. Right?

Like you can’t like, like you can’t handcraft it. You can’t get a server with like eight cores and like 560 gigs of memory. Speaking of which, 560 gigs of memory is still the top one. You can’t get a terabyte of memory for SQL Server.

It’s like Microsoft doesn’t even know its own product. It’s like, what are you doing? But like if I work there, you know, obviously like the big things with SQL Server that, you know, people complain about are like the sort of like half-assedness of a lot of the things. You know, like a lot of features that have been around for a long time that, you know, got pushed to like 75, 80% done and then never touched again.

Aside from like maybe bug fixes. T-SQL like hardly sees any improvement. You know, we got all this vector crap, but like T-SQL itself like barely got any like meaningful improvements for SQL Server 2025.

Like there’s just so much stuff that, you know, that other databases are eating Microsoft’s breakfast, lunch and dinner on like, like from a development point of view that that’s really, that’s really where they ought to focus. Or that’s really, rather that’s where I would focus is, you know, just kind of like the modernization of T-SQL. And I think, you know, if like, if I, if I really wanted to get in there, I would, I would make query store much more of a, of a monitor, like a legit monitoring repository than it is.

You know, right now it collects query performance data, but it doesn’t collect stuff like blocking or deadlocks. It doesn’t collect any like, you know, like, like it doesn’t collect weight stats. Well, I think the nice way of putting it, you know, there’s, there’s a lot of stuff that, you know, like as a consultant or, you know, like, you know, you as a DBA might monitor and look at in SQL Server that is not centralized in any way that makes it easy.

Uh, and I don’t understand why Microsoft doesn’t try to make it easier for like, like, let’s just say the average person who’s like, I don’t know what I’m doing. Please just give me an answer. Like, just give me something like, like, like has to now become an expert in like five different things and like learn like XML querying to like, you know, get any good data out of, out of SQL Server.

So like, you know, there’s, there’s all that stuff there, but as far as like what, you know, competing with Postgres, because it’s hard to compete with free, especially hard to compete with free when like free is out featuring you on a whole bunch of stuff. So, uh, you know, but Hey, we got fabric.

Uh, God almighty. These people never learn. All right. Uh, do you ever offer discounts or on consulting or just training? Uh, I am a wheeling and dealing man.

And if, uh, if, if, if you, if you email me directly and you want to, you want to wheel and deal on, on pricing for things, I will wheel and deal all day. I’m happy. I’m happy to do that.

Uh, I, I, I do. There are limits to how much wheeling and dealing I will do, but I will always, uh, wheel and deal. All right. We have another, another Postgres question here.

Jeez Louise. Uh, do you plan on offering Postgres training or content? So here’s the thing. Uh, I could certainly play with Postgres enough to learn stuff about it and produce training content on it. However, you know, part of the comfort there is part of the comfort that I have with producing the training content on SQL Server is, uh, experience.

Right. Right. So like, you know, I’m, I’m not run, like, I’m not obviously not running Postgres and production anywhere.

Uh, I’ve never worked with Postgres live in production and, you know, there are certainly, you know, like. Things that you can still be technically correct about and you can still like read the documentation and maybe put together in a friendlier way. Or even offers like, well, if you use SQL Server and you want to learn Postgres, here’s A and B, like stuff like that.

But, you know, um, you know, especially from a, like a performance tuning perspective, um, you know, the, the type of stuff that you, like the type of stuff that makes for good content is often the type of stuff that you figure out. Uh, working with like real production workloads and seeing real meaningful differences there. Um, you know, that’s also, you know, where you learn a lot of the sort of, uh, limitations, restrictions, uh, you know, just stuff that, um, stuff that is not easy to find just running a query and like, you know, on a server, uh, you know, type, type things.

So, you know, I’ll never say never, but, um, you know, like it’s, it’s, it’s something where, uh, I would, I would need to be exposed to it in like a, a, a realistic way. I think before I felt confident in thinking that I could train others on like how to, how to, how to do things best with it from a, from a performance point of view. All right.

Well, that’s five questions, I think. One, two, three, four, five. Yep. We did it. Hey, good job us. All right. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something and I will see you over in the next video where I don’t know, whatever it is, figure it out when we get there, I suppose. All right.

Thank you.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: NULL Trickery

Learn T-SQL With Erik: NULL Trickery


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: DISTINCT FROM?!

Learn T-SQL With Erik: DISTINCT FROM?!


Video Summary

In this video, I dive into the exciting new `DISTINCT FROM` syntax introduced in SQL Server 2022, which simplifies how we handle nulls and equality checks in our queries. While it might not be as revolutionary as some of the other features that have been added over the years, it certainly offers a more elegant way to write certain types of queries. I walk through an example where `DISTINCT FROM` is used to replace clunky syntax for checking null values, demonstrating how it can improve readability and potentially performance by making us eligible for hash joins. Additionally, I explore its potential use in handling optional predicates, highlighting both the benefits and limitations of this feature.

Full Transcript

Erik Darling here, Darling Data. Woo! Look at those, look at those Darling Data colors, those beautiful teals. I wish that I could find adidas t-shirts with that same teal on the logo. Then it’d really, really be a company color man. Anyway, in this video, we’re going to talk about something, I guess, sort of new. I say sort of new. It came up with SQL Server 2022. And it is the, is it, well, the, the, the, distinct from syntax. So there is, is distinct from, and there is, is not distinct from. And we’re going to talk about that one here. Uh, even though, uh, you know, uh, SQL Server 2025 did not add anything nearly as cool as, is, is distinct from, even though distinct from is, is only kind of cool, uh, for, because like, when, I don’t know, we’ll, we’ll talk about in the video. Anyway, uh, all this course material is available at the presale price. And if you have a price of $250, it’ll go up to $500 at the end of the summer. So hop on that potato now before, before it, uh, gets, gets old and soggy. Cause you don’t, you don’t want to eat that. Anyway, uh, let’s talk about, uh, distinct from. So, uh, this, this query, if you watch a lot of my, if you watch my other videos, this query might look sort of familiar. I talked about, um, something like this in a video the other day, uh, where we had to write the query like this.

In order for it to be eligible for a hash join, right? So we write this query and we run this query. SQL Server will count, uh, will join the table, uh, post to the table comments on owner user ID being equal to user ID where, uh, owner user ID is null or they’re both null, right? Which is kind of nasty, clunky syntax. What distinct from allows you to do. And why, why it’s sort of cool is because you can just do this instead.

So you can say, select count from post join to comments on owner user ID is not distinct from user ID. And the distinct from stuff handles nulls without you having to write a whole bunch of extra or predicate nonsense. So that’s, that’s kind of nice.

And if we look at the estimated execution plan, we will see that we, we, we, we do, we are indeed eligible for a hash join. And if we run this, we will eventually get correct results back. And, and that, that is the same number that we got back last time.

So that’s, that’s, that’s neat and dandy there. Uh, prior to this, if you didn’t feel like writing all this wacky syntax for that, for, to check for this, you could write the query with a join on exists.

The number of people who I’ve met in my life who are aware that you can use an exists predicate in a join clause. It’s very small. The number of people who I know who are aware of the fact that you can write an exist clause with select a column with no from clause intersected to selecting another column with no from clause is even smaller.

In fact, the, the, the Venn diagram of people who know this, that this is a thing at all. And the people who need to buy, uh, my course content, uh, is, is actually a complete overlapping match, right? There, there are like you, you, you are in dire need of, of learning.

Uh, maybe not this specifically, but just T SQL in general. So if we, if we look at the query plan for this, we will see that we are still indeed eligible for yield hash join. This is just fine, right?

This is just an alternate way of expressing this. But this is just another one of those things where, uh, handling nulls gets very, very tricky in, in databases in general. SQL Server is not like a weird outlier in this regard.

Uh, if you look through anything about Postgres, Oracle, DB2, Snowflake, DuckDB, any of the new cool kid databases, what’s the other one? Red, Red Shaft or something. Uh, they all have the same weird null stuff going on that you have to be aware of.

So, but there are ways to write queries that are a little bit more, oh God, am I going to say it? They’re a little bit more elegant when it comes to handling nulls. So you could explicitly say join on this, or this is null and this is null.

Or you can say join on this is not distinct from this. Or you can say join on exists, blah, blah, blah, blah. Now, when I was initially talking about distinct from, I was, I was talking about how like, you know, like it’s only sort of cool, right?

It’s nice that it, you can do that with it. You can shorten up some queries. You can still get, you can still get good performance and all the other stuff.

But like one of, like one of the common things you’ll see when people start talking about, um, is distinct from, or is not distinct from, is the ability to sort of like handle optional predicates with it. So like, or optional parameters. So like, you know, like this, this right now, this is 22656.

And if we run this, we’re going to get this, this number back, right? And like, it’d be actually grab the execution plan for this. And if we look, this is, you know, we scan this table and we look at some stuff.

And like, you can, what you can do is you can say, well, null here, right? And you can say, oh, well, if this is null, then, you know, we’re going to do this. And, you know, you can put in is not distinct from, and you can do this, all right?

You can get some rows back. And what is neat is that when, with, you know, you can get a seek from some of these plans. And if we put 22656 back in here, then we’ll, you know, be eligible to, you know, get going and seek to some stuff.

But, like, the reason why I think this is only sort of cool is because, like, when you see something like this and you’re like, like, your mind immediately starts spinning and turning. And you’re like, oh, my God, I can go replace all these, like, optional parameter value searches that I have with, like, is distinct from or is not distinct from. And my life can get a lot easier.

The problem is that this only works like an equality predicate, right? And as soon as you get into adding, like, trying to, like, get, like, and or logic correct with this, you stop getting the results that you think you would get back. Like, stapling more stuff onto this gets real, real, real tricky to the point where I’m not even sure that, like, it means the same thing no matter how hard you try.

So, like, you know, like, think about all those optional parameter predicate queries you have where, you know, you might have some equality predicates in there, but then you’re also searching on, like, a date range or, like, a like or, you know, something like that where it’s not just a direct equality. This stuff just doesn’t handle those cases. So, it’s, if you truly just have a query that looks like this, and sometimes, you know, user ID will be null and someone wants to return, like, everything, and sometimes user ID will have a value and they just want to return results for that value.

Cool. I guess you can use it there instead of doing, like, you know, one of the billion other things that you can do that don’t require SQL Server 2022 to do it with. But, you know, I find the number of those cases to be very, very, very small. So, you know, when I said it’s just, it’s kind of cool, yeah, it is kind of cool.

It’s worth knowing about because it might save you some time and trouble for some other queries. But for this specific thing, it just doesn’t scratch the itch that I have. Anyway, that is just a little bit about Distinct From.

Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I will see you in the next video where we will talk a little bit more about T-SQL. All right.

Thank you.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.