How To Write SQL Server Queries Correctly: INTERSECT And EXCEPT
Thanks for watching!
Video Summary
In this video, I dive into the lesser-known but incredibly useful `INTERSECT` and `EXCEPT` operators in SQL. These operations are particularly handy for comparing columns across multiple queries without the need for verbose and error-prone null handling logic. By exploring these operators through practical examples, you’ll see how they can simplify complex comparisons and improve query performance. I also delve into their unique operator precedence rules and discuss when to use them effectively in your SQL queries. Whether you’re a seasoned database professional or just starting out, understanding `INTERSECT` and `EXCEPT` will give you a powerful toolset for writing more efficient and readable code. Additionally, I touch on the recent additions like `IS DISTINCT FROM` and `IS NOT DISTINCT FROM` in SQL Server 2022, highlighting how they further enhance T-SQL’s capabilities. By the end of this video, you’ll have a solid grasp on when and how to leverage these operators for your own projects.
Full Transcript
Erik Darling here with Darling Data. We’re going to do a great job today. I can just feel it. I’m feeling in my bones. And I can tell we’re going to do a great job because we’re talking about a subject that almost no one ever talks about. Intersect and accept. You might notice that that word looks a little funny. Brevity is the soul of wit, I think. So, you know, these are not things that I just want to say. I see people use in production queries terribly often. Most of the time when I see people use these, they’re trying to compare, like, the entire contents of one table to another and try to figure out where there are differences. You don’t see people use these a lot in, like, smart ways in queries. They would much rather sit there and, you know, figure out if columns match or are both no. Or if, like, like, like, like wrapping things in is null and coalesce like, like the fools they are and these giant or clauses. And, and gosh, gosh, is it nice to just retype those as, as intersect or accept queries because not only do they, they look prettier, but they often perform better. And there’s often a lot of, a lot of room and potential for logical incorrectness.
in, uh, extended and, uh, or clause queries, uh, that, you know, uh, usually you’ll find a couple few bugs in those. So, we’re gonna have fun today. But before we have fun, we need to talk about, uh, stuff you can buy from me. Yeah, cause everyone’s trying to sell you something. I guess that, that includes me. Merry Christmas. Uh, if you would like to become a member of this channel, uh, and, and, and support my, my, my, enduring efforts to bring you quality SQL Server content, uh, there’s a link right in the video description that says become a member in which you can become a member for as low as $4 a month.
If the $4 a month is just too rich for your blood, uh, you can like, you can comment, you can subscribe. There are all sorts of buttons you can push that push my joy buttons, which is just the, probably one of the, the best things you can do in life, right? Uh, what is it? It costs nothing to be kind. So, it says the sages of every social media platform overrun with people, uh, being selectively kind. Uh, if you need help with SQL Server, if you’re looking at your SQL Server and thinking, gosh, this thing is slow, uh, Erik Darling, uh, from Darling Data does all of these things, the best in the world.
Uh, so you can, you can, you can hire him, me, us, as, as a, as a package and get this handsome devil to show up and make your SQL Server faster in exchange for money. And as always, my rates are reasonable. Uh, if you would like to get some training on SQL Server, I have about 24, 25 hours of it. Uh, there is a discount code you can use to get 75% off that brings it to about 150 USD and you get that for the rest of your natural life.
Uh, again, it is the end of 2024. I might just take this slide out because I’m getting sick of saying it. 2025. But, you know, then if I, if I stop saying it, maybe you’ll forget that I, I, I come places and I speak and also in exchange for money. Uh, so if you would like me at your event, tell, tell me what your event is.
With that out of the way, let us intersect and accept ourselves into oblivion. Now, one of the first times I actually ever saw someone use, uh, these operators was, of course, uh, my, my dear friend and my, my wine distributor from New Zealand, uh, Paul White. And, uh, one thing I want to note is, uh, you see that URL up there? Um, it’s, uh, right, right, right about there.
Uh, I want you to just take a quick look at the date in here. 2011. So June of 2011. Uh, so there, originally I was going to put a trigger warning on this because, uh, you know, obviously code from, uh, gosh, almost 15 years ago, uh, does not live up to today’s modern standards of code formatting style choices.
Uh, but instead I have reformatted this code to fit modern conventions, coding standards in T-SQL. Um, I’ve also taken the liberty of replace, uh, in the original, uh, queries, the, the table, the temp tables were table variables. I’ve only, I haven’t changed those for any overarching performance reasons only to make each of the select queries below a little bit more portable.
Okay. Uh, because otherwise I would have to declare and populate and run the select queries for each of the things below. And that, that just gets a little unwieldy. So, uh, we’re going to, uh, just do this instead. Uh, we’re going to make our table variable, our temp tables, not table variables. Uh, and then we’re going to insert some data into them.
Um, and then, uh, so the, the cool thing about Paul’s post is that he walks through like the query that you people, a lot of people would write, uh, that would get them the incorrect results. Right. Like this, because this does not account for nulls. This just returns us one row, uh, for the five that we put in both of these. This messes up anything where there are nulls and, and, and that doesn’t get us what we want back.
Right. This is obviously an incorrect result. Uh, you could rewrite the query. And this is how I see a lot of people do it where they have this gigantic or clause. A lot of ends sprinkled in a lot of people don’t get their parentheses, right?
Or don’t get there, or maybe they like copy, there are copy and paste errors in here. There’s a lot of things that can go wrong when you start writing, uh, non-trivial complicated queries like this. Uh, so this will get you the correct results.
Um, what is a little annoying here is this filter operator. Which expands out that entire or, the, the, all of, all of those or clauses for a couple of five row tables. Obviously this doesn’t make a performance difference, but when you’re dealing with larger data or darling data, which is the largest data you can get, uh, then these things do start to make a big difference.
This does get us correct results, right? This does bring us back everything that we care about. Uh, this will also get you correct results.
But as I’ve said in 10 million videos at this point, wrapping where and join clause columns in, uh, in functions, even built-in ones is just a recipe for performance disaster. Uh, so we can run this. We can get correct results, uh, but we would end up with, um, uh, just a weird sort of bunch of stuff happening, uh, in here.
There we go. So, uh, we have this whole predicate against this table, which I, which I suppose is a step up from a filter operator. But this whole predicate is just a series of case expressions for all the coalesces that we had in there.
So, uh, you can see the various case when, blah, blah, blah. Uh, case expressions are another thing that I, I, I strongly advise against putting into your join and where clause column, uh, predicates because you will, you will likewise be in for a bad time. Uh, there is also an example with isnel, and isnel should look roughly the same, except rather than a series of case expressions, we will have a predicate that is a series of isnels.
This is not necessarily a better situation. They’re both about equivalent performance wise once data becomes of a certain size. So be careful with that.
Uh, this is a much better way of writing the query because intersect will correctly deal with the nulls and it’s a whole lot less typing and a whole lot less error prone. So if we run this, we get back the correct results and we don’t have any weird stuff.
Uh, in our query plan as far as like gross predicates go, uh, all of this stuff just gets evaluated, uh, in a nested, in the nested loops join and we evaluate everything across there.
So it all ends up being fairly easy and straightforward. Um, there is sort of an unfortunate thing where not exist does not, uh, provide us with, uh, the entirely correct results on this.
We get back, um, an additional role that we don’t care for there. Uh, so, you know, uh, this is the best version of the query that I think could possibly happen. Intersect and accept are very useful for these sort of like, you know, uh, extended column comparison things.
So, uh, like I said before, um, I’ve never, uh, seen anyone really use these like when they should have, uh, in production queries.
Um, I think probably part of the problem is that it’s somewhat unclear what they do, uh, when you’re reading through any sort of SQL guide, uh, or rather even when you’re, you’re just looking at like, um, when the linguistic possibilities of SQL, you’re going to look at like, you know, stuff like join and exists and not exists and where and group by and order by and all the other words that come up.
And you’re going to say, huh, those make sense. I can sort of figure out what they do there. Intersect and accept.
it’s not really clear what they do from how they’re named. Um, uh, there are some weird rules around operator precedence with these two things. uh, has, uh, very specific operator precedence rules that other things don’t.
Uh, intersect will give you a unique set of rows from both queries and except will only give you a unique set of rows from the first query. First is going to be an air quotes there.
Uh, because, uh, like, you know, we’re just, we’re thinking about the queries as written in order. So the first one that you do, then the except, then the other thing, but then you could put other accepts and intersects below that.
Really, we just, we care about like the first one with except. Um, but the, the set of rows that you get back from them is going to be uniqueified.
So, uh, these are operators that work, uh, somewhat better or reason, not somewhat better.
I would say, uh, somewhere, somewhere between somewhat and profoundly better. If you have primary keys or unique indexes defined on the columns that you’re looking at, or at least one of the columns that you’re comparing there.
Uh, but I think probably the best part about them is that they handle null comparisons, uh, without a lot of crazy syntax, like we saw above with the ands, ors, and the is null coalesce and blah, blah, blah, blah, blah.
Um, that what is tricky about them is of course, figuring out when you should use them and what order to write things in. Often this takes quite a bit of experimentation to get, to start to get a real handle on.
And if you don’t use them for a few days, you will probably lose that handle entirely and have to come back and start from scratch. That’s sort of like when I have to write any XML query, I’m like, Oh God.
And I have to go reference everything that I’ve ever done before to figure out what I need to do now. But, uh, just to give you a couple examples of what they do, uh, let’s have, uh, let’s just do this.
And let’s say we want to intersect, uh, everything with the score over two with everything with the score over three. Uh, and we, we, we get the results we want.
Down here, we get about 6,500 rows and the old, but the, all the results are going to be everything where there’s a score over four from this, from both of these queries, right?
So this one is looking for greater than two. This one is looking for greater than three. And where those two results set start to actually find matching rows is when we hit a score of four. So everything in here is going to have a score of four and up, right?
So pretty easy stuff there. That’s where these two results start to overlap at four. We have greater than two. We have greater than three. What’s after three, four. What’s a couple after two, four.
That’s where we start showing what’s going to come back. Um, and like I said, uh, the user ID column, which is nullable and has nulls and it does not give us any issues here. Um, so if you, if you are writing queries and you have, find yourself having to like, you know, that compare, uh, columns like this, uh, especially like across a number of columns, often intersect or accept would give you better performance and, uh, better sort of, uh, code clarity and everything else because they handle nulls without a lot of, uh, overly verbose, uh, you know, uh, querying or adding in functions that replace nulls with canary values.
Uh, so this is, uh, this is the same two queries this time just using except. And, uh, this is going to give us just results from the first query, right?
Just from this one, right? Because we want to see everything from here except what’s in here. So we don’t show anything at all from this query. We only show the stuff from this query. Now this is only going to show results again from the first query, big air quotes there.
Uh, you could also call it the left, the left most query or the outer query with a score of three, because that’s the only data that exists in it. That’s also in the second query or the inner queer, right?
Or that’s, that’s not also in this one. So this we’re saying greater than three, this is greater than four. So where these start to collide is just at score three, right?
We’re only showing the threes in here. Um, so it is a bit like using not exists and that the rows are only checked from the second, from the, from this query. They’re not, we don’t project anything out from this one.
Uh, and again, the nulls are handled quite well. Uh, SQL Server 2022, uh, wow.
They, they, they really modernized T SQL with this one. Um, they added the, uh, is distinct from and is not distinct from syntax to SQL Server 2022. And I suppose at this point, we could be happy that they showed up at all because is distinct from was introduced in 1999.
99. This thing was old enough to drink before Microsoft got around to adding this in. And, uh, actually this one, this, this one was too.
Okay. I guess this would have been the funnier joke. Uh, this one was added in 2003. Well, I guess in 2022, it wasn’t quite old enough to drink. I guess it depends on where you live. If you’re, if you’re in those devil may care European countries where teenagers just get sloshed in the streets.
Well, my, actually that sounds kind of good to me. Uh, that’s what I did.
And I, I just had to find sneaky ways to do it because I didn’t want to get busted by the federal allies. Uh, so I mean, I understand no, no database really adheres like perfectly to ANSI standards, but waiting like 20 years to get, uh, basic syntax added to T SQL is, man, it is the, the choices Microsoft makes wild when it comes to this stuff.
Uh, I realized that like, you know, people aren’t exactly clamoring for like some of these things. things, but just like basic stock functionality where like, if, if you wanted to, you know, bring an application from, uh, like Postgres or Oracle or DB2 and put it in SQL Server and like, like, you had queries with this stuff written in there and you were like, Oh, that’s broken.
How do we rewrite it? It’s like, great. Awesome. Right. Good stuff. Good stuff.
Good stuff. Microsoft. We really appreciate all the wonderful things you’ve added to the product in the meantime that have changed everyone’s lives. So anyway, they are, the, the, the, the is distinct from, it is not distinct from what will be useful to someone someday when they start.
I don’t know. At this point, the SQL Server 2025 is probably going to be where people go. Not a lot of people went to SQL Server 2022. Apparently everyone was pretty happy with 2019.
Uh, I don’t know. I don’t, I don’t know fully what 2025 is going to bring us. Uh, there was just the announcements that it, it ignited fairly recently, which were, uh, I, I suppose, whelming at best.
Uh, of course, there was a lot of talk about, uh, fabric and AI, which, um, are two stupid things that, uh, of course being stapled onto, into SQL Server or SQL servers being stapled into them.
I don’t know. Uh, but there, there wasn’t a lot of, uh, wasn’t, it wasn’t a very good highlight reel about what’s actually in SQL Server 2025. Aside from stupid current hype cycle memes that are being shoved down all of our throats with, uh, very little care.
Anyway, uh, let’s look at how you can use is, is not distinct from, uh, I don’t, I don’t know if I have an is distinct from example here, but, uh, let’s say that, you know, one of my least favorite things to see in, in, in a join is an or clause.
All right. Something like this. Yuck, grotesque, awful, you do this sort of thing. You should have your keyboard removed from your hands, placed elsewhere.
Uh, but let’s run these two queries. And do, do, do, do, do, do, Yep.
Okay. Uh, there we go. They finally finished. So, uh, this first query, uh, took 6.2 seconds. This is the bad query with the or clause in it.
Uh, and it has all of the hallmarks. I’m going to make this a little bit bigger so we can see both the queries on, both the query plans on one screen here. Uh, this, this, this has all the hallmarks of a join with an or clause.
Uh, uh, we start off scanning the, well, I mean, you’re not always going to see a scan. You, you could technically see a seek here if we had a where clause on the user’s table, but we take all 2.4, some odd million rows from the user’s table.
We feed them through the constant scan operator twice. Notice that we have two sets, two full sets of rows from the user’s table. We have, um, uh, one set to find, uh, u dot ID, uh, in here.
So that’s, that’s quite a bit of not fun. And then down in here, we seek into, we, well, we do a bunch of work, right? We try to collapse all this stuff into here and we can catenate them and we sort them and we merge interval them to remove duplicates.
And then we spend time in here doing a nested loops join to the post table over and over again. And then finally 6.2 seconds later, we come up with a result set.
Hooray, hooray, hooray. We did it. Uh, this query down here, now granted, you know, there, there might be some indexing stuff we could do that would make this a little bit happier, but, uh, this takes about two and a half seconds, a pretty good improvement from six seconds, right?
Maybe, maybe not where I would stop if I were, you know, if I were being paid to tune this query, but I would, I would at least say, Hey, we could just do a little rewrite and get this in a better place. So that’s, that’s cool there. So I am excited for, to be able to start using these.
Uh, there’s just not a lot, a lot of opportunity for that yet. Um, you know, uh, I, I, I think that, um, T SQL additions like this should get back to, backported because like, they’re, they’re important, but, um, what do I know about backporting?
I’m just a, just a guy who writes door procedures. Uh, you could, there’s also, you could also read, it’s in the spirit of intersect and accept. We could also rewrite the queries in this way.
Now, uh, uh, that what’s, what’s fun about when I say fun, again, lots of air quotes in this, this, this video.
Uh, what’s, what’s interesting about query tuning these days is, um, the amount of things that may or may not kick in depending on how SQL Server feels at the time. Uh, things like batch mode on rowstore are, unless you force the issue with like, you know, some sort of, uh, column story thing, uh, it is entirely based on heuristics.
And so you can end up with like just strange performance differences, like for two queries that are, do they do sort of equivalent things like the join with the or clauses just get out. But if we were to run these two queries we’re using is not distinct from here.
And we’re using sort of the point of the video intersect here. All right. And if we look at these two query plans, this one runs for 2.6 seconds, just like above.
And this one runs for six, 600 milliseconds, which is quite different from above, right? Like the intersect query went faster. The intersect query went faster because SQL Server naturally chose batch mode on rowstore for this, for this query plan.
We have the batch there. We have the batch here. We have the batch here. Uh, I don’t think anything down here is eligible for the batch, except maybe reading from some of these, but, uh, we go, so we got, we got the batch up here when we read from the post table as well.
So SQL Server was like, yeah, batch mode on rowstore. Sounds great. Did not choose that here. Uh, and like I said, we could, we could force the issue, um, running, uh, this query and doing a funny little left join to a empty table with a columnstore index on it.
And if we look at the execution plan, now that we’ve got some batch mode in here, this one is all of a sudden competitive with the, uh, the intersect version up there.
So like I said, query tuning these days, real fun. Cause who knows what tiny little change you can make that will awaken SQL Server’s senses and say, oh yes, we should use batch mode.
We’re doing stuff with a lot of rows here. That would be smart. Um, so, you know, uh, keep an eye out for these things. Uh, I don’t really know where else to go with that.
Um, it’s a good, it’s a good time. It’s a real good time. Uh, so, uh, that’s about it for intersect and accept.
Um, I’m going to wrap this one up and I’m going to apparently talk about views versus CTE, which will be a rollicking ride. Uh, we’ll do that, I guess, as soon as this uploads.
So, we, we, see you then. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. I hope that you will expand your SQL vocabulary and start using intersect and accept. And I hope that if you, as soon as you are able to, you’ll start using is distinct from, and is not distinct from, as they become linguistically available to you.
Whatever SQL Server version or edition you end up on next. Thank you for watching. there’s a bright at heart and começa the людocker option, I wonder if or not could help you know why. A pastidade or assistant speakers,
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.
ANY & ALL next ?
Definitely not.