The How To Write SQL Server Queries Correctly Cheat Sheet: INTERSECT And EXCEPT

Who Am I Kidding?

I have never once seen anyone use these. The most glaring issue with them is that unlike a lot of other directives in SQL, these ones just don’t do a good job of telling you what they do, and their behavior is sort of weird.

EXISTS and NOT EXISTS state their case very plainly, as do UNION and UNION ALL. Figuring these two out is not nearly as straightforward, especially since INTERSECT has operator precedence rules that the other set operators don't: it's evaluated before UNION and EXCEPT, which are evaluated left to right.
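A small sketch of that precedence rule, using hypothetical inline VALUES lists instead of real tables. Per the SQL Server documentation, INTERSECT is evaluated first, then EXCEPT and UNION from left to right:

```sql
-- INTERSECT binds tighter than UNION, so this runs as
-- {1, 2} UNION ({3} INTERSECT {2, 3}), not ({1, 2} UNION {3}) INTERSECT {2, 3}.
SELECT v.x FROM (VALUES (1), (2)) AS v (x)
UNION
SELECT v.x FROM (VALUES (3)) AS v (x)
INTERSECT
SELECT v.x FROM (VALUES (2), (3)) AS v (x);
-- Returns 1, 2, and 3.

-- Parentheses force the left-to-right reading.
(
    SELECT v.x FROM (VALUES (1), (2)) AS v (x)
    UNION
    SELECT v.x FROM (VALUES (3)) AS v (x)
)
INTERSECT
SELECT v.x FROM (VALUES (2), (3)) AS v (x);
-- Returns 2 and 3.
```

When mixing set operators, explicit parentheses are the cheapest way to make sure the query does what you meant.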

INTERSECT gives you the distinct rows that appear in both queries
EXCEPT gives you the distinct rows from the “first” query that don’t appear in the second

So, cool, if you know you want a unique set of rows from somewhere, these are good places to start.
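A quick sketch with made-up inline data, showing both behaviors, including the duplicate removal (SQL Server doesn't implement the standard's INTERSECT ALL or EXCEPT ALL, so you always get distinct rows back):

```sql
-- Hypothetical inputs: both sides contain duplicates.
SELECT a.x FROM (VALUES (1), (1), (2), (3)) AS a (x)
INTERSECT
SELECT b.x FROM (VALUES (1), (3), (3), (4)) AS b (x);
-- Returns 1 and 3, once each: distinct values found in both inputs.

SELECT a.x FROM (VALUES (1), (1), (2), (3)) AS a (x)
EXCEPT
SELECT b.x FROM (VALUES (1), (3), (3), (4)) AS b (x);
-- Returns 2: distinct values from the first input with no match in the second.
```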

What’s better is that they treat NULLs as matching each other, without a lot of overly protective syntax using ISNULL, COALESCE, or expansive and confusing OR logic.
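Here's a minimal sketch of that, using two hypothetical table variables with a NULL in the same column on both sides. The set operator matches the NULL row; a plain equality join doesn't (assuming the default ANSI_NULLS ON):

```sql
DECLARE @a table (id int NOT NULL, user_id int NULL);
DECLARE @b table (id int NOT NULL, user_id int NULL);

INSERT @a (id, user_id) VALUES (1, NULL), (2, 10);
INSERT @b (id, user_id) VALUES (1, NULL), (2, 10);

-- Set operator: both rows match, including the one with a NULL user_id.
SELECT a.id, a.user_id FROM @a AS a
INTERSECT
SELECT b.id, b.user_id FROM @b AS b;
-- Returns (1, NULL) and (2, 10).

-- Equality join: the NULL row drops out, because NULL = NULL is UNKNOWN.
SELECT a.id, a.user_id
FROM @a AS a
JOIN @b AS b
  ON  b.id = a.id
  AND b.user_id = a.user_id;
-- Returns only (2, 10).
```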

The tricky part is spotting when you should use these things, and how to write a query that makes the most of them.

And in what order.

Easy Examples

Often the best way to get a feel for how things work is to run simple queries and test the results vs. your expectations, whatever they may be.

I like these queries, because the UserId column in the Comments table is not only NULLable, but contains actual NULLs. Wild, right?

SELECT
    c.*
FROM dbo.Comments AS c
WHERE c.UserId IS NULL
AND   c.Score > 2

INTERSECT

SELECT
    c.*
FROM dbo.Comments AS c
WHERE c.UserId IS NULL
AND   c.Score > 3
ORDER BY
    c.Score;

Running this returns only comments with a Score greater than 3, because those are the only rows that show up in both results with every column matching.

Note that the UserId column being NULL doesn’t pose any problems here, and doesn’t require any special handling. Like I said. And will keep saying. Please remember what I say, I beg of you.

Moving on to EXCEPT:

SELECT
    c.*
FROM dbo.Comments AS c
WHERE c.UserId IS NULL
AND   c.Score > 2

EXCEPT

SELECT
    c.*
FROM dbo.Comments AS c
WHERE c.UserId IS NULL
AND   c.Score > 3
ORDER BY
    c.Score;

This will only return results from the “first” query (often referred to as the left, or outer, query) with a Score of 3, because that’s the only data in it that isn’t also in the “second” (or right, or inner) query.

Both queries find the same rows once Score reaches 4, but EXCEPT filters those out to show only the difference between the two.

In case it wasn’t obvious, it’s a bit like NOT EXISTS: rows from the second/right/inner query (the ones with Scores greater than 3) are only checked for matches, never projected.
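To make the NOT EXISTS comparison concrete, here's a sketch with two hypothetical table variables (not the Comments table). Getting NOT EXISTS to return the same rows as EXCEPT takes a DISTINCT plus explicit NULL handling for every nullable column:

```sql
DECLARE @l table (id int NOT NULL, user_id int NULL);
DECLARE @r table (id int NOT NULL, user_id int NULL);

INSERT @l (id, user_id) VALUES (1, NULL), (2, 10), (3, 20);
INSERT @r (id, user_id) VALUES (1, NULL), (2, 10);

-- EXCEPT: distinct rows from @l with no full-row match in @r.
SELECT l.id, l.user_id FROM @l AS l
EXCEPT
SELECT r.id, r.user_id FROM @r AS r;
-- Returns (3, 20).

-- NOT EXISTS equivalent: DISTINCT plus the OR/IS NULL gymnastics.
SELECT DISTINCT l.id, l.user_id
FROM @l AS l
WHERE NOT EXISTS
(
    SELECT 1
    FROM @r AS r
    WHERE r.id = l.id
    AND  (r.user_id = l.user_id OR (r.user_id IS NULL AND l.user_id IS NULL))
);
-- Also returns (3, 20). Without the IS NULL branch, (1, NULL) would come back too.
```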

Again, NULLs in the UserId column are handled just fine. No ISNULL/COALESCE/OR gymnastics required.

I’m really trying to drive this home, here.

In The Year 2000

SQL Server 2022 introduced some “modern” ANSI syntax. By modern, I mean that IS DISTINCT FROM was introduced to the standard in 1999, and IS NOT DISTINCT FROM was introduced in 2003.

While no database platform adheres strictly or urgently to ANSI standards, waiting 20 years for an implementation in SQL Server is kind of really-extra-super-duper son-of-a-gun boy-howdy dag-nabbit-buster alright-bucko hold-your-horses listen-here-pal levels of irritating.

Think of all the useless, deprecated, and retired things we’ve gotten in the past 20 years instead of basic functionality. It’s infinitely miffing.

Anyway, I like these additions quite a lot. In many ways, they’re extensions of INTERSECT and EXCEPT, because the workarounds for them used those very directives. Like GREATEST and LEAST, which Microsoft finally added after decades of developers wondering just what the hell to do instead, I hope they didn’t show up too late to keep SQL Server from being bullied by developers who are used to other platforms.
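Here's a side-by-side sketch on hypothetical inline values. The EXISTS (SELECT a INTERSECT SELECT b) form is a commonly used pre-2022 stand-in for IS NOT DISTINCT FROM; the new_syntax column needs SQL Server 2022 or later, so drop it on older versions:

```sql
SELECT
    v.a,
    v.b,
    -- SQL Server 2022+ syntax
    new_syntax = CASE WHEN v.a IS NOT DISTINCT FROM v.b THEN 'match' ELSE 'no match' END,
    -- Older workaround: INTERSECT treats two NULLs as equal
    old_syntax = CASE WHEN EXISTS (SELECT v.a INTERSECT SELECT v.b) THEN 'match' ELSE 'no match' END,
    -- Plain equality: a comparison involving NULL is UNKNOWN, never a match
    equality   = CASE WHEN v.a = v.b THEN 'match' ELSE 'no match' END
FROM (VALUES (1, 1), (1, 2), (1, NULL), (NULL, NULL)) AS v (a, b);
```

Only the (NULL, NULL) row separates the columns: the first two report a match, plain equality does not. For IS DISTINCT FROM, flip the workaround to NOT EXISTS (SELECT v.a INTERSECT SELECT v.b), or use EXISTS (SELECT v.a EXCEPT SELECT v.b).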

We can finally start to replace mutton-headed, ill-performing syntax like this:

SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
JOIN dbo.Posts AS p
  ON u.Id = p.LastEditorUserId
  OR (p.LastEditorUserId IS NULL);

With stuff that doesn’t suck, like this:

SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
JOIN dbo.Posts AS p
  ON u.Id IS NOT DISTINCT FROM p.LastEditorUserId;

The query plan timings tell enough of a story here:

[Image: SQL Server query plans, captioned “i’ve been waiting for so long”]

But not everyone is able to use the latest and greatest (or least and greatest, ha ha ha) syntax. And the newest syntax isn’t always better for performance without additional tweaks.

And that’s okay with me. I do performance tuning for a living, and my job is to know all the available options and test them.

Like here. Like now.

The Only One I Know

Let’s compare these two queries. It’ll be fun, and if you don’t think it’s fun, that’s why you’ll pay me. Hopefully.

SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
JOIN dbo.Posts AS p
  ON u.Id IS NOT DISTINCT FROM p.LastEditorUserId;

SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
JOIN dbo.Posts AS p
  ON u.Id = p.LastEditorUserId
WHERE EXISTS
(
    SELECT p.LastEditorUserId FROM dbo.Posts AS p
    INTERSECT
    SELECT u.Id FROM dbo.Users AS u
);
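As written, the EXISTS subquery in the second query declares its own p and u aliases, so it isn't correlated with the outer join: it only checks whether any LastEditorUserId matches any user Id, and the u.Id = p.LastEditorUserId predicate does the real filtering. The more common pre-2022 way to get IS NOT DISTINCT FROM semantics puts a correlated INTERSECT in the join condition. A sketch, not timed here, so its plan and performance may differ from both queries above:

```sql
SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
JOIN dbo.Posts AS p
  -- No FROM clause inside: both values come from the current outer row,
  -- and INTERSECT treats two NULLs as a match.
  ON EXISTS
     (
         SELECT u.Id
         INTERSECT
         SELECT p.LastEditorUserId
     );
```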

Here’s the supporting index that I have for these queries:

CREATE INDEX 
    LastEditorUserId 
ON dbo.Posts
    (LastEditorUserId)
WITH
    (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

It’s good enough. That’s what counts, I guess. Showing up.

20th Century Boy

At first glance, many queries may appear to be quite astoundingly better. SQL Server has many tricks up its sleeves in newer versions, assuming that you’re ready to embrace higher compatibility levels, and pay through the nose for Enterprise Edition.

This is a great example. Looking at the final query timing, you might think that the new IS [NOT] DISTINCT FROM syntax is a real dumb dumb head.

But unless you’re invested in examining these types of things, you’ll miss subtle query plan differences, which is why you’ll pay me, hopefully.

The second query receives the blessing of Batch Mode on Rowstore, while the first does not. If we use a helper object to get them both functioning on even terms, performance is quite close:

SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
JOIN dbo.Posts AS p
  ON u.Id IS NOT DISTINCT FROM p.LastEditorUserId 
LEFT JOIN dbo.columnstore_helper AS ch 
  ON 1 = 0;

In this case, the tweaked query just barely edges out the older way of writing it.
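The definition of dbo.columnstore_helper isn't shown here. A common version of this trick, assumed below, is an empty table with a clustered columnstore index. The LEFT JOIN ... ON 1 = 0 never matches, so results don't change, but the columnstore index being present lets the optimizer consider batch mode. The column name is hypothetical:

```sql
-- Hypothetical helper: stays empty forever; it exists only to carry
-- a columnstore index into the query.
CREATE TABLE dbo.columnstore_helper
(
    cs_id bigint NOT NULL,
    INDEX cs_id CLUSTERED COLUMNSTORE
);
```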

I Can’t Imagine The World Without Me

There are many ways to write a query and to examine its performance characteristics. As SQL Server adds more options, syntax, capabilities, and considerations, testing and judging them all (especially with various indexing strategies) becomes quite an endeavor.

I don’t blame developers for being unaware of, or unable to test, a variety of different rewrites and scenarios. The level of understanding it takes to tune many queries extends well beyond common knowledge or sense.

The aim of these posts is to give developers a wider array of techniques, and a better understanding of what works and why, while exposing them to newer options available as upgrade cycles march bravely into the future.

Keeping up with SQL Server isn’t exactly a full-time job. Things are changed and added from release to release, which are years apart.

But quite often I find companies full of people struggling to understand basic concepts, troubleshooting, and remediations that are nearly as old as patches for Y2K bugs.

My rates are reasonable, etc.

Thanks for reading!

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

16 thoughts on “The How To Write SQL Server Queries Correctly Cheat Sheet: INTERSECT And EXCEPT”

Henrik Staun Poulsen says:

March 27, 2024 at 12:26 pm

I have used INTERSECT and EXCEPT in one script only:
https://stackoverflow.com/questions/4602083/sql-how-to-compare-two-tables-for-same-data-content

But there it really shines in how it compares NULL with NULL
1. Erik Darling says:
  
  March 27, 2024 at 12:53 pm
  
  Yeah, it’s really good for that kind of stuff. I just wish it were more obvious that it’s really good at that kind of stuff.
happydba says:

March 27, 2024 at 1:37 pm

“Like here. Like now.”

Thanks now I have Fatboy Slim stuck in my head 😀
1. Erik Darling says:
  
  March 27, 2024 at 1:53 pm
  
  Oh that’s funny, I wasn’t even thinking about that when I wrote it.
2. DW says:
  
  April 2, 2024 at 7:43 pm
  
  I got Jesus Jones stuck in my head. Didn’t know of the Fatboy Slim tune.
Paul Hunter says:

March 31, 2024 at 1:36 pm

I often use except & intersect as part of our MDM process during a db release.
Select columns from source
except
Select columns from target

Yields the data we want to change

Invert target & source and you have new data.
1. Erik Darling says:
  
  April 1, 2024 at 9:55 am
  
  Yeah, that’s a great use for them. I suppose in the grand scheme of things, it’s a bit harder to find other recreational uses for them.
Doug says:

April 1, 2024 at 1:39 pm

Best use of EXCEPT is to QA a change:
SELECT * FROM TABLE1
EXCEPT
SELECT * FROM TABLE2
UNION
SELECT * FROM TABLE2
EXCEPT
SELECT 8 FROM TABLE1

May take a bit, but way better than manual QA 🙂
1. Erik Darling says:
  
  April 1, 2024 at 2:09 pm
  
  Ah, the ol’ “select 8” trick 😂
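Beyond the 8, there's a precedence catch in Doug's query: EXCEPT and UNION have equal precedence and are evaluated left to right, so even with SELECT * it runs as ((TABLE1 EXCEPT TABLE2) UNION TABLE2) EXCEPT TABLE1, which leaves only the rows in TABLE2 that aren't in TABLE1. A sketch of the intended two-way comparison with parentheses (TABLE1 and TABLE2 are the comment's placeholder names):

```sql
(
    -- Rows only in TABLE1
    SELECT * FROM TABLE1
    EXCEPT
    SELECT * FROM TABLE2
)
UNION ALL
(
    -- Rows only in TABLE2
    SELECT * FROM TABLE2
    EXCEPT
    SELECT * FROM TABLE1
);
```

UNION ALL is safe here because each half is already distinct and the two halves can't share a row, so the extra duplicate-removal step of UNION buys nothing.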
sdmcnitt says:

April 1, 2024 at 2:57 pm

Should the example explanation for INTERSECT be “Comment’s Score is greater than 3 only…”
1. Erik Darling says:
  
  April 1, 2024 at 9:56 pm
  
  Hah, yep! Good catch, thanks!
Kim Crosser says:

April 1, 2024 at 3:47 pm

I use EXCEPT all the time when doing data conversion/migration tasks, or simply when testing new code that affects multiple rows.
I create test table(s) as exact copies of the original master tables, run the process to do the multiple record updates, and then use EXCEPT (in both directions!) to verify that only the expected rows were altered.
The biggest challenge in using these is dealing with Identity columns. Those usually have to be excluded, or you have to be careful when setting up the initial conditions (copy with SET IDENTITY_INSERT ON, and make sure the Identity column next value matches the source table column).
INTERSECT can be used in a similar manner to check that the result table rows that should be “unchanged” are the same in both tables. Again, I test “select * from a intersect select * from b” as well as the reverse.
1. Erik Darling says:
  
  April 1, 2024 at 9:57 pm
  
  Ooh, examples like that would be great to see in a blog post on your site. I’ve never done that before.
2. Nick Ryan says:
  
  April 4, 2024 at 10:53 pm
  
  Yeah, me too on the use of EXCEPT. Discovering it for that job saved me a huge amount of typing.
Stephen Morris says:

April 2, 2024 at 4:30 am

Great for ETL
Itzik showed me this cool trick
If you’re super cautious, it could be rewritten as separate INSERTs and UPDATEs to avoid the MERGE

MERGE INTO dbo.Customers AS TGT
USING dbo.CustomersStage AS SRC
ON TGT.custid = SRC.custid
WHEN MATCHED AND EXISTS ( SELECT SRC.* EXCEPT SELECT TGT.* ) THEN
UPDATE SET
TGT.companyname = SRC.companyname,
TGT.phone = SRC.phone,
TGT.address = SRC.address
WHEN NOT MATCHED THEN
INSERT (custid, companyname, phone, address)
VALUES (SRC.custid, SRC.companyname, SRC.phone, SRC.address);
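A minimal sketch of that split, assuming the same dbo.Customers and dbo.CustomersStage tables with identical column lists and column order (the SRC.* EXCEPT TGT.* comparison depends on that). Concurrency hints are left aside, as they are in the MERGE above:

```sql
SET XACT_ABORT ON;

BEGIN TRANSACTION;

-- Update matched rows only when at least one column differs;
-- EXCEPT treats NULL vs. NULL as "no difference".
UPDATE TGT
SET
    TGT.companyname = SRC.companyname,
    TGT.phone = SRC.phone,
    TGT.address = SRC.address
FROM dbo.Customers AS TGT
JOIN dbo.CustomersStage AS SRC
  ON TGT.custid = SRC.custid
WHERE EXISTS (SELECT SRC.* EXCEPT SELECT TGT.*);

-- Insert staged rows that don't exist in the target yet.
INSERT dbo.Customers (custid, companyname, phone, address)
SELECT SRC.custid, SRC.companyname, SRC.phone, SRC.address
FROM dbo.CustomersStage AS SRC
WHERE NOT EXISTS
(
    SELECT 1
    FROM dbo.Customers AS TGT
    WHERE TGT.custid = SRC.custid
);

COMMIT TRANSACTION;
```

Running the UPDATE first means freshly inserted rows never get compared against themselves.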