Why Comparing Columns Doesn't Always Seek In SQL Server

Adjoining

When writing queries, sometimes you have the pleasure of being able to pass a literal value, parameter, or scalar expression as a predicate.

With a suitable index in place, any one of them can seek appropriately to the row(s) you care about.

But what about when you need to compare the contents of one column to another?

It gets a little bit more complicated.

All About Algorithms

Take this query to start with, joining Users to Posts.

SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id;

The OwnerUserId column doesn’t have an index on it, but the Id column on Users it the primary key and the clustered index.

But the type of join that’s chosen is Hash, and since there’s no where clause, there’s no predicate to apply to either table for filtering.

This is complicated slightly by the Bitmap, which is created on the OwnerUserId column from the Posts table and applied to the Id column from the Users table as an early filter.

However, it’s a useless Bitmap, and Bitmaps don’t really seek anyway.

The same pattern can generally be observed with Merge Joins. Where things are a bit different is with Nested Loops.

Shoop Da Loop

If we use a query hint, we can see what would happen with a Nested Loops Join.

SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id
OPTION(LOOP JOIN);

The plan looks like this now, with a Seek on the Users table.

The reason is that this flavor of Nested Loops, known as Apply Nested Loops, takes each row from the outer input and uses it as a scalar operator on the inner input.

An example of Regular Joe Nested Loops™ looks like this:

SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
WHERE u.Reputation = 
(
    SELECT
        MIN(p.Score)
    FROM dbo.Posts AS p
);

Where the predicate is applied at the Nested Loops operator:

Like most things, indexing is key, but there are limits.

Innermost

Let’s create this index:

CREATE INDEX ud ON dbo.Users(UpVotes, DownVotes);

And run this query:

SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
WHERE u.UpVotes = u.DownVotes;

The resulting query plan looks like this:

But what other choice is there? If we want a seek, we need a particular thing or things to seek to.

SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
WHERE u.UpVotes = u.DownVotes
AND   u.UpVotes = 1;

We seek to everyone with an UpVote of 1, and then somewhat awkwardly search the DownVotes column for values >= 1 and <= 1.

But again, these are specific values we can search for.

Thanks for reading!

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

One thought on “Why Comparing Columns Doesn’t Always Seek In SQL Server”

Stefano Torelli says:

June 22, 2021 at 2:45 pm

That involves more complexity to the query optimizer: in few ms it should be able to help producing a “good estimated” query.
That is because the prior requirement: spend less time as mach as you can, because the client is waiting for results.
Interesting idea… what about a post execution re-optimization occurring when there is less work to do (or a separate process having this job)?

Comments are closed.