Indexing Archives | Page 13 of 21

In SQL Server, Lookups Are Just Nested Loops Joins With No Choices

Posted on July 2, 2020May 16, 2022 by Erik Darling

Such Optimize

At this point in your life, you’ve probably seen, and perhaps even struggled with how to fix a key lookup that was causing you some grief.

This post isn’t going to go terribly deep into anything, but I do want to make a few things about them more clear, because I don’t usually see them mentioned anywhere.

Lookups are joins between two indexes on the same table
Lookups can only be done via nested loops joins
Lookups can’t be moved around in the execution plan

I don’t want you to think that every lookup is bad and needs to be fixed, but I do want you to understand some of the limitations around optimizing them.

The Join

When you see a lookup in an execution plan, it’s natural to focus on just what the lookup is doing.

But there’s something else lurking in here, too.

SQL Server Query Plan — but you say he’s just a join

That nested loops join is what’s bringing the data from a nonclustered index to the data in a clustered index (or heap, but but whatever).

For every row that comes out of the index seek on the nonclustered index, we go back to the clustered index to find whatever data is missing from it in the clustered index. It could be columns in the select list, where clause, or both.

Much like index union or index intersection, but much more common. For a table with a clustered index, the join condition will be on the clustered index key column(s), because in SQL Server, nonclustered indexes inherit clustered index key columns. For heaps, it’ll be on the row identifier (RID).

You can most often see that by looking at the tool tip for the Lookup, under Seek Predicates.

The Loop

At this point, SQL Server’s optimizer can’t use merge or hash joins to implement a lookup.

It can only use nested loops joins.

That’s a pretty big part of why they can be so tricky in plans with parameter sniffing issues. At some point, the number of loops you can end up doing is far more work than just scanning as clustered index all in one shot.

There’s also no “adaptive join” component to them, where SQL Server can bail on a loop join after so many executions and use a scan instead. Maybe someday, but for now this isn’t anything that intelligent query processing touches.

They can look especially off in Star Join plans sometimes, where it’s difficult to figure out why the optimizer went with the lookup for many more rows than what people often call the “tipping point” between lookups and clustered index scans.

The Glue

Another pesky issue with lookups is that the optimizer doesn’t currently support moving the join between the two indexes around at all.

You can get this behavior on your own by rewriting the lookup as a self join (which is all a lookup really is anyway — a self join that the optimizer chose for you).

For instance, here are two query plans. The first one is where the optimizer chose a lookup plan. The second is one where I wrote the query to self join the Users table to itself.

The thing to understand here is that when there’s a lookup in a query plan, it is inseparably coupled.

When you write queries as self joins, the optimizer has many more choices available to it as far as join order, join type, and all the other usual steps that it can take during optimization. A simplified example of doing that (not related to the query plans above), would look like this:

CREATE INDEX joan_jett
ON dbo.Posts
(
    PostTypeId, Score
);

                                 /* Not In The Index */
SELECT p.Id, p.PostTypeId, p.Score, p.CreationDate
FROM dbo.Posts AS p
WHERE p.PostTypeId = 7
AND   p.Score = 0
AND   p.OwnerUserId = -1;
       /* Not In The Index*/



                                    /* From p2 */
SELECT p.Id, p.PostTypeId, p.Score, p2.CreationDate
FROM dbo.Posts AS p
JOIN dbo.Posts AS p2 --Self join
    ON p2.Id = p.Id
WHERE p.PostTypeId = 7
AND   p.Score = 0
AND   p2.OwnerUserId = -1;
      /* From p2 */

The index is only on PostTypeId and Score, which means the CreationDate and OwnerUserId columns need to come from somewhere.

Probably more interesting is the second query. The Posts table is joined to itself on the Id column, which is the primary key and clustered index (for style points, I suppose), and the columns not present in the nonclustered index are selected from the “p2” alias of the Posts table.

AND BASICALLY

Sometimes I take these thing for granted, because I learned them a long time ago. Or at least what seems like a long time ago.

But they’re things I end up talking with clients about frequently, and sometimes even though they’re not optimizer oddities they’re good posts to write.

Hopefully they’re also good posts for reading, too.

Thanks for doing that.

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

SQL Server’s Optimizer Could Be Smarter About Constraints

Posted on June 10, 2020May 16, 2022 by Erik Darling

Couldn’t We All

Let me ask you a question: If I told you that all the numbers in an integer column were either:

> 0
>= 1

You’d probably agree that the lower number you could possible see is 1.

And that’s exactly the case with the Reputation column in Stack Overflow.

Non-Alignment Pact

Assume that I am being truthful about creating this index:

CREATE INDEX constraints_are_silly
ON dbo.Users
(
Reputation,
UpVotes
) INCLUDE (DisplayName);

Also assume that this is the most important query ever written by human hands for the benefit of humanity:

SELECT TOP (1000)
u.Reputation,
u.UpVotes,
u.DisplayName
FROM dbo.Users AS u
WHERE u.Reputation <= 1
ORDER BY u.UpVotes;

However, I’m dissatisfied with the query plan. This requires no assumption. It’s a bit much for what I’m asking, I think.

Since a current implementation rule for the database is that no one can have a Reputation of 0 or less, I add this constraint hoping that SQL Server will see this and stop sorting data, because it knows that 1 is the lowest integer it will find, and the order of UpVotes won’t reset for Reputation = 0.

ALTER TABLE dbo.Users ADD CONSTRAINT checko CHECK (Reputation > 0);

But I still end up with the same execution plan. In neither case is the plan a) trivial, or b) simple parameterized. We can’t blame the optimizer trying to be helpful.

Now assume that I get really mad and change my constraint. This requires minimal assumption.

ALTER TABLE dbo.Users ADD CONSTRAINT checko CHECK (Reputation >= 1);

And now I get a query plan that does not have a sort in it. My approval does not require assumption.

Why does one constraint remove the need to sort, and one not?

Over My Head

The answer is in the query plan. Sometimes I have to be reminded to look at these.

The Seek Predicate on the left is from when we defined the constraint as > 0. It has a <= 1 predicate.
The Seek Predicate on the right is an equality on = 1

For a little more detail, I asked a question. Apparently the optimizer… doesn’t consider data types here.

Thanks for reading!

Going Further

Enabling Optimize For Sequential Key For A SQL Server Index

Posted on June 9, 2020May 16, 2022 by Erik Darling

Sweetness, Shortness

Recently I blogged about last page contention solutions, and one thing I mentioned is that turning on OPTIMIZE_FOR_SEQUENTIAL_KEY doesn’t require you to rebuild an index. That’s awesome, because a whole lot of changes to indexes require you to rebuild them.

So how exactly do you do that?

Either when you create the table:

CREATE TABLE dbo.Votes_Insert
(
    Id INT IDENTITY(1, 1) NOT NULL,
    PostId INT NOT NULL,
    UserId INT NULL,
    BountyAmount INT NULL,
    VoteTypeId INT NOT NULL,
    CreationDate DATETIME NOT NULL,
    CONSTRAINT PK_Votes_Insert_Id
        PRIMARY KEY CLUSTERED (Id ASC) 
    WITH (OPTIMIZE_FOR_SEQUENTIAL_KEY  = ON) 
);

When you create the index:

CREATE INDEX so_optimized 
    ON dbo.Votes_Insert (Id) WITH (OPTIMIZE_FOR_SEQUENTIAL_KEY = ON);

Or if you need to alter an existing index:

ALTER INDEX so_optimized 
    ON dbo.Votes_Insert SET(OPTIMIZE_FOR_SEQUENTIAL_KEY  = ON);

Get Back To Work

You’ll find this post again in a few years when you finally migrate to SQL Server 2019.

Thanks for reading!

Going Further

Locking And Blocking Week: Recap, And How To Get More SQL Server Help

Posted on June 5, 2020May 16, 2022 by Erik Darling

Sonic Deducer

Going Further

Locking And Blocking Week: Troubleshooting Abnormal Blocking Issues

Posted on June 4, 2020May 16, 2022 by Erik Darling

Dead and Alive

Going Further

Locking And Blocking Week: Performance Tuning Transactions In SQL Server

Posted on June 3, 2020May 16, 2022 by Erik Darling

Beginning OF The Tran

Going Further

Locking And Blocking Week: Tuning Blocking With Indexes In SQL Server

Posted on June 2, 2020May 16, 2022 by Erik Darling

All This And More

Going Further

Locking And Blocking Week: Introduction To Locking And Blocking In SQL Server

Posted on June 1, 2020May 16, 2022 by Erik Darling

I Wanna Know What Lock Is

Going Further

SQL Server Index Choice and Batch Mode On Rowstore

Posted on May 22, 2020July 8, 2024 by Erik Darling

A/B Testing

Let’s look at a pretty simple query against the Votes table:

SELECT v.VoteTypeId, 
       MIN(v.CreationDate) AS min_creation_date,
       MAX(v.CreationDate) AS max_creation_date,
       COUNT_BIG(*) AS records
FROM dbo.Votes AS v
GROUP BY v.VoteTypeId
ORDER BY v.VoteTypeId;

There are only two columns involved:

VoteTypeId, which we’re grouping on
CreationDate, which we’re finding min and max values for

There are two different ways to approach this:

CREATE INDEX even_steven 
    ON dbo.Votes(VoteTypeId, CreationDate);

CREATE INDEX steven_even 
    ON dbo.Votes(CreationDate, VoteTypeId);

But first, let’s look at the query with just the clustered index, which is on the Id column, not mentioned at all in the query.

We’ll call this our baseline.

This takes 2 seconds.

Even Steven

With an index on VoteTypeId, CreationDate, what happens?

It’ll take 5 seconds no matter how many times I run it.

This might sound like a very good index though, because even though we don’t have a where clause looking for VoteTypeIds, we’re grouping by them.

Having CreationDate next in the index key should make it really easy to find a min and max value for each VoteTypeId, because CreationDate will be in order.

And you know what? That sort of works out. We get a Stream Aggregate in the plan without a Sort operator.

But it still sucks: Why?

Steven Even

With this index, we go right back to… What we had before.

We went through all the trouble of adding indexes, to have one be slower, and one not get us any faster than just using the clustered index.

What gives?

Teenage Angst

I’ve been avoiding something a little bit, dear reader. You see, I’m using SQL Server 2019.

The first plan and the third plan — the ones that finished in 2 seconds — they both used batch mode on rowstore. That’s an Enterprise Edition optimizer feature available in compat level 150.

If you were to run this on SQL Server 2017 or earlier, you would find no measurable difference between any one of these queries.

And look, batch mode on row store does represent a good improvement in many cases for large aggregation queries — the type of queries that would benefit from columnstore in general — and maybe in places where you’re unable to use columnstore today because you’re also using triggers, cursors, Replication, or another feature that it disagrees with.

If you suddenly saw a 60% improvement in some of your “big” queries, you’d probably be pretty happy. I’m not saying it comes for free, or that it’s a magickal world where everything is perfect for every query now.

You can only get that if you have PLAN GUIDES FOR EVERY QUERY. H ah aa ah ah ha a.

But let’s consider something else

It only kicked in when our indexes were lacking.

When we had a “good index”, SQL Server chose a plan with no batch-y mode-y at all.

If you’ve carefully crafted some indexes over the years, even a sure shot for the type of query you want to get some batch mode love may not see it.

Remember: It only gets used if your query is tall enough to ride.

Thanks for reading!

Going Further

SQL Server 2019 Disappointment: sp_estimate_data_compression_savings

Posted on May 7, 2020May 16, 2022 by Erik Darling

Missed It The First Time

This is documented, but I know a lot of people are allergic to documentation, whether it’s writing or reading.

In SQL Server 2019:

Exciting stuff: In memory tempdb!
Exciting stuff: sp_estimate_data_compression_savings can evaluate columnstore compression!
Disappointing stuff: If you use in memory tempdb, you can’t have any columnstore anything in tempdb

That means if you’re using sneaky tricks like clustered columnstore indexes on temp tables to induce batch mode, you’re gonna get a lot of errors.

Likewise, you won’t be able to evaluate if columnstore will help your tables.

EXEC sp_estimate_data_compression_savings 
     @schema_name = 'dbo',
     @object_name = 'Badges',
     @index_id = 1,
     @partition_number = 1,
     @data_compression = 'COLUMNSTORE';

Msg 11442, Level 16, State 1, Line 4
Columnstore index creation is not supported in tempdb when memory-optimized metadata mode is enabled.

There’s no workaround for this, either. You can’t tell it to use a different database, this is just the way it’s built.

Hopefully in the future, there will be more cooperation between these two features.

Thanks for reading!