Indexing SQL Server Queries For Performance: Fixing Predicate Key Lookups

Make Things Fast


Our very lofty goal as query tuners is to make the time from request to result shorter. Sometimes that requires a bit of sacrifice on the server side:

  • Queries going parallel, using more CPU to reduce wall clock time
  • Queries using more memory to avoid spilling to disk
  • Queries using space in tempdb via a #temp table to store results
  • Queries having better indexes available to them for searching

Just about every query tuning technique requires sacrificing some resource on the server (and sometimes more than one resource) to achieve our very lofty goal.

The only exclusion is fixing some miserably-written query to bring it back to sensibility. That requires only a sacrifice on your side.

Indexes are not free rides, of course, though much of the cost attributed to them is overblown. I’m not here to give you license to add 100 indexes to a table, but I am giving you permission to judge indexes by quality (read vs write usage) over quantity.

Anyway, let’s look at some of my favorite index design patterns when I’m tuning queries for clients.

Many of the issues are going to contextually framed by issues with parameter sensitivity in order to exacerbate them. You may see similar bad judgement without parameter sensitivity, of course. Either the optimizer screws up some cardinality estimation, or you do something that screws up cardinality estimation for the optimizer. Common causes for that are local variables, table variables, non-SARGable predicates, overly complicated queries, etc.

I don’t write queries that do stupid things, except to show you when things are stupid.

In this post, I’m going to cover the fixing predicate key lookups! Because why not? We’re already here. Stuck together.

Forever.

Key Lookups In General


Key lookups represent a choice made by the optimizer between:

  • Clustered index scan
  • Nonclustered index seek/scan + Clustered index lookup

Lookups can be quite useful to avoid creating quite wide nonclustered index, but the optimizer is heavily biased against them. It does not like random I/O.

Typically lookups get chosen when a relatively small number of rows are expected to be retrieved from the nonclustered index.

Bad estimates in general, and self-inflicted bad estimates from the list above, can contribute to lookups being chosen inappropriately.

Also like I said above, parameter sensitivity is a very easy way to demonstrate the problem.

Before we go on though, let’s talk about the two things lookups can be assigned to do:

  • Retrieve columns from the clustered index that aren’t part of the nonclustered index definition
  • Evaluate predicates in the clustered index that aren’t part of the nonclustered index definition

Often, fixing lookups only to avoid retrieving columns is a last resort for me, because it can mean greatly expanding the number of columns included in a nonclustered index.

But fixing lookups that have predicates in them is quite appealing to me, because a good index should support locating and filtering as much data is possible for important queries.

Okay, now we can go on.

Predicate Key Lookups: Good


Let’s start with this index:

CREATE INDEX
    p
ON dbo.Posts
    (Score, CreationDate, PostTypeId)
WITH
    (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
GO

And let’s start with this procedure:

CREATE OR ALTER PROCEDURE
    dbo.PredicateKeyLookup
(
    @Score integer,
    @CreationDate datetime,
    @PostTypeId integer,
    @OwnerUserId integer
)
AS
BEGIN
    SET NOCOUNT, XACT_ABORT ON;
   
    SELECT TOP (100)
        p.*
    FROM dbo.Posts AS p
    WHERE p.Score >= @Score
    AND   p.CreationDate >= @CreationDate
    AND   p.PostTypeId = @PostTypeId
    AND   p.OwnerUserId = @OwnerUserId
    ORDER BY
        p.Score DESC,
        p.CreationDate DESC,
        p.Id DESC;
END;
GO

I know, selecting star is a bad practice, but I’m going to share something shocking with you (and I’ll blog about this later, too): Lookups are assigned the same optimizer cost unit, whether you select one column, or 1000 columns.

The thing to note right here, is that the stored procedure filters on all the columns in our index, plus it has an additional filter on the OwnerUserId column. That (along with the select list columns) is the work our lookup will have to do.

Starting with a query that selects a small number of rows:

EXEC dbo.PredicateKeyLookup
    @Score = 0,
    @CreationDate = '20131101',
    @PostTypeId = 1,
    @OwnerUserId = 39677;

The query plan looks like this, and is very fast. Yay. We smart.

sql server query plan
easy lookup

Zooming in on the lookup a bit, here’s what it’s doing:

sql server query plan

  1. The clustered primary key on the Posts table is Id, which the nonclustered index inherits as a final key column in the nonclustered index (seek predicate), so this is how we match rows across the two
  2. The clustered index has all these columns in it that are not part of the nonclustered index definition, so we can retrieve any rows matched via the seek predicate
  3. The clustered index has OwnerUserId in it as well, which is also not part of the nonclustered index definition, so we can additionally filter any rows matched via the seek predicate

Wordy, I know, for bullet points, but the goal of these posts is to describe things to developers who may have never encountered such things.

Predicate Key Lookups: Bad


If we change our execution parameters to ones that are not selective at all, we’ll use the same execution plan, but speed will get a bit worse.

EXEC dbo.PredicateKeyLookup
    @Score = 0,
    @CreationDate = '20080101',
    @PostTypeId = 2,
    @OwnerUserId = 22656;

To run down what changed:

  • CreationDate will fetch several more years of posts (2008 vs 2013)
  • We’re looking for answers (PostTypeId 2) from questions (PostTypeId 1)
  • The OwnerUserId is now a much more active user on the site

Now, rather than finishing in ~300ms, the query takes ~3.5 seconds.

sql server query plan
degrading

The majority of the time is spent looping and looking up, which is not what we want at all. Again, indexes should help us find and filter as much data as possible in one place.

You can think of each index as a separate data source. If you need two data sources on one table to filter things, you’re leaving yourself open to performance issues.

Predicate Key Lookups: Fixing


In this case, we can fix things well-enough by changing the index to this:

CREATE INDEX
    p
ON dbo.Posts
    (Score, CreationDate, PostTypeId, OwnerUserId)
WITH
    (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE, DROP_EXISTING = ON);
GO

Now, either execution, in either execution order, is fast (enough).

sql server query plan
good enough plan

Now, for sure, with enough domain knowledge, and enough understanding of the queries, we could do things differently.

  • If we know most/all queries will filter on OwnerUserId/PostTypeId
  • We’re allowed to create a completely new index to help this query if it’s important enough
CREATE INDEX
    p
ON dbo.Posts
    (OwnerUserId, PostTypeId, Score, CreationDate)
WITH
    (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE, DROP_EXISTING = ON);
GO

In this index, OwnerUserId and PostTypeId are the leading columns. The way b-tree indexes work, equality predicates on leading column(s) maintain sorting of any following columns.

Since we’re ordering by Score and CreationDate, this is important (for this query, anyway). Now, both plans look like this.

sql server query plan
faster!

Predicate Key Lookups: Thoughts


Often, fixing a predicate key lookup by adding the filtering column to the nonclustered index your query is already using is good enough.

Other times, to squeeze every last bit of performance out of things, we may need to make additional adjustments. Changing existing indexes in a way that rearranges key column order can be perilous.

While the query you’re working on may work way better, others may not be so happy with your tinkering. Again, this is where you need additional domain knowledge.

If you find yourself wanting to really change things like that, you may be safer creating a new index instead.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.



3 thoughts on “Indexing SQL Server Queries For Performance: Fixing Predicate Key Lookups

  1. Thanks for the blog post. This was very useful even after reading dozens of other blogs on the same subject.

Comments are closed.