Column Selectivity And SQL Server Index Design Patterns

The Memes On The Blogs Fall Mainly On The Aughs


Unless your column is unique, and defined as unique, and people are searching for equality predicates on it — and I don’t mean column = column — I mean column = value, it might not be a great first column in your index. Many unique columns I see are identity columns that don’t necessarily define a relationship or usable search values. They’re cool for keeping the clustered index sane, but no one’s looking at the values in them.

The problem with the advice that you should “always put the most selective column first” is that not many columns are uniformly selective. For some ranges, they may be selective, for other ranges, they may not be.

Let’s look at some examples.

Usery


Let’s look at some tables in the Stack Overflow data dump. I realize this isn’t a perfect data set, but it has a lot of things in common with data sets I see out in the world.

  • The site has gotten more popular over time, so year over year dates become less selective
  • The site has definite groups of “power users” and “one and done” users
  • Certain site activities are more common than others: votes cast, types of posts made
  • Certain user attributes, like badges, are more common than others

All of these patterns are generally observable in real world data, too. Growth is a near constant, and with growth is going to come lumpy patterns.

SQL Server Management Studio Query Results
Not like the others

Looking at significant number differences here, the top vote type (an upvote) has 37 million entries. The next most popular one has 3.7 million.

Are either of those selective? No.

But when you get down to the bottom, you reach some selectivity.

The dates the vote were cast become less selective over time, too.

SQL Server Management Studio Query Results
Tinder

Within User Reputations, things become skewed towards the bottom end.

Though the site gets more users overall, Reputation is still largely skewed towards power users.

SQL Server Management Studio Query Results
Coconut colored

What Does This Mean For You?


  • Don’t assume that just because you search for something with an equality that it’s the most selective predicate.
  • Don’t assume that any search will always be selective (unless the column is unique).
  • Don’t assume that the most selective predicate should always be the first column in an index; there are other query operations that should be considered as well

I can’t count the number of times that someone has told me something like “this query is fast, except when someone searches for X”, or “this query is fast, except when they ask for a year of data”, and the solution has been creating alternate indexes with key columns in a different order, or flipping current index key columns around.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

SQL Server’s Batch Mode On Row Store Is Fickle

Thanks Though


I’m excited about this feature. I’m not being negative, here. I just want you, dear reader, to have reasonable expectations about it.

This isn’t a post about it making a query slower, but I do have some demos of that happening. I want to show you an example of it not kicking in when it probably should. I’m going to use an Extended Events session that I first read about on Dmitry Pilugin’s blog here. It’ll look something like this.

CREATE EVENT SESSION heristix
    ON SERVER
    ADD EVENT sqlserver.batch_mode_heuristics
        ( ACTION( sqlserver.sql_text ))
    ADD TARGET package0.event_file
        ( SET filename = N'c:\temp\heristix' )
    WITH
        ( MAX_MEMORY = 4096KB,
          EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS,
          MAX_DISPATCH_LATENCY = 1 SECONDS,
          MAX_EVENT_SIZE = 0KB,
          MEMORY_PARTITION_MODE = NONE,
          TRACK_CAUSALITY = OFF,
          STARTUP_STATE = ON );
GO

The Setup


Let’s start with some familiar indexes and a familiar query from other posts the last couple weeks.

CREATE INDEX something ON dbo.Posts(PostTypeId, Id, CreationDate);
CREATE INDEX stuffy ON dbo.Comments(PostId, Score, CreationDate);
SELECT COUNT(*)
FROM   dbo.Posts AS p
JOIN   dbo.Comments AS c
    ON  p.Id = c.PostId
WHERE DATEDIFF(YEAR, p.CreationDate, c.CreationDate) > 1
AND p.PostTypeId = 1
AND c.Score > 0;

The query plan is unimportant. It just doesn’t use any Batch Mode, and takes right about 2 seconds.

SQL Server Query Plan
Blech Mode

If we look at the entry for this query in our XE session, we can see that the optimizer considered the heck out of Batch Mode, but decided against it.

SQL Server Extended Events
All The Heck

Curiouser


If we add a hash join hint to the query, it finishes in about 800ms.

SELECT COUNT(*)
FROM   dbo.Posts AS p
JOIN   dbo.Comments AS c
    ON  p.Id = c.PostId
WHERE DATEDIFF(YEAR, p.CreationDate, c.CreationDate) > 1
AND p.PostTypeId = 1
AND c.Score > 0
OPTION(HASH JOIN);
SQL Server Query Plan
Every Time

All the operators in this plan except Gather Streams are run in Batch Mode. Clearly it was helpful.

SQL Server Extended Events
See Ghosts

And according to the XE session, we can see that decision in writing. Yay.

Alt Roq


If we modify our indexes slightly, we can get an Adaptive Join plan.

CREATE INDEX something_alt ON dbo.Posts(PostTypeId, CreationDate, Id);
CREATE INDEX stuffy_alt ON dbo.Comments(Score, CreationDate, PostId);

And, yes, this is about twice as fast now (compared to the last Batch Mode query), mostly because of the better indexing.

SQL Server Query Plan
Montage

Is There A Moral So Far?


Yes, don’t count on Batch Mode to kick in for every query where it would be helpful.

If you want queries to consistently use Batch Mode, you’ll need to do something like this.

SELECT COUNT(*)
FROM   dbo.Posts AS p
JOIN   dbo.Comments AS c
    ON  p.Id = c.PostId
LEFT JOIN dbo.t ON 1 = 0
WHERE DATEDIFF(YEAR, p.CreationDate, c.CreationDate) > 1
AND p.PostTypeId = 1
AND c.Score > 0;

But you have to be careful there too.

SQL Server Query Plan
Mad Mad Mad Mad

You might lose your nice parallel plan and end up with a slower query.

Huh.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

What Kind Of Indexes Can You Create On SQL Server Temporary Objects?

That, There


Intermediate result materialization is one of the most successful tuning methods that I regularly use.

Separating complexity is only second to eliminating complexity. Of course, sometimes those intermediate results need indexing to finish the job.

Not always. But you know. I deal with some weird stuff.

Like anime weird.

Listings!


Regular ol’ #temp tables can have just about any kind of index plopped on them. Clustered, nonclustered, filtered, column store (unless you’re using in-memory tempdb with SQL Server 2019). That’s nice parity with regular tables.

Of course, lots of indexes on temp tables have the same problem as lots of indexes on regular tables. They can slow down loading data in, especially if there’s a lot. That’s why I usually tell people load first, create indexes later.

There are a couple ways to create indexes on #temp tables:

Create, then add

/*Create, then add*/
CREATE TABLE #t (id INT NOT NULL);
/*insert data*/
CREATE CLUSTERED INDEX c ON #t(id);

Create inline

/*Create inline*/
CREATE TABLE #t(id INT NOT NULL,
                INDEX c CLUSTERED (id));

It depends on what problem you’re trying to solve:

  • Recompiles caused by #temp tables: Create Inline
  • Slow data loads: Create, then add

Another option to help with #temp table recompiles is the KEEPFIXED PLAN hint, but to wit I’ve only ever seen it used in sp_WhoIsActive.

Forgotten


Often forgotten is that table variables can be indexed in many of the same ways (at least post SQL Server 2014, when the inline index create syntax came about). The only kinds of indexes that I care about that you can’t create on a table variable are column store and filtered (column store generally, filtered pre-2019).

Other than that, it’s all fair game.

DECLARE @t TABLE( id INT NOT NULL,
                  INDEX c CLUSTERED (id),
				  INDEX n NONCLUSTERED (id) );

You can create clustered and nonclustered indexes on them, they can be unique, you can add primary keys.

It’s a whole thing.

Futuristic


In SQL Server 2019, we can also create indexes with included columns and filtered indexes with the inline syntax.

CREATE TABLE #t( id INT, 
                 more_id INT, 
				 INDEX c CLUSTERED (id),
                 INDEX n NONCLUSTERED (more_id) INCLUDE(id),
				 INDEX f NONCLUSTERED (more_id) WHERE more_id > 1 );


DECLARE @t TABLE ( id INT, 
                   more_id INT, 
				   INDEX c CLUSTERED (id),
                   INDEX n NONCLUSTERED (more_id) INCLUDE(id),
				   INDEX F NONCLUSTERED (more_id) WHERE more_id > 1 );

 

Missing Persons


Notice that I’m not talking about CTEs here. You can’t index create indexes on those.

Perhaps that’s why they’re called “common”.

Yes, you can index the underlying tables in your query, but the results of CTEs don’t get physically stored anywhere that would allow you to create an index on them.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Why Don’t I Have Any Missing Index Requests In My SQL Server Database?

This was originally posted by me as an answer here. I’m re-posting it locally for posterity.

There are many reasons why you may not have missing index requests!


We’ll look at a few of the reasons in more detail, and also talk about some of the general limitations of the feature.

General Limitations


First, from: Limitations of the Missing Indexes Feature:

  • It does not specify an order for columns to be used in an index.

As noted in this Q&A: How does SQL Server determine key column order in missing index requests?, the order of columns in the index definition is dictated by Equality vs Inequality predicate, and then column ordinal position in the table.

There are no guesses at selectivity, and there may be a better order available. It’s your job to figure that out.

Special Indexes

Missing index requests also don’t cover ‘special’ indexes, like:

  • Clustered
  • Filtered
  • Partitioned
  • Compressed
  • XML-ed
  • Spatial-ed
  • Columnstore-d
  • Indexed View-ed

What columns are considered?


Missing Index key columns are generated from columns used to filter results, like those in:

  • JOINs
  • WHERE clause

Missing Index Included columns are generated from columns required by the query, like those in:

  • SELECT
  • GROUP BY
  • ORDER BY

Even though quite often, columns you’re ordering by or grouping by can be beneficial as key columns. This goes back to one of the Limitations:

  • It is not intended to fine tune an indexing configuration.

For example, this query will not register a missing index request, even though adding an index on LastAccessDate would prevent the need to Sort (and spill to disk).

SELECT TOP (1000) u.DisplayName FROM dbo.Users AS u ORDER BY u.LastAccessDate DESC;

 

SQL Server Query Plan
NUTS

Nor does this grouping query on Location.

SELECT TOP (20000) u.Location FROM dbo.Users AS u GROUP BY u.Location

 

SQL Server Query Plan
NUTS

That doesn’t sound very helpful!


Well, yeah, but it’s better than nothing. Think of missing index requests like a crying baby. You know there’s a problem, but it’s up to you as an adult to figure out what that problem is.

You still haven’t told me why I don’t have them, though…


Relax, bucko. We’re getting there.

Trace Flags


If you enable TF 2330, missing index requests won’t be logged. To find out if you have this enabled, run this:

DBCC TRACESTATUS;

Index Rebuilds


Rebuilding indexes will clear missing index requests. So before you go Hi-Ho-Silver-Away rebuilding every index the second an iota of fragmentation sneaks in, think about the information you’re clearing out every time you do that.

You may also want to think about Why Defragmenting Your Indexes Isn’t Helping, anyway. Unless you’re using Columnstore.

Adding, Removing, or Disabling Indexes


Adding, removing, or disabling an index will clear all of the missing index requests for that table. If you’re working through several index changes on the same table, make sure you script them all out before making any.

Trivial Plans


If a plan is simple enough, and the index access choice is obvious enough, and the cost is low enough, you’ll get a trivial plan.

This effectively means there were no cost based decisions for the optimizer to make.

Via Paul White:

The details of which types of query can benefit from Trivial Plan change frequently, but things like joins, subqueries, and inequality predicates generally prevent this optimization.

When a plan is trivial, additional optimization phases are not explored, and missing indexes are not requested.

See the difference between these queries and their plans:

SELECT * FROM dbo.Users AS u WHERE u.Reputation = 2; 

SELECT * FROM dbo.Users AS u WHERE u.Reputation = 2 AND 1 = (SELECT 1);

 

SQL Server Query Plan
NUTS

The first plan is trivial, and no request is shown. There may be cases where bugs prevent missing indexes from appearing in query plans; they are usually more reliably logged in the missing index DMVs, though.

SARGability


Predicates where the optimizer wouldn’t be able to use an index efficiently even with an index may prevent them from being logged.

Things that are generally not SARGable are:

  • Columns wrapped in functions
  • Column + SomeValue = SomePredicate
  • Column + AnotherColumn = SomePredicate
  • Column = @Variable OR @Variable IS NULL

Examples:


SELECT * FROM dbo.Users AS u WHERE ISNULL(u.Age, 1000) > 1000; 

SELECT * FROM dbo.Users AS u WHERE DATEDIFF(DAY, u.CreationDate, u.LastAccessDate) > 5000;

SELECT * FROM dbo.Users AS u WHERE u.UpVotes + u.DownVotes > 10000000; 

DECLARE @ThisWillHappenWithStoredProcedureParametersToo NVARCHAR(40) = N'Eggs McLaren';
SELECT * 
FROM dbo.Users AS u 
WHERE u.DisplayName LIKE @ThisWillHappenWithStoredProcedureParametersToo OR @ThisWillHappenWithStoredProcedureParametersToo IS NULL;

 

None of these queries will register missing index requests. For more information on these, check out the following links:

You Already Have An Okay Index


Take this index:

CREATE INDEX ix_whatever ON dbo.Posts(CreationDate, Score) INCLUDE(OwnerUserId);

It looks okay for this query:

SELECT p.OwnerUserId, p.Score 
FROM dbo.Posts AS p 
WHERE p.CreationDate >= '20070101' 
AND p.CreationDate < '20181231' 
AND p.Score >= 25000 
AND 1 = (SELECT 1) 
ORDER BY p.Score DESC;

The plan is a simple Seek…

SQL Server Query Plan
NUTS

But because the leading key column is for the less-selective predicate, we end up doing more work than we should:

Table ‘Posts’. Scan count 13, logical reads 136890

If we change the index key column order, we do a lot less work:

CREATE INDEX ix_whatever ON dbo.Posts(Score, CreationDate) INCLUDE(OwnerUserId);
SQL Server Query Plan
NUTS

And significantly fewer reads:

Table ‘Posts’. Scan count 1, logical reads 5

SQL Server Is Creating Indexes For you


In certain cases, SQL Server will choose to create an index on the fly via an index spool. When an index spool is present, a missing index request won’t be. Surely adding the index yourself could be a good idea, but don’t count on SQL Server helping you figure that out.

SQL Server Query Plan
NUTS

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

All The Performance Problems With Select * Queries In SQL Server

This was originally posted by me as an answer here. I’m re-posting it locally for posterity.

The two reasons that I find the most compelling not to use SELECT * in SQL Server are

  1. Memory Grants
  2. Index usage

Memory Grants


When queries need to Sort, Hash, or go Parallel, they ask for memory for those operations. The size of the memory grant is based on the size of the data, both row and column wise.

String data especially has an impact on this, since the optimizer guesses half of the defined length as the ‘fullness’ of the column. So for a VARCHAR 100, it’s 50 bytes * the number of rows.

Using Stack Overflow as an example, if I run these queries against the Users table:

SELECT TOP 1000 
       u.DisplayName 
FROM dbo.Users AS u 
ORDER BY u.Reputation;


SELECT   TOP 1000
         u.DisplayName,
         u.Location
FROM     dbo.Users AS u
ORDER BY u.Reputation;

 

DisplayName is NVARCHAR 40, and Location is NVARCHAR 100.

Without an index on Reputation, SQL Server needs to sort the data on its own.

SQL Server Query Plan
NUTS

But the memory it nearly doubles.

DisplayName:

NUTS

DisplayName, Location:

NUTS

This gets much worse with SELECT *, asking for 8.2 GB of memory:

NUTS

It does this to cope with the larger amount of data it needs to pass through the Sort operator, including the AboutMe column, which has a MAX length.

NUTS

Index Usage


If I have this index on the Users table:

CREATE NONCLUSTERED INDEX ix_Users ON dbo.Users ( CreationDate ASC, Reputation ASC, Id ASC );

 

And I have this query, with a WHERE clause that matches the index, but doesn’t cover/include all the columns the query is selecting…

SELECT   u.*,
         p.Id AS PostId
FROM     dbo.Users AS u
JOIN     dbo.Posts AS p
    ON p.OwnerUserId = u.Id
WHERE    u.CreationDate > '20171001'
AND      u.Reputation > 100
AND      p.PostTypeId = 1
ORDER BY u.Id;

The optimizer may choose not to use the narrow index with a key lookup, in favor of just scanning the clustered index.

SQL Server Query Plan
NUTS

You would either have to create a very wide index, or experiment with rewrites to get the narrow index chosen, even though using the narrow index results in a much faster query.

SQL Server Query Plan
NUTS

CX:

SQL Server Execution Times: CPU time = 6374 ms, elapsed time = 4165 ms.

 

NC:

SQL Server Execution Times: CPU time = 1623 ms, elapsed time = 875 ms.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Why Functions Are Bad in SQL Server Where Clauses

Raised Right


It seems like every time I check out a server, the query plans are a nightmare. Users are freaking out, and management’s coffee is more Irish than Colombian.

Many times, the issue is that people are using presentation layer functions for relational processes. The where clause, joins, group by, and order by parts of a query.

Think about built-in string and date functions, wrapped around columns, and the problems they can cause.

These are things you should actively be targeting in existing code, and fighting to keep out of new code.

Nooptional


When you’re trying to get rid of them, remember your better options

  • Cleaning data on input, or via triggers: Better than wrapping everything in RTRIM/LTRIM
  • Using computed columns: Better than relying on runtime calculations like DATEADD/DATEDIFF
  • Breaking queries up: Use UNION ALL to query for either outcome (think ISNULL)
  • Using indexed views: If you need to calculate things in columns across tables
  • Creating reporting tables: Sometimes it’s easier to denormalize a bit to make writing and indexing easier
  • Using #temp tables: If you have data that you need to persist a calculation in and the query to generate it is complicated

Note the things I’m not suggesting here:

  • CTEs: Don’t materialize anything
  • @table variables: Cause more problems than they solve
  • Views: Don’t materialize unless indexed
  • Functions: Just no, thanks

More Work


Yes, finding and fixing this stuff is more work for you. But it’s a whole lot less work for the optimizer, and your server, when you’re done.

If that’s the kind of thing you need help with, drop me a line.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

SQL Server Index Spools When You Have An Index

PAUL WHITE IS WRONG


Ha ha. Just kidding. He’s never wrong about anything.

But he did write about Eager Index Spools recently, and the post ended with the following statement:

Eager index spools are often a sign that a useful permanent index is missing from the database schema.

I’d like to show you a case where you may see an Eager Index Spool even when you have the index being spooled.

Funboy & Funboy & Funboy


Let’s say we’ve got a query that, for better or worse, was written like so:

SELECT SUM(records)
FROM dbo.Posts AS p 
CROSS APPLY 
(
    SELECT COUNT(p2.Id)
    FROM   dbo.Posts AS p2
    WHERE  p2.LastEditDate >= '20121231'
	AND    p.Id = p2.Id

    UNION ALL

    SELECT COUNT(p2.Id)
    FROM   dbo.Posts AS p2
    WHERE  p2.LastEditDate IS NULL
	AND    p.Id = p2.Id
) x (records);

Right now, we’ve got this index:

CREATE INDEX to_null_or_not_to_null ON dbo.Posts(LastEditDate);

Which means we’ve effectively got an index on (LastEditDate, Id), because of how clustered index key columns are inherited by nonclustered indexes.

The APPLY section of the query plan looks like so:

SQL Server Query Plan
Spooled to death.

Each spool runs for nearly 53 seconds. The entire plan runs for 1:52.

There have been times when I’ve seen index spools created to effectively re-order existing indexes.

Perhaps that’s the case here? Let’s add this index.

CREATE /*UNIQUE*/ INDEX that_is_a_question ON dbo.Posts(Id, LastEditDate);

I’ve got UNIQUE in there in case you’re playing along at home. It makes no difference to the outcome.

I’d expect you to ask about that. I have high expectations of you, dear reader. I love you.

PLEASE DON’T LEAVE ME.

Get Out


The new execution plan looks uh.

SQL Server Query Plan
I’m In It

That’s frustrating, isn’t it? Why would you do that?

When I asked Paul why the optimizer was wrong (I understand that many of you confuse Paul with the optimizer. To wit, they’ve never been seen together.), he said something along the lines of:

The issue is that you have a unique clustered index that prevents the index matching logic from finding the better nonclustered index.

Well okay yeah lemme just go drop that clustered index or something.

Workarounds


There are several workarounds, like using FORCESEEK inside the APPLY logic.

Of course, the better method is just to write the query so there’s no need for the optimizer to join a table to itself a couple times.

SELECT SUM(x.records)
FROM (    
	SELECT COUNT(p.records)
    FROM 
    (
        SELECT 1 AS records
        FROM   dbo.Posts AS p2
        WHERE  p2.LastEditDate >= '20121231'
    ) AS p
    
    UNION ALL
    
    SELECT COUNT(p.records)
    FROM 
    (
        SELECT 1 AS records
        FROM   dbo.Posts AS p2
        WHERE  p2.LastEditDate IS NULL
    ) AS p
) AS x (records);

Which’ll finish in about 1.5 seconds.

But hey, nifty demo.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

The Right Way To Compare Date Columns In Two Different SQL Server Tables

A Certain Ratio


I have a lot to say about this demo on SQL Server 2019, which I’ll get to in a future post.

For now, I’m going to concentrate on ways to make this situation suck a little less for you wherever you are.

Let’s talk.

Pinky Blue


Let’s get a couple indexes going:

CREATE INDEX something ON dbo.Posts(PostTypeId, Id, CreationDate);
CREATE INDEX stuffy ON dbo.Comments(Score, PostId, CreationDate);

And look at a maybe kinda sorta stupid query.

SELECT COUNT(*)
FROM   dbo.Posts AS p
JOIN   dbo.Comments AS c
    ON  p.Id = c.PostId
WHERE DATEDIFF(YEAR, p.CreationDate, c.CreationDate) > 1
AND p.PostTypeId = 1
AND c.Score > 0;

We wanna find questions where a comment was left a year after they were posted, and the comment was upvoted.

BIG DATA!

What We Know


From yesterday’s post, we know that even if we put our date columns first in the index, we wouldn’t have anything to seek to.

Unlike yesterday’s post, these columns are in two different tables. We can’t make a good computed column to calculate that.

The indexes that I created help us focus on the SARGable parts of the where clause and the join conditions.

That query takes about 2 seconds.

SQL Server Query Plan
Hm.

Switch Hitter


You might be tempted to try something like this, but it won’t turn out much better unless you change your indexes.

SELECT DATEDIFF(YEAR, p.CreationDate, c.CreationDate) AS Diffo
INTO #t
FROM   dbo.Posts AS p
JOIN   dbo.Comments AS c
    ON  p.Id = c.PostId
WHERE p.CreationDate < DATEADD(YEAR, -1, c.CreationDate)
AND   c.CreationDate > DATEADD(YEAR,  1, p.CreationDate)
AND p.PostTypeId = 1
AND c.Score > 0

SELECT COUNT(*)
FROM #t AS t
WHERE t.Diffo > 1

DROP TABLE #t;

Moving CreationDate to the second column helps the first query quite a bit.

CREATE INDEX something_alt ON dbo.Posts(PostTypeId, CreationDate, Id);
CREATE INDEX stuffy_alt ON dbo.Comments(Score, CreationDate, PostId);
SQL Server Query Plan
Chock full of Jacques

Opternatives


You could try an indexed view here, too.

CREATE VIEW dbo.Sneaky
WITH SCHEMABINDING
AS
SELECT COUNT_BIG(*) AS records
FROM   dbo.Posts AS p
JOIN   dbo.Comments AS c
    ON  p.Id = c.PostId
WHERE DATEDIFF(YEAR, p.CreationDate, c.CreationDate) > 1
AND p.PostTypeId = 1
AND c.Score > 0
GO 

CREATE UNIQUE CLUSTERED INDEX whatever ON Sneaky(records);
SQL Server Query Plan
Mexican Ham Radio

But, you know. That might be overkill.

Depending.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

The Right Way To Compare Date Columns In SQL Server Queries

Tell It To The Judge


A common dilemma is when you have two date columns, and you need to judge the gap between them for something.

For instance, say you have a table of orders and you want to figure out how long on average it takes an ordered item to ship, or a shipped item to be delivered.

You’re not left with many good ways to write the query to take advantage of indexes.

Let’s have a look-see.

Iron Mask


We’re gonna skip the “aw shucks, this stinks without an index” part.

CREATE INDEX gloom ON dbo.Posts(CreationDate, LastActivityDate);

SELECT COUNT(*) AS records
FROM dbo.Posts AS p
WHERE DATEDIFF(YEAR, p.CreationDate, p.LastActivityDate) > 9;

Because it still stinks with an index. Check it out.

SQL Server Query Plan
Why would you?

Though we have a predicate, and an index on both columns, we don’t have anything to seek to.

Why? Because our predicate isn’t on anything that the index is keeping track of.

Indexes don’t care how many years, months, days, hours, minutes, seconds, milliseconds, or microseconds difference there are between these two columns.

That’d be a really cool kind of index to have for sure, but insert a shrug that fills your screen the way dark matter fills the universe here.

All we got is workarounds.

Another For Instance


We can use a computed column:

ALTER TABLE dbo.Posts 
    ADD despair AS DATEDIFF(YEAR, CreationDate, LastActivityDate);

CREATE INDEX sadness ON dbo.Posts(despair);

The result is something we can seek to.

SQL Server Query Plan
How could you?

Which is probably the type of plan that you’d prefer.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

How To Do Date Math In Your Where Clause To Avoid SQL Server Performance Problems

Pattern Baldness


I see this pattern quite a bit in stored procedures, where the front end accepts an integer which gets passed to a query.

That integer is used to specify some time period — days, months, whatever — and the procedure then has to search some date column for the relevant values.

Here’s a simplified example using plain ol’ queries.

Still Alive


Here are my queries. The recompile hints are there to edify people who are hip to local variable problems.

DECLARE @months_back INT = 1;

SELECT COUNT(*) AS records
FROM dbo.Posts AS p
WHERE DATEADD(MONTH, @months_back * -1, p.CreationDate) <= '20100101' --Usually GETDATE() is here
OPTION(RECOMPILE);
GO 

DECLARE @months_back INT = 1;

SELECT COUNT(*) AS records
FROM dbo.Posts AS p
WHERE p.CreationDate <= DATEADD(MONTH, @months_back, '20100101') --Usually GETDATE() is here
OPTION(RECOMPILE);
GO

The problem with the first query is that the function is applied to the column, rather than to the variable.

If we look at the plans for these, the optimizer only thinks one of them is special enough for an index request.

SQL Server Query Plan
please respond

Sure, there’s also a ~200ms difference between the two, which is pretty repeatable.

But that’s not the point — where things change quite a bit more is when we have a useful index. Those two queries above are just using the clustered index, which is on a column unrelated to our where clause.

CREATE INDEX etadnoitaerc ON dbo.Posts(CreationDate);
SQL Server Query Plan
Oops.

Side by side:

  • The bad query uses >10x more CPU
  • Still runs for >3x as long
  • Scans the entire index
  • Reads 10x more rows
  • Has to go parallel to remain competitive

At MAXDOP 1, it runs for just about 2.2 seconds on a single thread. Bad news.

Qual


This is one example of misplacing logic, and why it can be bad for performance.

All these demos were run on SQL Server 2019, which unfortunately didn’t save us any trouble.

In the next post, I’m going to look at another way I see people make their own doom.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.