Why Some SQL Server Date Functions Get Better Cardinality Estimates

Date Debate


Searching dates is a common enough task. There are, of course, good and bad ways to do this.

Aaron Bertrand’s article, Bad habits to kick: mis-handling date / range queries, is a good place to start to learn about that subject.

This isn’t quite about the same thing, just about some behavior that I thought was interesting, and how it changes between cardinality estimator versions.

Bad Robot


If you’ve been query tuning for a while, you probably know about SARGability, and that wrapping columns in functions is generally a bad idea.

But just like there are slightly different rules for CAST and CONVERT with dates, the repercussions of the function also vary.

The examples I’m going to look at are for YEAR() and MONTH().

If you want a TL;DR, here you go.

Reality Bites

If you wanna keep going, follow me!

USING


The takeaway here isn’t that doing either of these is okay. You should fully avoid wrapping columns in functions in general.

One of the main problems with issuing queries with non-SARGable predicates is that the people who most often do it are the people who rely on missing index requests to direct tuning efforts, and non-SARGable queries can prevent those requests from surfacing, or ask for an even more sub-optimal index than usual.

If you have a copy of the StackOverflow2013 database, you can replicate the results pretty easily on SQL Server 2017.

They may be slightly different depending on how the histogram is generated, but the overarching theme is the same.

Yarly


If you run these queries, and look at the estimated and actual rows in the Clustered Index scan tooltip, you’ll see they change for every query.

DECLARE @blob_eater DATETIME;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE YEAR(u.CreationDate) = 2008;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE YEAR(u.CreationDate) = 2009;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE YEAR(u.CreationDate) = 2010;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE YEAR(u.CreationDate) = 2011;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE YEAR(u.CreationDate) = 2012;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE YEAR(u.CreationDate) = 2013;
    GO

Here’s a sample from the 2008 and 2009 queries.

Wild For The Night

ED: I took a break from writing this and “went to brunch”.

Any logical inconsistencies will work themselves out eventually.

Cash Your Checks And Come Up


Alright, let’s try that again with by month.

If you hit yourself in the head with a hammer and forgot the TL;DR, here’s what happens:

DECLARE @blob_eater DATETIME;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE MONTH(u.CreationDate) = 1;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE MONTH(u.CreationDate) = 2;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE MONTH(u.CreationDate) = 3;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE MONTH(u.CreationDate) = 4;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE MONTH(u.CreationDate) = 5;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE MONTH(u.CreationDate) = 6;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE MONTH(u.CreationDate) = 7;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE MONTH(u.CreationDate) = 8;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE MONTH(u.CreationDate) = 9;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE MONTH(u.CreationDate) = 10;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE MONTH(u.CreationDate) = 11;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE MONTH(u.CreationDate) = 12;

If you run these, they’ll all have the same guess on the clustered index scan.

To keep things simple, let’s look at the first couple:

BADLY

The difference here is that now every single row estimate will be 205,476.

Lesson learned: The optimizer can make a decent statistical guess at the year portion of a date, but not the month portion.

In a way, you can think of this like a LIKE query.

The optimizer can make a decent guess at ‘YEAR%’, but not at ‘%MONTH%’.

Actual Facts To Snack On And Chew


The same thing happens for both new and old cardinality estimators.

DECLARE @blob_eater DATETIME;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE YEAR(u.CreationDate) = 2008
    OPTION(USE HINT('FORCE_DEFAULT_CARDINALITY_ESTIMATION'));

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE YEAR(u.CreationDate) = 2008
    OPTION(USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION'));
    GO 

DECLARE @blob_eater DATETIME;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE MONTH(u.CreationDate) = 12
    OPTION(USE HINT('FORCE_DEFAULT_CARDINALITY_ESTIMATION'));


    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE MONTH(u.CreationDate) = 12
    OPTION(USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION'));
    GO

Wouldn’t Get Far


But if we combine predicates, something really different happens between Linda Cardellini estimators.

DECLARE @blob_eater DATETIME;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE MONTH(u.CreationDate) = 12
    AND YEAR(u.CreationDate) = 2012
    OPTION(USE HINT('FORCE_DEFAULT_CARDINALITY_ESTIMATION'));


    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE MONTH(u.CreationDate) = 12
    AND YEAR(u.CreationDate) = 2012
    OPTION(USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION'));
    GO
WRONG

In this case, the old CE (on the right), makes a very bad guess of 1 row.

The new CE (on the left) makes a slightly better, but still not great guess.

Ended


Neither of these is a good way to query date or time data.

You can see in every tooltip that, behind the scenes, the queries used the DATEPART function, which means that also doesn’t help.

The point of this post is that someone may use a function to query the year portion of a date and assume that SQL Server does a good job on any other portion, which isn’t the case.

None of these queries are SARGable, and at no point is a missing index request raised on the CreationDate column, even though if you add one it gets used and reduces reads.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

My Upcoming Speaking Schedule

Weather Is Here, Wish You Were Wonderful


April 23rd

Boston SQL User Group

I’ll be presenting The SQL Server Performance Tasting Menu.

You’re a DBA or Developer, and you’ve been using SQL Server for a few years.

You know there are different ways to make queries faster, but you’re not sure when to use them.

I’m Erik Darling, and I’ll be your sommelier for the evening.

Over several courses of delicious demos, I’ll show you the types of performance problems different tuning techniques pair well with, and which ones to avoid.

When we’re done, you’ll understand exactly what patterns to look for when you’re troubleshooting slow queries, and how to approach them.

You’ll have the secret recipe for gourmet queries.

May 10th


I’m doing a full training day for NESQL

You’ve been working with SQL Server for a while, and you’re sick and tired of not being able to understand how it all works.
Every day, you stare at monitoring tool charts and graphs, wait stats, and widgets, and none of it seems to add up.
In this fast-paced full day of demos, I’m going to show you how queries interact with your server, and each other.
You’ll discover that concurrency isn’t just about locking and blocking; concurrency is also about hardware.
I’m going to teach you how to diagnose your worst server issues and practical ways to solve them.

See you there!


I hope to add more to the list in the coming months.

If you’re looking for remote speakers, hit me up!

If you want to put on a full training day, hit me up!

If you need a SQL Saturday precon, hit me up!

I have stuff.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Last Week’s Almost Definitely Not Office Hours: April 12

ICYMI


Last week’s thrilling, stunning, flawless episode of whatever-you-wanna-call-it.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

SQL Server 2019 ELEVATE ONLINE Index Operations

I Still Don’t Like Index Rebuilds



(Except columnstore)

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

SQL Server 2019’s In Memory tempdb Improves Performance

And I’m Pins And I’m Needles



Thanks for watching!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Make Missing Indexes Great Again

WOOOOHOOOOOO



Thanks for watching!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Self Contained SQL Server Query Plans

Plan, Actually


SQL Server has started collecting a ton of information about a query when it executes.

Live query stats actually captures operator runtimes. Additionally, the stuff that’s captured in actual query plan XML has seen a lot of development.

SSMS 18 goes a step further and shows you those without ticking the Live Query Plan button.

What am I getting at?

Outside Shot


As a consultant, people sometimes send me query plans. They’re usually estimated, or cached plans.

That’s not bad! You can get a sense of some important things based on them, but there’s a ton of detail in actual plans that makes life easier.

One example is with parameter sniffing: estimated and cached plans look like they did something completely reasonable.

Getting an actual plan is tough, though, especially if it’s a long running query, or the query runs modifications.

Containers Are All The Rage


  • What if query plan XML had enough information in it for you to “execute” the query locally without returning any results?
  • What if you could press play, fast forward, and rewind on a query plan?
  • What if you could try things like using the new or old CE or other hints on the query?
  • What if parameters could be masked (but differentiated internally) to test parameter sniffing?

This might be possible with the right information collected, even if some of it is imperfect. In newer versions of SQL Server, even information about statistics is gathered by the plan.

The one missing piece would be index definitions, and perhaps reasons why indexes weren’t used.

With the direction Microsoft is finally going in collecting runtime information about queries, I wouldn’t be surprised if something like this became possible.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Last Week’s Almost Definitely Not Office Hours: April 5

ICYMI


Last week’s thrilling, stunning, flawless episode of whatever-you-wanna-call-it.

Thanks for watching!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Memory Grants For The SQL Variant Data Type In SQL Server

Great Question, You


During my (sold out, baby!) Madison precon, one attendee asked a great question while we were talking about memory grants.

Turns out, if you use the SQL Variant datatype, the memory grants function a lot like they do for any long string type.

From the documentation, which hopefully won’t move or get deleted:

sql_variant can have a maximum length of 8016 bytes. This includes both the base type information and the base type value. The maximum length of the actual base type value is 8,000 bytes.

Since the optimizer needs to plan for your laziness indecisiveness lack of respect for human life inexperience, you can end up getting some rather enormous memory grants, regardless of the type of data you store in variant columns.

Ol’ Dirty Demo


Here’s a table with a limited set of columns from the Users table.

CREATE TABLE dbo.UserVariant 
( 
    Id SQL_VARIANT, 
    CreationDate SQL_VARIANT, 
    DisplayName SQL_VARIANT,
    Orderer INT IDENTITY
);

INSERT dbo.UserVariant WITH(TABLOCKX)
( Id, CreationDate, DisplayName )
SELECT u.Id, u.CreationDate, u.DisplayName
FROM dbo.Users AS u

In all, about 2.4 million rows end up in there. In the real table, the Id column is an integer, the CreationDate column is a DATETIME, and the DisplayName column is an NVARCHAR 40.

Sadly, no matter which column we select, the memory grant is the same:

SELECT TOP (101) uv.Id
FROM dbo.UserVariant AS uv
ORDER BY uv.Orderer;

SELECT TOP (101) uv.CreationDate
FROM dbo.UserVariant AS uv
ORDER BY uv.Orderer;

SELECT TOP (101) uv.DisplayName
FROM dbo.UserVariant AS uv
ORDER BY uv.Orderer;

SELECT TOP (101) uv.Id, uv.CreationDate, uv.DisplayName
FROM dbo.UserVariant AS uv
ORDER BY uv.Orderer;

It’s also the maximum memory grant my laptop will allow: about 9.6GB.

SQL Server Query Plan
Large Marge

Get’em!


As if there aren’t enough reasons to avoid sql_variant, here’s another one.

Thanks for reading.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Stop Looking At SQL Server Wait Stats Without Looking At Server Uptime

Economy Of Waits


There’s a joke about two economists walking down the street.

One of them asks the other how they’re doing.

The punchline is that their response is “compared to what?”

It’s not the best joke, and it’s something to keep in mind when you’re measuring anything, but SQL Server specifically.

This isn’t a post about collecting baselines, though it’s a relevant concept.

Scenery, Yo


One of the best ways to find bottlenecks in SQL Server is to look at wait stats.

Lots of scripts and monitoring tools will show you top waits, percentages, signal waits, and even percentages of signal waits.

Oh baby, those datapoints.

But there’s frequently a missing axis: compared to what?

Weakly Links


Let’s say you’ve got 604,800 seconds of CX packet waits.

Let’s also say they’re 95% of your total server wait stats.

How does your opinion of that number change if your server has been up for:

  • One Day (86,400 seconds)
  • One Week (604,800 seconds)
  • One Month (2,592,000 seconds)
  • One Year (31,536,000 seconds)

Obviously, if your server has been up for a day, you might wanna pay more attention to that metric.

If your server has been up for two weeks, it becomes less of an issue.

Seven Year Abs


I’ll give you another example: OH MY GOD YOU ATE 20,000 CALORIES.

  • In a day, that might be cause for concern
  • In a week, you’re about average
  • In a month, you might need medical attention
  • In a year, well, you’re probably more calorically important to worms

Compared to what is a pretty important measure.

Forced Perspective


I get it. Someone can clear out wait stats, and judging uptime can be unreliable, and more difficult up in the cloud.

Looking at wait stats without knowing the period of time they were collected over isn’t terribly helpful.

I’d opened an issue to at least separate wait stats by database, though Microsoft doesn’t seem to be too into my idea.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.