Why Some SQL Server Date Functions Get Better Cardinality Estimates

Posted on April 22, 2019 (updated May 16, 2022) by Erik Darling

Date Debate

Searching dates is a common enough task. There are, of course, good and bad ways to do this.

Aaron Bertrand’s article, Bad habits to kick: mis-handling date / range queries, is a good place to start to learn about that subject.

This isn’t quite about the same thing, just about some behavior that I thought was interesting, and how it changes between cardinality estimator versions.

Bad Robot

If you’ve been query tuning for a while, you probably know about SARGability, and that wrapping columns in functions is generally a bad idea.

But just like there are slightly different rules for CAST and CONVERT with dates, the repercussions of the function also vary.

The examples I’m going to look at are for YEAR() and MONTH().

If you want a TL;DR, here you go.

Reality Bites

If you wanna keep going, follow me!

USING

The takeaway here isn’t that doing either of these is okay. In general, you should avoid wrapping columns in functions.

One of the main problems with non-SARGable predicates is that the people who most often write them are the people who rely on missing index requests to direct their tuning efforts. Non-SARGable queries can prevent those requests from surfacing, or make them ask for an even more sub-optimal index than usual.

If you have a copy of the StackOverflow2013 database, you can replicate the results pretty easily on SQL Server 2017.

They may be slightly different depending on how the histogram is generated, but the overarching theme is the same.
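
If you want to see what the histogram looks like on your copy, here's a rough sketch. It assumes SQL Server 2016 SP1 CU2 or later (for sys.dm_db_stats_histogram), and that a statistics object with CreationDate as its leading column already exists. Running one of the queries below should auto-create one if auto create stats is on.

```sql
-- Sketch: list histogram steps for any statistics on dbo.Users
-- whose leading column is CreationDate.
SELECT
    s.name AS stats_name,
    h.step_number,
    h.range_high_key,
    h.range_rows,
    h.equal_rows,
    h.distinct_range_rows,
    h.average_range_rows
FROM sys.stats AS s
JOIN sys.stats_columns AS sc
    ON  sc.object_id = s.object_id
    AND sc.stats_id = s.stats_id
    AND sc.stats_column_id = 1 -- leading column only
JOIN sys.columns AS c
    ON  c.object_id = sc.object_id
    AND c.column_id = sc.column_id
CROSS APPLY sys.dm_db_stats_histogram(s.object_id, s.stats_id) AS h
WHERE s.object_id = OBJECT_ID(N'dbo.Users')
AND   c.name = N'CreationDate'
ORDER BY s.name, h.step_number;
```

If nothing comes back, there are no statistics on the column yet.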

Yarly

If you run these queries, and look at the estimated and actual rows in the Clustered Index scan tooltip, you’ll see they change for every query.

DECLARE @blob_eater DATETIME;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE YEAR(u.CreationDate) = 2008;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE YEAR(u.CreationDate) = 2009;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE YEAR(u.CreationDate) = 2010;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE YEAR(u.CreationDate) = 2011;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE YEAR(u.CreationDate) = 2012;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE YEAR(u.CreationDate) = 2013;
    GO

Here’s a sample from the 2008 and 2009 queries.

ED: I took a break from writing this and “went to brunch”.

Any logical inconsistencies will work themselves out eventually.

Cash Your Checks And Come Up

Alright, let’s try that again by month.

If you hit yourself in the head with a hammer and forgot the TL;DR, here’s what happens:

DECLARE @blob_eater DATETIME;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE MONTH(u.CreationDate) = 1;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE MONTH(u.CreationDate) = 2;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE MONTH(u.CreationDate) = 3;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE MONTH(u.CreationDate) = 4;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE MONTH(u.CreationDate) = 5;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE MONTH(u.CreationDate) = 6;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE MONTH(u.CreationDate) = 7;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE MONTH(u.CreationDate) = 8;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE MONTH(u.CreationDate) = 9;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE MONTH(u.CreationDate) = 10;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE MONTH(u.CreationDate) = 11;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE MONTH(u.CreationDate) = 12;

If you run these, they’ll all have the same guess on the clustered index scan.

To keep things simple, let’s look at the first couple:

The difference here is that now every single row estimate will be 205,476.
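
One thing that may be worth checking, and this is my assumption rather than anything the plans tell you: whether that number is just the table's row count divided by the twelve possible months. A quick sketch to compare against the estimate on your copy:

```sql
-- Sketch: compare the fixed MONTH() estimate to rows / 12.
-- Whether the optimizer actually derives the guess this way is an assumption.
SELECT
    COUNT_BIG(*) AS total_rows,
    COUNT_BIG(*) / 12. AS one_twelfth_of_rows
FROM dbo.Users AS u;
```

If the second column lands on top of the estimate, the guess has nothing to do with which month you asked for.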

Lesson learned: The optimizer can make a decent statistical guess at the year portion of a date, but not the month portion.

In a way, you can think of this like a LIKE query.

The optimizer can make a decent guess at ‘YEAR%’, but not at ‘%MONTH%’.
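
To poke at the LIKE side of that analogy, here's a small sketch against the DisplayName column. The search string is arbitrary, and the estimates you get will depend on your statistics, so treat this as something to compare in your own plans rather than a guaranteed result.

```sql
DECLARE @name_eater NVARCHAR(40);

    -- Leading characters known: the histogram can be used for a range guess
    SELECT @name_eater = u.DisplayName
    FROM dbo.Users AS u
    WHERE u.DisplayName LIKE N'Erik%';

    -- Leading wildcard: no range to look up in the histogram
    SELECT @name_eater = u.DisplayName
    FROM dbo.Users AS u
    WHERE u.DisplayName LIKE N'%Erik%';
```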

Actual Facts To Snack On And Chew

The same thing happens for both new and old cardinality estimators.

DECLARE @blob_eater DATETIME;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE YEAR(u.CreationDate) = 2008
    OPTION(USE HINT('FORCE_DEFAULT_CARDINALITY_ESTIMATION'));

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE YEAR(u.CreationDate) = 2008
    OPTION(USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION'));
    GO 

DECLARE @blob_eater DATETIME;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE MONTH(u.CreationDate) = 12
    OPTION(USE HINT('FORCE_DEFAULT_CARDINALITY_ESTIMATION'));


    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE MONTH(u.CreationDate) = 12
    OPTION(USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION'));
    GO

Wouldn’t Get Far

But if we combine predicates, something really different happens between Linda Cardellini estimators.

DECLARE @blob_eater DATETIME;

    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE MONTH(u.CreationDate) = 12
    AND YEAR(u.CreationDate) = 2012
    OPTION(USE HINT('FORCE_DEFAULT_CARDINALITY_ESTIMATION'));


    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE MONTH(u.CreationDate) = 12
    AND YEAR(u.CreationDate) = 2012
    OPTION(USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION'));
    GO

In this case, the old CE (on the right) makes a very bad guess of 1 row.

The new CE (on the left) makes a slightly better, but still not great, guess.

Ended

Neither of these is a good way to query date or time data.

You can see in every tooltip that, behind the scenes, the queries used the DATEPART function, which means that calling DATEPART directly doesn’t help, either.
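
For reference, this is roughly what those tooltips are showing, written out by hand. I'm assuming these behave like the YEAR() and MONTH() versions above, so don't expect better estimates from them.

```sql
DECLARE @blob_eater DATETIME;

    -- The YEAR() predicate, spelled out with DATEPART
    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE DATEPART(YEAR, u.CreationDate) = 2008;

    -- The MONTH() predicate, spelled out with DATEPART
    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE DATEPART(MONTH, u.CreationDate) = 12;
```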

The point of this post is that someone may use a function to query the year portion of a date and assume that SQL Server does a good job on any other portion, which isn’t the case.

None of these queries are SARGable, and at no point is a missing index request raised on the CreationDate column, even though if you add one it gets used and reduces reads.
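
If you want to see the difference for yourself, here's a minimal sketch of the usual alternative: a half-open date range on the bare column, plus an index. The index name is made up for illustration, and the dates are the December 2012 example from earlier.

```sql
-- Hypothetical index name; any nonclustered index leading on CreationDate will do
CREATE INDEX ix_Users_CreationDate
    ON dbo.Users (CreationDate);

DECLARE @blob_eater DATETIME;

    -- SARGable: no function on the column, so the index can be seeked
    -- and the histogram can be used directly
    SELECT @blob_eater = u.CreationDate
    FROM dbo.Users AS u
    WHERE u.CreationDate >= '20121201'
    AND   u.CreationDate <  '20130101';
```

The half-open range (>= start, < next period) avoids having to guess the last possible time value in the month.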

Thanks for reading!

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

My Upcoming Speaking Schedule

Posted on April 19, 2019 (updated May 14, 2022) by Erik Darling

Weather Is Here, Wish You Were Wonderful

April 23rd

Boston SQL User Group

I’ll be presenting The SQL Server Performance Tasting Menu.

You’re a DBA or Developer, and you’ve been using SQL Server for a few years.

You know there are different ways to make queries faster, but you’re not sure when to use them.

I’m Erik Darling, and I’ll be your sommelier for the evening.

Over several courses of delicious demos, I’ll show you the types of performance problems different tuning techniques pair well with, and which ones to avoid.

When we’re done, you’ll understand exactly what patterns to look for when you’re troubleshooting slow queries, and how to approach them.

You’ll have the secret recipe for gourmet queries.

May 10th

I’m doing a full training day for NESQL.

You’ve been working with SQL Server for a while, and you’re sick and tired of not being able to understand how it all works.

Every day, you stare at monitoring tool charts and graphs, wait stats, and widgets, and none of it seems to add up.

In this fast-paced full day of demos, I’m going to show you how queries interact with your server, and each other.

You’ll discover that concurrency isn’t just about locking and blocking; concurrency is also about hardware.

I’m going to teach you how to diagnose your worst server issues and practical ways to solve them.

See you there!

I hope to add more to the list in the coming months.

If you’re looking for remote speakers, hit me up!

If you want to put on a full training day, hit me up!

If you need a SQL Saturday precon, hit me up!

I have stuff.

Thanks for reading!

Last Week’s Almost Definitely Not Office Hours: April 12

Posted on April 18, 2019 (updated May 16, 2022) by Erik Darling

ICYMI

Last week’s thrilling, stunning, flawless episode of whatever-you-wanna-call-it.

SQL Server 2019 ELEVATE ONLINE Index Operations

Posted on April 17, 2019 (updated May 14, 2022) by Erik Darling

I Still Don’t Like Index Rebuilds

(Except columnstore)

Thanks for reading!

SQL Server 2019’s In Memory tempdb Improves Performance

Posted on April 16, 2019 (updated May 16, 2022) by Erik Darling

And I’m Pins And I’m Needles

Thanks for watching!

Make Missing Indexes Great Again

Posted on April 15, 2019 (updated May 16, 2022) by Erik Darling

WOOOOHOOOOOO

Thanks for watching!

Self Contained SQL Server Query Plans

Posted on April 12, 2019 (updated May 14, 2022) by Erik Darling

Plan, Actually

SQL Server has started collecting a ton of information about a query when it executes.

Live query stats actually captures operator runtimes. Additionally, the stuff that’s captured in actual query plan XML has seen a lot of development.

SSMS 18 goes a step further and shows you those without ticking the Live Query Plan button.

What am I getting at?

Outside Shot

As a consultant, people sometimes send me query plans. They’re usually estimated, or cached plans.

That’s not bad! You can get a sense of some important things based on them, but there’s a ton of detail in actual plans that makes life easier.

One example is with parameter sniffing: estimated and cached plans look like they did something completely reasonable.

Getting an actual plan is tough, though, especially if it’s a long running query, or the query runs modifications.

Containers Are All The Rage

What if query plan XML had enough information in it for you to “execute” the query locally without returning any results?
What if you could press play, fast forward, and rewind on a query plan?
What if you could try things like using the new or old CE or other hints on the query?
What if parameters could be masked (but differentiated internally) to test parameter sniffing?

This might be possible with the right information collected, even if some of it is imperfect. In newer versions of SQL Server, even information about statistics is gathered by the plan.
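
As a sketch of what that statistics information looks like, the query below shreds it out of cached plan XML. It assumes your plans carry the OptimizerStatsUsage element (newer builds and the newer CE), and it reads the whole plan cache, so it's not something to run casually on a busy server.

```sql
-- Sketch: list statistics recorded in cached plans' OptimizerStatsUsage element.
WITH XMLNAMESPACES
    (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (20)
    s.si.value('@Table', 'NVARCHAR(258)') AS table_name,
    s.si.value('@Statistics', 'NVARCHAR(258)') AS stats_name,
    s.si.value('@ModificationCount', 'BIGINT') AS modification_count,
    s.si.value('@SamplingPercent', 'FLOAT') AS sampling_percent,
    s.si.value('@LastUpdate', 'DATETIME2(3)') AS last_update
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY qp.query_plan.nodes('//OptimizerStatsUsage/StatisticsInfo') AS s(si)
WHERE cp.objtype = N'Adhoc';
```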

The one missing piece would be index definitions, and perhaps reasons why indexes weren’t used.

With the direction Microsoft is finally going in collecting runtime information about queries, I wouldn’t be surprised if something like this became possible.

Thanks for reading!

Last Week’s Almost Definitely Not Office Hours: April 5

Posted on April 11, 2019 (updated May 14, 2022) by Erik Darling

ICYMI

Last week’s thrilling, stunning, flawless episode of whatever-you-wanna-call-it.

Thanks for watching!

Memory Grants For The SQL Variant Data Type In SQL Server

Posted on April 10, 2019 (updated May 16, 2022) by Erik Darling

Great Question, You

During my (sold out, baby!) Madison precon, one attendee asked a great question while we were talking about memory grants.

Turns out, if you use the SQL Variant datatype, the memory grants function a lot like they do for any long string type.

From the documentation, which hopefully won’t move or get deleted:

sql_variant can have a maximum length of 8016 bytes. This includes both the base type information and the base type value. The maximum length of the actual base type value is 8,000 bytes.

Since the optimizer needs to plan for your ~~laziness~~ ~~indecisiveness~~ ~~lack of respect for human life~~ inexperience, you can end up getting some rather enormous memory grants, regardless of the type of data you store in variant columns.

Ol’ Dirty Demo

Here’s a table with a limited set of columns from the Users table.

CREATE TABLE dbo.UserVariant 
( 
    Id SQL_VARIANT, 
    CreationDate SQL_VARIANT, 
    DisplayName SQL_VARIANT,
    Orderer INT IDENTITY
);

INSERT dbo.UserVariant WITH(TABLOCKX)
( Id, CreationDate, DisplayName )
SELECT u.Id, u.CreationDate, u.DisplayName
FROM dbo.Users AS u;

In all, about 2.4 million rows end up in there. In the real table, the Id column is an INT, the CreationDate column is a DATETIME, and the DisplayName column is an NVARCHAR(40).

Sadly, no matter which column we select, the memory grant is the same:

SELECT TOP (101) uv.Id
FROM dbo.UserVariant AS uv
ORDER BY uv.Orderer;

SELECT TOP (101) uv.CreationDate
FROM dbo.UserVariant AS uv
ORDER BY uv.Orderer;

SELECT TOP (101) uv.DisplayName
FROM dbo.UserVariant AS uv
ORDER BY uv.Orderer;

SELECT TOP (101) uv.Id, uv.CreationDate, uv.DisplayName
FROM dbo.UserVariant AS uv
ORDER BY uv.Orderer;

It’s also the maximum memory grant my laptop will allow: about 9.6GB.
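
If you'd rather not squint at plan properties, here's a rough way to pull the grants back out afterwards. It assumes a build where sys.dm_exec_query_stats exposes the grant columns (SQL Server 2016 and later have them), and that the plans are still in cache.

```sql
-- Sketch: recent memory grants for statements that touch dbo.UserVariant.
SELECT TOP (10)
    st.text AS batch_text,
    qs.last_grant_kb,
    qs.last_used_grant_kb,
    qs.max_grant_kb,
    qs.last_execution_time
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE st.text LIKE N'%UserVariant%'
AND   st.text NOT LIKE N'%dm_exec_query_stats%' -- leave this query out
ORDER BY qs.last_execution_time DESC;
```

Comparing last_grant_kb to last_used_grant_kb shows how much of the grant each query actually touched.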

Get’em!

As if there aren’t enough reasons to avoid sql_variant, here’s another one.
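
For comparison, here's a sketch of the same table with the real data types. The table name dbo.UserTyped is made up for illustration, and I'm not promising any particular grant size, only that it's worth comparing against the variant version.

```sql
-- Hypothetical typed copy of the demo table, for side-by-side comparison
CREATE TABLE dbo.UserTyped
(
    Id INT,
    CreationDate DATETIME,
    DisplayName NVARCHAR(40),
    Orderer INT IDENTITY
);

INSERT dbo.UserTyped WITH(TABLOCKX)
( Id, CreationDate, DisplayName )
SELECT u.Id, u.CreationDate, u.DisplayName
FROM dbo.Users AS u;

-- Same shape as the variant queries; compare the memory grant in the plan
SELECT TOP (101) ut.Id
FROM dbo.UserTyped AS ut
ORDER BY ut.Orderer;
```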

Thanks for reading.

Stop Looking At SQL Server Wait Stats Without Looking At Server Uptime

Posted on April 8, 2019 (updated May 14, 2022) by Erik Darling

Economy Of Waits

There’s a joke about two economists walking down the street.

One of them asks the other how they’re doing.

The punchline is that their response is “compared to what?”

It’s not the best joke, but it’s something to keep in mind when you’re measuring anything, and SQL Server specifically.

This isn’t a post about collecting baselines, though it’s a relevant concept.

Scenery, Yo

One of the best ways to find bottlenecks in SQL Server is to look at wait stats.

Lots of scripts and monitoring tools will show you top waits, percentages, signal waits, and even percentages of signal waits.

Oh baby, those datapoints.

But there’s frequently a missing axis: compared to what?

Weakly Links

Let’s say you’ve got 604,800 seconds of CXPACKET waits.

Let’s also say they’re 95% of your total server wait stats.

How does your opinion of that number change if your server has been up for:

One Day (86,400 seconds)
One Week (604,800 seconds)
One Month (2,592,000 seconds)
One Year (31,536,000 seconds)

Obviously, if your server has been up for a day, you might wanna pay more attention to that metric.

If your server has been up for two weeks, it becomes less of an issue.
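
A minimal sketch of putting the two numbers side by side, using the start time SQL Server reports about itself. The caveat is that wait stats can be cleared without a restart, so the uptime figure is only an upper bound on the collection window.

```sql
-- Sketch: CXPACKET wait time next to server uptime.
SELECT
    ws.wait_type,
    ws.wait_time_ms / 1000. AS wait_time_seconds,
    DATEDIFF(SECOND, si.sqlserver_start_time, SYSDATETIME()) AS uptime_seconds,
    (ws.wait_time_ms / 1000.)
        / NULLIF(DATEDIFF(SECOND, si.sqlserver_start_time, SYSDATETIME()), 0)
        AS wait_seconds_per_uptime_second
FROM sys.dm_os_wait_stats AS ws
CROSS JOIN sys.dm_os_sys_info AS si
WHERE ws.wait_type = N'CXPACKET';
```

With the numbers from the example, a server up for one week would come out at 1.0, and a server up for one day at 7.0.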

Seven Year Abs

I’ll give you another example: OH MY GOD YOU ATE 20,000 CALORIES.

In a day, that might be cause for concern
In a week, you’re about average
In a month, you might need medical attention
In a year, well, you’re probably more calorically important to worms

“Compared to what?” is a pretty important question.

Forced Perspective

I get it. Someone can clear out wait stats, and judging uptime can be unreliable, and more difficult up in the cloud.

Still, looking at wait stats without knowing the period of time they were collected over isn’t terribly helpful.
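
One way to know the period for sure is to pick it yourself. Here's a rough sketch that snapshots the wait stats, waits a minute, and reports the difference. The one minute delay is arbitrary, and the table variable is just for illustration.

```sql
-- Sketch: wait stats accumulated over a known 60 second window.
DECLARE @before TABLE
(
    wait_type NVARCHAR(60) PRIMARY KEY,
    wait_time_ms BIGINT,
    waiting_tasks_count BIGINT
);

INSERT @before ( wait_type, wait_time_ms, waiting_tasks_count )
SELECT ws.wait_type, ws.wait_time_ms, ws.waiting_tasks_count
FROM sys.dm_os_wait_stats AS ws;

WAITFOR DELAY '00:01:00';

SELECT TOP (10)
    ws.wait_type,
    ws.wait_time_ms - b.wait_time_ms AS wait_time_ms_delta,
    ws.waiting_tasks_count - b.waiting_tasks_count AS waiting_tasks_delta
FROM sys.dm_os_wait_stats AS ws
JOIN @before AS b
    ON b.wait_type = ws.wait_type
WHERE ws.wait_time_ms - b.wait_time_ms > 0
ORDER BY wait_time_ms_delta DESC;
```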

I’d opened an issue to at least separate wait stats by database, though Microsoft doesn’t seem to be too into my idea.

Thanks for reading!