How To Do Date Math In Your Where Clause To Avoid SQL Server Performance Problems

Pattern Baldness


I see this pattern quite a bit in stored procedures, where the front end accepts an integer which gets passed to a query.

That integer is used to specify some time period — days, months, whatever — and the procedure then has to search some date column for the relevant values.

Here’s a simplified example using plain ol’ queries.

Still Alive


Here are my queries. The recompile hints are there to edify people who are hip to local variable problems.

DECLARE @months_back INT = 1;

SELECT COUNT(*) AS records
FROM dbo.Posts AS p
WHERE DATEADD(MONTH, @months_back * -1, p.CreationDate) <= '20100101' --Usually GETDATE() is here
OPTION(RECOMPILE);
GO 

DECLARE @months_back INT = 1;

SELECT COUNT(*) AS records
FROM dbo.Posts AS p
WHERE p.CreationDate <= DATEADD(MONTH, @months_back, '20100101') --Usually GETDATE() is here
OPTION(RECOMPILE);
GO

The problem with the first query is that the function is applied to the column, rather than to the variable.

If we look at the plans for these, the optimizer only thinks one of them is special enough for an index request.

SQL Server Query Plan
please respond

Sure, there’s also a ~200ms difference between the two, which is pretty repeatable.

But that’s not the point — where things change quite a bit more is when we have a useful index. Those two queries above are just using the clustered index, which is on a column unrelated to our where clause.

CREATE INDEX etadnoitaerc ON dbo.Posts(CreationDate);
SQL Server Query Plan
Oops.

Side by side:

  • The bad query uses >10x more CPU
  • Still runs for >3x as long
  • Scans the entire index
  • Reads 10x more rows
  • Has to go parallel to remain competitive

At MAXDOP 1, it runs for just about 2.2 seconds on a single thread. Bad news.

Qual


This is one example of misplacing logic, and why it can be bad for performance.

All these demos were run on SQL Server 2019, which unfortunately didn’t save us any trouble.

In the next post, I’m going to look at another way I see people make their own doom.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

A Quick Primer On Date Math In SQL Server

Come Up


I’m going to be writing a few posts about things I see people doing with date math, or as I’ve started calling it, “dath*”, that can really hurt performance.

To get you ready for those, I need to talk about how functions like DATEADD and DATEDIFF work. It seems like that’s missing knowledge in many places.

Make Myself A Winner


As of this writing, if I run this query, I get these results:

SELECT SYSDATETIME() AS sys_date;

sys_date
2019-11-08 17:22:10.6886473

It has a date part, and a time part. Right? Cool.

The first thing I need to show you is that with the DATEADD and DATEDIFF functions, some dates and numbers are interchangeable.

For instance:

SELECT DATEDIFF(DAY, 0, SYSDATETIME()) AS [what does zero mean?],
          DATEDIFF(DAY, '19000101', SYSDATETIME()) AS [it means 1900-01-01];

Running these will both return the same value for me: 43,775. In other words, 0 and 1900-01-01 are interchangeable.

If you’re playing along at home, stick negative numbers in there too.

It might be a different result from mine when you run it, but they’ll both be the same locally.

To prove this out a little more: If we add 43,775 days to 0 (1900-01-01), we get today’s date (2019-11-08).

SELECT DATEADD( DAY, 
                DATEDIFF(DAY, 0, SYSDATETIME()), 
			    0 ) 
			    AS [add 43775 days to 1900-01-01];

In this example, I’m saying that I want to add the results of the datediff function to the date 1900-01-01.

These functions are pretty powerful, because they can accept expressions, literals, variables, parameters, column values, and so on.

The important part about this one is that the time is zeroed out.

What’s The Point?


I’ll often see people need to “flatten” dates to certain intervals.

By flatten, I mean the start of a day, week, month, or year, and likewise for the end values to all those intervals.

I’ve seen some really bad implementations of this. Most notable was probably a scalar valued function that converted a datetime to a 10 character string to remove the time portion.

In every where clause.

For much better implementations, check out these posts:

Thanks for reading!

*I don’t really call it that don’t be ludicrous

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

SQL Saturday Washington, DC: Three Weeks To Go!

HELLO GUAM!


If you’re planning on attending SQL Saturday Washington, DC, why not play hooky from work and spend the day with me learning about all the atrocities SQL Server is capable of?

I’m going to be delivering my Total Server Tuning material, which has been a hit at a whole bunch of events this past year.

In case you’ve never seen it, here’s the trailer I recorded for SQLBits earlier this year for it:

See you there!

For a limited time, use the coupon code “votesql” for $50 off.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Can You Stop People From Creating SQL Server Indexes With Low Fill Factors?

Yeah nah


So like, presented without much comment, this server level trigger will, in my limited testing, “work”.

Just make sure you understand something: this doesn’t stop the index from being created, it only rolls creation back afterwards.

If someone creates a gigantic index on an equally gigantic table, you’re in for a hell of a ride. I’d probably only deploy this on local dev boxes, and only if I really needed to prove a point.

CREATE OR ALTER TRIGGER CheckFillFactor
ON ALL SERVER
FOR CREATE_INDEX, ALTER_INDEX
AS
DECLARE @FillFactor NVARCHAR(4000);
DECLARE @Percent INT;
SELECT @FillFactor = EVENTDATA().value('(/EVENT_INSTANCE/TSQLCommand/CommandText)[1]', 'NVARCHAR(4000)');
IF UPPER(@FillFactor) LIKE '%FILLFACTOR%'
    BEGIN
        SET @FillFactor = REPLACE(@FillFactor, ' ', '');
        PRINT @FillFactor;
        SELECT @FillFactor = SUBSTRING(@FillFactor, CHARINDEX(N'FILLFACTOR=', @FillFactor) + LEN(N'FILLFACTOR='), PATINDEX('%[^0-9]%', @FillFactor) + 2);
        
		IF TRY_CONVERT(INT, @FillFactor) IS NULL
		    BEGIN
			    SET @Percent = LEFT(@FillFactor, 2);
			END;
		ELSE
		    BEGIN
			    SET @Percent = @FillFactor;
			END;

		IF @Percent < 80
            BEGIN
                RAISERROR('WHY YOU DO THAT?', 0, 1) WITH NOWAIT;
                ROLLBACK;
            END;
    END;
GO

 

It’ll work for create or alter index commands, i.e.

--Fails, under 80
CREATE INDEX whatever ON dbo.Users (Reputation) WHERE Reputation > 100000 WITH (FILLFACTOR = 70);

--Works, over 80
CREATE INDEX whatever ON dbo.Users (Reputation) WHERE Reputation > 100000 WITH (FILLFACTOR = 90);

--Fails, under 80
ALTER INDEX whatever ON dbo.Users REBUILD WITH (FILLFACTOR = 70);

--Works, uses default
CREATE INDEX whatever ON dbo.Users (Reputation) WHERE Reputation > 100000;

Pink Blood


Is it perfect? Probably not, but I threw it together quickly as a POC.

For instance, my first stab broke when fill factor wasn’t specified in the command.

My second stab broke when I changed the spacing around the “=”.

Let me know in the comments if you can get around it or break it, other than by changing server settings — I can’t go that far here.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

How TABLOCK Hints Can Hurt Column Store Insert Performance

Utility


There are a lot of good reasons to use tablock on your inserts, as long as you can handle the reduced concurrency of a table level lock.

You may get minimal logging here, here. And you may even get parallel inserts. You may even avoid weird compression side effects.

All very good reasons to vie for the affections of tablock hints!

You should do that.

You should try.

Because Sometimes


You may run into oddball scenarios where the results without tablock are better.

For example, I was experimenting with clustered column store temp tables for another post, and I came across this oddity.

Assume two identical inserts, but one has a tablock hint and one doesn’t.

DROP TABLE IF EXISTS #t_cci;
GO 

CREATE TABLE #t_cci (UserId INT, VoteTypeId INT, CreationDate DATETIME, INDEX c CLUSTERED COLUMNSTORE);

INSERT #t_cci WITH (TABLOCK) -- A tab of locks
    ( UserId, VoteTypeId, CreationDate )
SELECT v.UserId, v.VoteTypeId, MAX(v.CreationDate)
FROM dbo.Votes AS v
WHERE v.UserId IS NOT NULL
GROUP BY v.UserId, v.VoteTypeId;

DROP TABLE IF EXISTS #t_cci;
GO 

CREATE TABLE #t_cci (UserId INT, VoteTypeId INT, CreationDate DATETIME, INDEX c CLUSTERED COLUMNSTORE);

INSERT #t_cci -- A lack of tabs
    ( UserId, VoteTypeId, CreationDate )
SELECT v.UserId, v.VoteTypeId, MAX(v.CreationDate)
FROM dbo.Votes AS v
WHERE v.UserId IS NOT NULL
GROUP BY v.UserId, v.VoteTypeId;

The plans for those queries, in that order, look like this:

SQL Server Query Plan
Grey Skies

The fully parallel query takes 4.6 seconds, and the serial insert query takes 1.7 seconds.

Even more strange, the insert with tablock leaves four open rowgroups, and the non-tablock query has one compressed rowgroup.

SQL Server Performance Metrics
It was not a dream.

Adding to that, using sp_spaceused on the temp tables, we can see some important differences.

SQL Server Index Sizes
neener

The uncompressed temp table is much larger, has more unused space, and has… index size?!

Yep, because open row groups, or delta stores, are stored as b-trees.

the columnstore index might store some data temporarily into a clustered index called a deltastore

I believe this happens with the parallel insert, because each thread inserts ~75k rows, which is fewer than the minimum 102,400 needed to qualify for skipping the delta store and being compressed. A simple way to test this is to run the tablock query at MAXDOP 2.

SQL Server Query Plan
Hoowee

That ends up being a little faster than the serial insert, and also gets full compression.

Each thread is ~150k rows and gets compressed. Hooray.

Looking at a PerfView diff of the good and bad samples, this path is only present in the slow run:

PerfView
Step by step

When inserting to the delta store, we need to acquire exclusive latches like just about any b-tree insert, and reserve log space to do so.

Morality Hilarity


To summarize the problem:

  • The TABLOCK hint gets us a fully parallel insert plan
  • There aren’t enough rows on each parallel thread to get row group compression
  • We need to insert into a clustered index behind the scenes that makes up the delta store
  • Which results in threads waiting on exclusive latches and reserving log space

Will you ever hit this? Maybe someday.

This will also happen with real tables (not just temp tables), so I suspect it’s a potentially common scenario for column store users.

You get all worked up getting these parallel inserts and then your compression sucks.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

No Column Store Indexes With In-Memory Tempdb For SQL Server? No Problem!

Old Habits


Let’s say you’re on SQL Server 2019. No, seriously. It’s been out for a couple weeks now.

You could be.

I say that you could be because you’re the kind of brave person who tries new things and experiments with their body server.

You may even do crazy things like this.

Stone Cold


CREATE TABLE #t ( id INT, INDEX c CLUSTERED COLUMNSTORE );

SELECT    COUNT_BIG(*) AS records
FROM      dbo.Users AS u
JOIN      dbo.Posts AS p
    ON u.Id = p.OwnerUserId
JOIN      dbo.Comments AS c
    ON u.Id = c.UserId
LEFT JOIN #t AS t ON 1 = 0;

Woah ho ho. What happened there? A #temp table with a clustered column store index on it left joined on 1 = 0?

Yes. People do this.

People do this because it’s getting some batch mode operations “for free”, which have the nasty habit of making big reporting queries run a lot faster.

Yonder Problem


When you enable 2019’s new in memory tempdb, which can really help with stuff tempdb needs help with, you may find yourself hitting errors.

Msg 11442, Level 16, State 1, Line 14
Columnstore index creation is not support in tempdb when memory-optimized metadata mode is enabled.
Msg 1750, Level 16, State 1, Line 14
Could not create constraint or index. See previous errors.

The good news is that this works with *real* tables, too.

CREATE TABLE dbo.t ( id INT, INDEX c CLUSTERED COLUMNSTORE );

SELECT    COUNT_BIG(*) AS records
FROM      dbo.Users AS u
JOIN      dbo.Posts AS p
    ON u.Id = p.OwnerUserId
JOIN      dbo.Comments AS c
    ON u.Id = c.UserId
LEFT JOIN dbo.t AS t ON 1 = 0;

And you can get plans with all sorts of Batchy goodness in them.

SQL Server Query Plan
Long way from home

Yeah, you’re gonna have to change some code, but don’t worry.

You’re the kind of person who enjoys that.

Right?

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

When SELECT * Doesn’t Matter In SQL Server Queries

Pavlovian


There’s a strange response to some things in the SQL Server community that borders on religious fervor. I once worked with someone who chastised me for having SELECT * in some places in the Blitz scripts. It was odd and awkward.

Odd because this person was the most Senior DBA in the company, and awkward because they didn’t believe me that it didn’t matter in some cases.

People care about SELECT * for many valid reasons, but context is everything.

One For The Money


The first place it doesn’t matter is EXISTS. Take this index and this query:

CREATE INDEX specatular_blob ON dbo.Posts(PostTypeId, OwnerUserId);

SELECT COUNT(*) AS records
FROM   dbo.Users AS u
WHERE  EXISTS ( SELECT * 
                FROM dbo.Posts AS p 
				WHERE p.OwnerUserId = u.Id 
				AND p.PostTypeId = 2 );

The relevant part of the query plan looks like this:

SQL Server Query Plan
What’s My Name

We do a seek into the index we created on the two columns in our WHERE clause. We didn’t have to go back to the clustered index for everything else in the table.

That’s easy enough to prove if we only run the subquery — we have to change it a little bit, but the plan tells us what we need.

SELECT *
FROM   dbo.Posts AS p
WHERE  p.OwnerUserId = 22656
AND    p.PostTypeId = 2;

This time we do need the clustered index:

SQL Server Query Plan
Who We Be

You can even change it to something that would normally throw an error:

SELECT COUNT(*) AS records
FROM   dbo.Users AS u
WHERE  EXISTS ( SELECT 1/0 
                FROM dbo.Posts AS p 
				WHERE p.OwnerUserId = u.Id 
				AND p.PostTypeId = 2 );

Two For Completeness


Another example is in derived tables, joins, and apply.

Take these two queries. The first one only selects columns in our nonclustered index (same as above).

The second one actually does a SELECT *.

/*selective*/
SELECT     u.Id,
           u.DisplayName,
		   ca.OwnerUserId, --I am only selecting columns in our index
		   ca.PostTypeId,
		   ca.Id
FROM       dbo.Users AS u
CROSS APPLY( SELECT TOP (1) * --I am select *
             FROM dbo.Posts AS p 
			 WHERE p.OwnerUserId = u.Id 
			 AND p.PostTypeId = 2 
			 ORDER BY p.OwnerUserId DESC, p.Id DESC) AS ca
WHERE U.Reputation >= 100000;

/*less so*/
SELECT     u.Id,
           u.DisplayName,
		   ca.* --I am select *
FROM       dbo.Users AS u
CROSS APPLY( SELECT TOP (1) * --I am select *
             FROM dbo.Posts AS p 
			 WHERE p.OwnerUserId = u.Id 
			 AND p.PostTypeId = 2 
			 ORDER BY p.OwnerUserId DESC, p.Id DESC) AS ca
WHERE U.Reputation >= 100000;

The first query only touches our narrow nonclustered index:

SQL Server Query Plan
Blackout

The second query does a key lookup, because we really do select everything.

SQL Server Query Plan
Party Up

Trash Pile


I know, you’ve been well-conditioned to freak out about certain things. I’m here to help.

Not every SELECT * needs to be served a stake through the heart and beheading.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Announcing My Precon For SQL Saturday Washington, DC

HELLO GUAM!


If you’re planning on attending SQL Saturday Washington, DC, why not play hooky from work and spend the day with me learning about all the atrocities SQL Server is capable of?

I’m going to be delivering my Total Server Tuning material, which has been a hit at a whole bunch of events this past year. It’s an eye-opening full day of training where you’ll find out all my favorite ways that things can go wrong with SQL Server hardware, queries, and indexes.

And of course, how you can outsmart SQL Server.

Which is pretty hard.

Like, doctors work on it and stuff.

See you there!

For a limited time, use the coupon code “votesql” for $50 off.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Stack Overflow Demo Database Column Store Edition: Helpful Links

Sniff


If you’re starting from scratch with column store, here are some links that’ll help you get a better understanding of how they work, and what they’re good for, start here.

For some background information on column store indexes, see these:

For a detailed look at column store from Niko Neugebauer:

For scripts to help you analyze column store indexes:

Joe Obbish’s posts here:

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Stack Overflow Demo Database Column Store Edition: Exploring Relationships

Special Relativity


To get you started exploring the Stack Overflow column store database, here are some queries that show how tables are related.

The two main relationships are User Id, and Post Id.

User Id

/*User Id*/
SELECT COUNT_BIG(*) AS records
FROM dbo.Users AS u
JOIN dbo.Badges AS b
    ON b.UserId = u.Id;

SELECT COUNT_BIG(*) AS records
FROM dbo.Users AS u
JOIN dbo.Comments AS c
    ON c.UserId = u.Id;

SELECT COUNT_BIG(*) AS records
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id;

SELECT COUNT_BIG(*) AS records
FROM dbo.Users AS u
JOIN dbo.Votes AS v
    ON v.UserId = u.Id;

Post Id

/*Post Id*/
SELECT COUNT_BIG(*) AS records
FROM dbo.Posts AS p
JOIN dbo.Comments AS c
    ON c.PostId = p.Id;

SELECT COUNT_BIG(*) AS records
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
    ON v.PostId = p.Id;

Note-ry


A couple things to note, here:

  • Joining Users to Votes is unreliable, because most of the voting is anonymized in the dump
  • Things get much more interesting when you start building queries within relationships

For example, using the User Id columns in tables that aren’t users to join larger tables together, or joining Comments to Votes on Post Id.

You can really start to feel your CPU fans.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.