SQL Server Archives | Page 137 of 164

How To Do Date Math In Your Where Clause To Avoid SQL Server Performance Problems

Posted on November 26, 2019May 16, 2022 by Erik Darling

Pattern Baldness

I see this pattern quite a bit in stored procedures, where the front end accepts an integer which gets passed to a query.

That integer is used to specify some time period — days, months, whatever — and the procedure then has to search some date column for the relevant values.

Here’s a simplified example using plain ol’ queries.

Still Alive

Here are my queries. The recompile hints are there to edify people who are hip to local variable problems.

DECLARE @months_back INT = 1;

SELECT COUNT(*) AS records
FROM dbo.Posts AS p
WHERE DATEADD(MONTH, @months_back * -1, p.CreationDate) <= '20100101' --Usually GETDATE() is here
OPTION(RECOMPILE);
GO 

DECLARE @months_back INT = 1;

SELECT COUNT(*) AS records
FROM dbo.Posts AS p
WHERE p.CreationDate <= DATEADD(MONTH, @months_back, '20100101') --Usually GETDATE() is here
OPTION(RECOMPILE);
GO

The problem with the first query is that the function is applied to the column, rather than to the variable.

If we look at the plans for these, the optimizer only thinks one of them is special enough for an index request.

Sure, there’s also a ~200ms difference between the two, which is pretty repeatable.

But that’s not the point — where things change quite a bit more is when we have a useful index. Those two queries above are just using the clustered index, which is on a column unrelated to our where clause.

CREATE INDEX etadnoitaerc ON dbo.Posts(CreationDate);

Side by side:

The bad query uses >10x more CPU
Still runs for >3x as long
Scans the entire index
Reads 10x more rows
Has to go parallel to remain competitive

At MAXDOP 1, it runs for just about 2.2 seconds on a single thread. Bad news.

Qual

This is one example of misplacing logic, and why it can be bad for performance.

All these demos were run on SQL Server 2019, which unfortunately didn’t save us any trouble.

In the next post, I’m going to look at another way I see people make their own doom.

Thanks for reading!

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

A Quick Primer On Date Math In SQL Server

Posted on November 25, 2019May 16, 2022 by Erik Darling

Come Up

I’m going to be writing a few posts about things I see people doing with date math, or as I’ve started calling it, “dath*”, that can really hurt performance.

To get you ready for those, I need to talk about how functions like DATEADD and DATEDIFF work. It seems like that’s missing knowledge in many places.

Make Myself A Winner

As of this writing, if I run this query, I get these results:

SELECT SYSDATETIME() AS sys_date;

sys_date
2019-11-08 17:22:10.6886473

It has a date part, and a time part. Right? Cool.

The first thing I need to show you is that with the DATEADD and DATEDIFF functions, some dates and numbers are interchangeable.

For instance:

SELECT DATEDIFF(DAY, 0, SYSDATETIME()) AS [what does zero mean?],
          DATEDIFF(DAY, '19000101', SYSDATETIME()) AS [it means 1900-01-01];

Running these will both return the same value for me: 43,775. In other words, 0 and 1900-01-01 are interchangeable.

If you’re playing along at home, stick negative numbers in there too.

It might be a different result from mine when you run it, but they’ll both be the same locally.

To prove this out a little more: If we add 43,775 days to 0 (1900-01-01), we get today’s date (2019-11-08).

SELECT DATEADD( DAY, 
                DATEDIFF(DAY, 0, SYSDATETIME()), 
			    0 ) 
			    AS [add 43775 days to 1900-01-01];

In this example, I’m saying that I want to add the results of the datediff function to the date 1900-01-01.

These functions are pretty powerful, because they can accept expressions, literals, variables, parameters, column values, and so on.

The important part about this one is that the time is zeroed out.

What’s The Point?

I’ll often see people need to “flatten” dates to certain intervals.

By flatten, I mean the start of a day, week, month, or year, and likewise for the end values to all those intervals.

I’ve seen some really bad implementations of this. Most notable was probably a scalar valued function that converted a datetime to a 10 character string to remove the time portion.

In every where clause.

For much better implementations, check out these posts:

Calendar table of important dates, by Aaron Bertrand
Simplifying Date Calculations, by Aaron Bertrand

Thanks for reading!

*I don’t really call it that don’t be ludicrous

Going Further

SQL Saturday Washington, DC: Three Weeks To Go!

Posted on November 22, 2019May 14, 2022 by Erik Darling

HELLO GUAM!

If you’re planning on attending SQL Saturday Washington, DC, why not play hooky from work and spend the day with me learning about all the atrocities SQL Server is capable of?

I’m going to be delivering my Total Server Tuning material, which has been a hit at a whole bunch of events this past year.

In case you’ve never seen it, here’s the trailer I recorded for SQLBits earlier this year for it:

See you there!

For a limited time, use the coupon code “votesql” for $50 off.

Going Further

Can You Stop People From Creating SQL Server Indexes With Low Fill Factors?

Posted on November 21, 2019May 16, 2022 by Erik Darling

Yeah nah

So like, presented without much comment, this server level trigger will, in my limited testing, “work”.

Just make sure you understand something: this doesn’t stop the index from being created, it only rolls creation back afterwards.

If someone creates a gigantic index on an equally gigantic table, you’re in for a hell of a ride. I’d probably only deploy this on local dev boxes, and only if I really needed to prove a point.

CREATE OR ALTER TRIGGER CheckFillFactor
ON ALL SERVER
FOR CREATE_INDEX, ALTER_INDEX
AS
DECLARE @FillFactor NVARCHAR(4000);
DECLARE @Percent INT;
SELECT @FillFactor = EVENTDATA().value('(/EVENT_INSTANCE/TSQLCommand/CommandText)[1]', 'NVARCHAR(4000)');
IF UPPER(@FillFactor) LIKE '%FILLFACTOR%'
    BEGIN
        SET @FillFactor = REPLACE(@FillFactor, ' ', '');
        PRINT @FillFactor;
        SELECT @FillFactor = SUBSTRING(@FillFactor, CHARINDEX(N'FILLFACTOR=', @FillFactor) + LEN(N'FILLFACTOR='), PATINDEX('%[^0-9]%', @FillFactor) + 2);
        
		IF TRY_CONVERT(INT, @FillFactor) IS NULL
		    BEGIN
			    SET @Percent = LEFT(@FillFactor, 2);
			END;
		ELSE
		    BEGIN
			    SET @Percent = @FillFactor;
			END;

		IF @Percent < 80
            BEGIN
                RAISERROR('WHY YOU DO THAT?', 0, 1) WITH NOWAIT;
                ROLLBACK;
            END;
    END;
GO

It’ll work for create or alter index commands, i.e.

--Fails, under 80
CREATE INDEX whatever ON dbo.Users (Reputation) WHERE Reputation > 100000 WITH (FILLFACTOR = 70);

--Works, over 80
CREATE INDEX whatever ON dbo.Users (Reputation) WHERE Reputation > 100000 WITH (FILLFACTOR = 90);

--Fails, under 80
ALTER INDEX whatever ON dbo.Users REBUILD WITH (FILLFACTOR = 70);

--Works, uses default
CREATE INDEX whatever ON dbo.Users (Reputation) WHERE Reputation > 100000;

Pink Blood

Is it perfect? Probably not, but I threw it together quickly as a POC.

For instance, my first stab broke when fill factor wasn’t specified in the command.

My second stab broke when I changed the spacing around the “=”.

Let me know in the comments if you can get around it or break it, other than by changing server settings — I can’t go that far here.

Thanks for reading!

Going Further

How TABLOCK Hints Can Hurt Column Store Insert Performance

Posted on November 20, 2019May 16, 2022 by Erik Darling

Utility

There are a lot of good reasons to use tablock on your inserts, as long as you can handle the reduced concurrency of a table level lock.

You may get minimal logging here, here. And you may even get parallel inserts. You may even avoid weird compression side effects.

All very good reasons to vie for the affections of tablock hints!

You should do that.

You should try.

Because Sometimes

You may run into oddball scenarios where the results without tablock are better.

For example, I was experimenting with clustered column store temp tables for another post, and I came across this oddity.

Assume two identical inserts, but one has a tablock hint and one doesn’t.

DROP TABLE IF EXISTS #t_cci;
GO 

CREATE TABLE #t_cci (UserId INT, VoteTypeId INT, CreationDate DATETIME, INDEX c CLUSTERED COLUMNSTORE);

INSERT #t_cci WITH (TABLOCK) -- A tab of locks
    ( UserId, VoteTypeId, CreationDate )
SELECT v.UserId, v.VoteTypeId, MAX(v.CreationDate)
FROM dbo.Votes AS v
WHERE v.UserId IS NOT NULL
GROUP BY v.UserId, v.VoteTypeId;

DROP TABLE IF EXISTS #t_cci;
GO 

CREATE TABLE #t_cci (UserId INT, VoteTypeId INT, CreationDate DATETIME, INDEX c CLUSTERED COLUMNSTORE);

INSERT #t_cci -- A lack of tabs
    ( UserId, VoteTypeId, CreationDate )
SELECT v.UserId, v.VoteTypeId, MAX(v.CreationDate)
FROM dbo.Votes AS v
WHERE v.UserId IS NOT NULL
GROUP BY v.UserId, v.VoteTypeId;

The plans for those queries, in that order, look like this:

The fully parallel query takes 4.6 seconds, and the serial insert query takes 1.7 seconds.

Even more strange, the insert with tablock leaves four open rowgroups, and the non-tablock query has one compressed rowgroup.

SQL Server Performance Metrics — It was not a dream.

Adding to that, using sp_spaceused on the temp tables, we can see some important differences.

The uncompressed temp table is much larger, has more unused space, and has… index size?!

Yep, because open row groups, or delta stores, are stored as b-trees.

the columnstore index might store some data temporarily into a clustered index called a deltastore

I believe this happens with the parallel insert, because each thread inserts ~75k rows, which is fewer than the minimum 102,400 needed to qualify for skipping the delta store and being compressed. A simple way to test this is to run the tablock query at MAXDOP 2.

That ends up being a little faster than the serial insert, and also gets full compression.

Each thread is ~150k rows and gets compressed. Hooray.

Looking at a PerfView diff of the good and bad samples, this path is only present in the slow run:

When inserting to the delta store, we need to acquire exclusive latches like just about any b-tree insert, and reserve log space to do so.

Morality Hilarity

To summarize the problem:

The TABLOCK hint gets us a fully parallel insert plan
There aren’t enough rows on each parallel thread to get row group compression
We need to insert into a clustered index behind the scenes that makes up the delta store
Which results in threads waiting on exclusive latches and reserving log space

Will you ever hit this? Maybe someday.

This will also happen with real tables (not just temp tables), so I suspect it’s a potentially common scenario for column store users.

You get all worked up getting these parallel inserts and then your compression sucks.

Thanks for reading!

Going Further

No Column Store Indexes With In-Memory Tempdb For SQL Server? No Problem!

Posted on November 19, 2019May 16, 2022 by Erik Darling

Old Habits

Let’s say you’re on SQL Server 2019. No, seriously. It’s been out for a couple weeks now.

You could be.

I say that you could be because you’re the kind of brave person who tries new things and experiments with their ~~body~~ server.

You may even do crazy things like this.

Stone Cold

CREATE TABLE #t ( id INT, INDEX c CLUSTERED COLUMNSTORE );

SELECT    COUNT_BIG(*) AS records
FROM      dbo.Users AS u
JOIN      dbo.Posts AS p
    ON u.Id = p.OwnerUserId
JOIN      dbo.Comments AS c
    ON u.Id = c.UserId
LEFT JOIN #t AS t ON 1 = 0;

Woah ho ho. What happened there? A #temp table with a clustered column store index on it left joined on 1 = 0?

Yes. People do this.

People do this because it’s getting some batch mode operations “for free”, which have the nasty habit of making big reporting queries run a lot faster.

Yonder Problem

When you enable 2019’s new in memory tempdb, which can really help with stuff tempdb needs help with, you may find yourself hitting errors.

Msg 11442, Level 16, State 1, Line 14
Columnstore index creation is not support in tempdb when memory-optimized metadata mode is enabled.
Msg 1750, Level 16, State 1, Line 14
Could not create constraint or index. See previous errors.

The good news is that this works with *real* tables, too.

CREATE TABLE dbo.t ( id INT, INDEX c CLUSTERED COLUMNSTORE );

SELECT    COUNT_BIG(*) AS records
FROM      dbo.Users AS u
JOIN      dbo.Posts AS p
    ON u.Id = p.OwnerUserId
JOIN      dbo.Comments AS c
    ON u.Id = c.UserId
LEFT JOIN dbo.t AS t ON 1 = 0;

And you can get plans with all sorts of Batchy goodness in them.

Yeah, you’re gonna have to change some code, but don’t worry.

You’re the kind of person who enjoys that.

Right?

Thanks for reading!

Going Further

When SELECT * Doesn’t Matter In SQL Server Queries

Posted on November 18, 2019August 6, 2023 by Erik Darling

Pavlovian

There’s a strange response to some things in the SQL Server community that borders on religious fervor. I once worked with someone who chastised me for having SELECT * in some places in the Blitz scripts. It was odd and awkward.

Odd because this person was the most Senior DBA in the company, and awkward because they didn’t believe me that it didn’t matter in some cases.

People care about SELECT * for many valid reasons, but context is everything.

One For The Money

The first place it doesn’t matter is EXISTS. Take this index and this query:

CREATE INDEX specatular_blob ON dbo.Posts(PostTypeId, OwnerUserId);

SELECT COUNT(*) AS records
FROM   dbo.Users AS u
WHERE  EXISTS ( SELECT * 
                FROM dbo.Posts AS p 
				WHERE p.OwnerUserId = u.Id 
				AND p.PostTypeId = 2 );

The relevant part of the query plan looks like this:

We do a seek into the index we created on the two columns in our WHERE clause. We didn’t have to go back to the clustered index for everything else in the table.

That’s easy enough to prove if we only run the subquery — we have to change it a little bit, but the plan tells us what we need.

SELECT *
FROM   dbo.Posts AS p
WHERE  p.OwnerUserId = 22656
AND    p.PostTypeId = 2;

This time we do need the clustered index:

You can even change it to something that would normally throw an error:

SELECT COUNT(*) AS records
FROM   dbo.Users AS u
WHERE  EXISTS ( SELECT 1/0 
                FROM dbo.Posts AS p 
				WHERE p.OwnerUserId = u.Id 
				AND p.PostTypeId = 2 );

Two For Completeness

Another example is in derived tables, joins, and apply.

Take these two queries. The first one only selects columns in our nonclustered index (same as above).

The second one actually does a SELECT *.

/*selective*/
SELECT     u.Id,
           u.DisplayName,
		   ca.OwnerUserId, --I am only selecting columns in our index
		   ca.PostTypeId,
		   ca.Id
FROM       dbo.Users AS u
CROSS APPLY( SELECT TOP (1) * --I am select *
             FROM dbo.Posts AS p 
			 WHERE p.OwnerUserId = u.Id 
			 AND p.PostTypeId = 2 
			 ORDER BY p.OwnerUserId DESC, p.Id DESC) AS ca
WHERE U.Reputation >= 100000;

/*less so*/
SELECT     u.Id,
           u.DisplayName,
		   ca.* --I am select *
FROM       dbo.Users AS u
CROSS APPLY( SELECT TOP (1) * --I am select *
             FROM dbo.Posts AS p 
			 WHERE p.OwnerUserId = u.Id 
			 AND p.PostTypeId = 2 
			 ORDER BY p.OwnerUserId DESC, p.Id DESC) AS ca
WHERE U.Reputation >= 100000;

The first query only touches our narrow nonclustered index:

The second query does a key lookup, because we really do select everything.

Trash Pile

I know, you’ve been well-conditioned to freak out about certain things. I’m here to help.

Not every SELECT * needs to be served a stake through the heart and beheading.

Thanks for reading!

Going Further

Announcing My Precon For SQL Saturday Washington, DC

Posted on November 15, 2019May 16, 2022 by Erik Darling

HELLO GUAM!

If you’re planning on attending SQL Saturday Washington, DC, why not play hooky from work and spend the day with me learning about all the atrocities SQL Server is capable of?

I’m going to be delivering my Total Server Tuning material, which has been a hit at a whole bunch of events this past year. It’s an eye-opening full day of training where you’ll find out all my favorite ways that things can go wrong with SQL Server hardware, queries, and indexes.

And of course, how you can outsmart SQL Server.

Which is pretty hard.

Like, doctors work on it and stuff.

See you there!

For a limited time, use the coupon code “votesql” for $50 off.

Going Further

Stack Overflow Demo Database Column Store Edition: Helpful Links

Posted on November 15, 2019May 16, 2022 by Erik Darling

Sniff

If you’re starting from scratch with column store, here are some links that’ll help you get a better understanding of how they work, and what they’re good for, start here.

For some background information on column store indexes, see these:

For a detailed look at column store from Niko Neugebauer:

Columnstore

For scripts to help you analyze column store indexes:

Columnstore Indexes Scripts Library

Joe Obbish’s posts here:

Posty

Thanks for reading!

Going Further

Stack Overflow Demo Database Column Store Edition: Exploring Relationships

Posted on November 14, 2019May 16, 2022 by Erik Darling

Special Relativity

To get you started exploring the Stack Overflow column store database, here are some queries that show how tables are related.

The two main relationships are User Id, and Post Id.

User Id

/*User Id*/
SELECT COUNT_BIG(*) AS records
FROM dbo.Users AS u
JOIN dbo.Badges AS b
    ON b.UserId = u.Id;

SELECT COUNT_BIG(*) AS records
FROM dbo.Users AS u
JOIN dbo.Comments AS c
    ON c.UserId = u.Id;

SELECT COUNT_BIG(*) AS records
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id;

SELECT COUNT_BIG(*) AS records
FROM dbo.Users AS u
JOIN dbo.Votes AS v
    ON v.UserId = u.Id;

Post Id

/*Post Id*/
SELECT COUNT_BIG(*) AS records
FROM dbo.Posts AS p
JOIN dbo.Comments AS c
    ON c.PostId = p.Id;

SELECT COUNT_BIG(*) AS records
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
    ON v.PostId = p.Id;

Note-ry

A couple things to note, here:

Joining Users to Votes is unreliable, because most of the voting is anonymized in the dump
Things get much more interesting when you start building queries within relationships

For example, using the User Id columns in tables that aren’t users to join larger tables together, or joining Comments to Votes on Post Id.

You can really start to feel your CPU fans.

Thanks for reading!