Function Plan Caching In SQL Server: Scalar Valued Functions

Posted on October 1, 2019 (updated May 16, 2022) by Erik Darling

The End Is Nigh-Ish

Yes. SQL Server 2019. Scalar Function Inlining. Froid.

This post isn’t about any of that. It’s about scalar valued functions as we know them now.

Terrible, horrible, no-good, very-bad, presentation-filler.

But to make matters worse, we’re going to combine them with a problem no one is pretending to solve: bad parameter sniffing.

After all, that’s what can go wrong when you cache a plan, right?

Our Dummy

This is our function. It’s set up to do something that should happen on Stack sites: users should be penalized for leaving comments.

Okay, so this isn’t exactly how it would work. But stick with me, because it makes a better demo.

CREATE OR ALTER FUNCTION dbo.CommentsAreHorribleScalar(@Id INT)
RETURNS BIGINT
WITH SCHEMABINDING, RETURNS NULL ON NULL INPUT
AS
BEGIN
DECLARE @Tally BIGINT
SELECT @Tally = 
	   (SELECT SUM(Score) FROM dbo.Posts AS p WHERE p.OwnerUserId = @Id)
	   -
	   (SELECT COUNT_BIG(*) FROM dbo.Comments AS c WHERE c.UserId = @Id)
RETURN @Tally
END
GO

Really, we’re just getting a sum of all the scores in the Posts table for a user, then subtracting the count of comments they’ve left.

Because comments are horrible.

Anyway.

Individually

If we run these two queries, the plan compiled for the first one will get reused by the second.

SELECT dbo.CommentsAreHorribleScalar(22656);

SELECT dbo.CommentsAreHorribleScalar(138);

We can free the cache, run them in the other order, and the same will happen in reverse.

Of course, each value gets a different plan when it’s the one that compiles first. Whichever runs second just reuses what’s in the cache.
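Here’s a minimal sketch of that reverse-order test. It assumes you’re on a disposable test instance, because DBCC FREEPROCCACHE with no arguments clears the plan cache for the whole server, not just this function.

```sql
-- Test instance only: this throws away every cached plan on the server.
DBCC FREEPROCCACHE;
GO

-- 138 runs first this time, so its plan is the one that gets cached...
SELECT dbo.CommentsAreHorribleScalar(138);
GO

-- ...and 22656 reuses it.
SELECT dbo.CommentsAreHorribleScalar(22656);
GO
```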

Tangentially, these functions will end up in sys.dm_exec_function_stats, which was introduced in SQL Server 2016, and can also be identified by name in the plan cache.

SELECT * 
FROM sys.dm_exec_function_stats AS defs;

EXEC sp_BlitzCache @StoredProcName = 'CommentsAreHorribleScalar';
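If you’d rather not SELECT * from the whole DMV, something like this narrows it down to the one function and pulls the cached plan along with it. It’s a sketch that assumes you’re running it in the database where the function lives; the column list is just the handful I’d look at first.

```sql
SELECT OBJECT_NAME(defs.object_id, defs.database_id) AS function_name,
       defs.cached_time,          -- when this plan went into the cache
       defs.execution_count,      -- how many times it has run since then
       defs.total_worker_time,    -- CPU time, in microseconds
       defs.total_elapsed_time,   -- duration, in microseconds
       deqp.query_plan            -- the cached plan as XML
FROM sys.dm_exec_function_stats AS defs
CROSS APPLY sys.dm_exec_query_plan(defs.plan_handle) AS deqp
WHERE defs.database_id = DB_ID()
AND   defs.object_id = OBJECT_ID(N'dbo.CommentsAreHorribleScalar');
```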

Sniffing

If we cache a plan for 22656 or 138, and then run a query like this:

SELECT TOP (100) u.DisplayName,
       dbo.CommentsAreHorribleScalar(u.Id)
FROM dbo.Users AS u;

The query will reuse whatever the cached plan is.

Again, using BlitzCache to track the plan down:

[Image: SQL Server query results from sp_BlitzCache]

In this case, the query is simple enough that the plan difference doesn’t change performance very much.

Let’s change our function a little bit to see a better example.

Appalled

Here it is:

CREATE OR ALTER FUNCTION dbo.CommentsAreHorribleScalar(@Id INT)
RETURNS BIGINT
WITH SCHEMABINDING, RETURNS NULL ON NULL INPUT
AS
BEGIN

DECLARE @Tally BIGINT

SELECT @Tally = 
	   (SELECT SUM(Score) FROM dbo.Posts AS p WHERE p.OwnerUserId <= @Id)
	   -
	   (SELECT COUNT_BIG(*) FROM dbo.Comments AS c WHERE c.UserId <= @Id)

RETURN @Tally

END
GO

The plans are way different now:

22656 scans the clustered index, and 138 seeks into the nonclustered index with a key lookup.

For the record, the bad plan is the clustered index scan, and the fast plan is the nonclustered index seek.

Running a smaller query (because the top hundred made me impatient), the difference is obvious.

SELECT TOP (5) u.DisplayName,
       dbo.CommentsAreHorribleScalar(u.Id)
FROM dbo.Users AS u;

The version of the query that gets the bad function plan has a 10 second compute scalar, and the version of the query that gets the good function plan finishes in 900ms.
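To see both outcomes yourself, the comparison can be sketched like this. Same assumption as before about a throwaway test instance, since DBCC FREEPROCCACHE clears everything. Compare elapsed time between the two rounds however you normally would; the timings above are from my machine, not a promise about yours.

```sql
-- Round 1: cache the function plan compiled for 22656 (the clustered index scan).
DBCC FREEPROCCACHE;   -- test instance only
GO
SELECT dbo.CommentsAreHorribleScalar(22656);
GO
SELECT TOP (5) u.DisplayName,
       dbo.CommentsAreHorribleScalar(u.Id)
FROM dbo.Users AS u;
GO

-- Round 2: cache the function plan compiled for 138 (the nonclustered index seek).
DBCC FREEPROCCACHE;   -- test instance only
GO
SELECT dbo.CommentsAreHorribleScalar(138);
GO
SELECT TOP (5) u.DisplayName,
       dbo.CommentsAreHorribleScalar(u.Id)
FROM dbo.Users AS u;
GO
```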

Functions Can Get Different Plans

Depending on how you call them, and how much data they might have to touch, this can be a big deal or no deal.

Because I know I’ll get some semi-related questions:

The bodies of scalar functions have no restrictions on parallelism; only the calling statements are forced to run serially.
Functions run “once per row”, with the number of rows depending on where in the plan the compute scalar appears.

Thanks for reading!

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

New Darling Data Product: DBA Funeral Insurance

Posted on September 30, 2019 (updated May 16, 2022) by Erik Darling

Sorry, Tennessee

With a new version of SQL Server coming out soon, you’re gonna hear about a lot of DBAs dying due to some new feature or another.

Don’t be caught off guard like your poor, passed-on colleagues who didn’t prepare for other massive die-offs from:

Missing index requests
Virtualization
PowerShell
The Cloud
Query Store
Data Science (???)

Loved Ones

We’ve been studying the trends here at Darling Data, and we’ve found a $700,0000% increase in DBA death and dismemberment with each new SQL Server release.

That’s 15 quadrillion percent more than the cost of milk in the Himalayas.

Think about it.

Yak Milk.

Don’t let your loved ones drink Yak Milk because you wanted to be buried in a life-sized replica of Nick’s Computer.

Shady Acres

The Senior DBA Care Plan© from Darling4Lyfe™ will help cover those rising funeral costs.

The best part about the plan is that it’s totally tax free. Because if you don’t tell, I won’t tell either.

And you’ll be dead.

Federal* data shows a staggering increase in the cost of burying all those DBAs.

Apparently it’s tough to get someone in the ground while they’re kicking and screaming.

Call 1-900-WHATEVER today to learn more about the benefits you and your family deserve.

Plan Operators are standing by!

*: I assume there’s federal data on this that agrees with me. Subject to change at any time. Not an actual number in this timeline. Not Data Driven®

Joe Obbish Precon Super Sellout Sale

Posted on September 27, 2019 (updated May 16, 2022) by Erik Darling

If you’re interested in attending Joe’s precon, use the discount code “compressionsavings” to save 40% through Tuesday, 10/1 when ticket sales end!

I’m In Your Area

If you’re in the New York area and looking for some great free SQL Server training, head to SQL Saturday NYC.

We’ve got a great lineup of speakers covering a wide variety of topics that can help you learn no matter what your job function is.

And of course, if you’re looking for a full day of training, we’ve got a precon with Joe Obbish the Friday before the event (October 4th).

Joe Obbish – Clustered Columnstore For Performance

Buy Tickets Here!

Clustered columnstore indexes can be a great solution for data warehouse workloads, but there’s not a lot of advanced training or detailed documentation out there. It’s easy to feel all alone when you want a second opinion or run into a problem with query performance or data loading that you don’t know how to solve.

In this full day session, I’ll teach you the most important things I know about clustered columnstore indexes. Specifically, I’ll teach you how to make the right choices with your schema, data loads, query tuning, and columnstore maintenance. All of these lessons have been learned the hard way with 4 TB of production data on large, 96+ core servers. Material is applicable from SQL Server 2016 through 2019.

Here’s what I’ll be talking about:

– How columnstore compression works and tips for picking the right data types
– Loading columnstore data quickly, especially on large servers
– Improving query performance on columnstore tables
– Maintaining your columnstore tables

This is an advanced level session. To get the most out of the material, attendees should have some practical experience with columnstore and query tuning, and a solid understanding of internals such as wait stats analysis. You don’t need to bring a laptop to follow along.

Buy Tickets Here!

Location:

Courtyard Times Square – 114 West 40th Street, New York, NY 10018
Meeting Room – Lower Level – Meeting Room A & B

Live SQL Server Q&A!

Posted on September 26, 2019 (updated May 16, 2022) by Erik Darling

ICYMI

Last week’s thrilling, stunning, flawless episode of whatever-you-wanna-call-it.

Trace flag 3656 is not sufficient for symbol resolution on SQL Server 2019

Posted on September 25, 2019 (updated May 14, 2022) by Joe Obbish

You may have noticed that TF 3656 appears to no longer work in SQL Server 2019 RC1. Symbols are not resolved in Extended Events event data even with that trace flag enabled. Trace flag 2592 must also be enabled to resolve symbols. This was recently added by Microsoft to the documentation. This concludes the shortest blog post I will ever write.
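For reference, turning on both trace flags might look like the sketch below. Enabling them globally with -1 is an assumption made for illustration, and this is the kind of thing that belongs on a test system rather than production.

```sql
-- Assumption: global scope (-1), on a test system.
-- 3656 alone is no longer enough on SQL Server 2019; 2592 is needed too.
DBCC TRACEON (3656, 2592, -1);

-- Confirm that both flags are active.
DBCC TRACESTATUS (3656, 2592);
```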

Be Careful When You Alter SQL Server Indexed Views

Posted on September 25, 2019 (updated May 16, 2022) by Erik Darling

Yo, Is That Mary?

How SQL Server Statistics Get Updated Automatically

Posted on September 24, 2019 (updated May 16, 2022) by Erik Darling

Spawning Monsters

Here we go again, with me promising to blog about something later.

This time it’s an attempt to explain how SQL Server chooses which statistics to update.

It’s not glamorous, and it may even make you angry, but you know.

They can’t all be posts about…

*checks notes*

*stares into the camera*

*tears up notes*

*tears up*

*stares off camera until someone cuts to commercials*

And We’re Back

Let’s start with the query we’re going to use to examine our statistics.

    SELECT      t.name,
                s.name,
                s.stats_id,
                sp.last_updated,
                sp.rows,
                sp.rows_sampled,
                sp.modification_counter
    FROM        sys.stats AS s
    JOIN        sys.tables AS t
        ON s.object_id = t.object_id
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE       t.name = 'UserStats';

Right now, the results aren’t too interesting, because we only have a statistics object attached to the Primary Key.

We’re not gonna touch that column. We’re gonna use another column.

This query will get system generated statistics created on the AccountId column.

    SELECT COUNT(*)
    FROM   dbo.UserStats AS u
    WHERE  u.AccountId > 1000
    AND    u.AccountId < 9999
    OPTION(RECOMPILE);

[Image: SQL Server statistics query results]

By itself, this isn’t very interesting. Let’s create an index, too.

    CREATE INDEX ix_AccountId ON dbo.UserStats ( AccountId );

The index created statistics, too. With the equivalent of a full scan! See that rows_sampled column?

I mean, why not, if you’re already scanning the whole table to get the data you need for the index, right?

Right.

I’m gonna use a couple updates to flip values around.

	UPDATE u
	SET u.AccountId = u.UpVotes + u.DownVotes
	FROM dbo.UserStats AS u
	WHERE 1 = 1;
	
	UPDATE u
	SET u.AccountId = u.UpVotes - u.DownVotes
	FROM dbo.UserStats AS u
	WHERE 1 = 1;

Don’t ask me why I swallowed a fly.

But the WHERE 1 = 1 is enough to get SQL Prompt to not warn me about running an update with no where clause.

Both stats objects have been modified the same number of times.

Let’s run our COUNT query and see what happens!

We can see that only the stats for the index were updated (and with the default sampling rate, not a full scan).

Now let’s create another stats object with FULLSCAN.

    CREATE STATISTICS s_AccountId ON dbo.UserStats ( AccountId ) WITH FULLSCAN;

We’ll also go ahead and run an update again.

And then our COUNT query…

SQL Server took two perfectly good fully sampled statistics and reduced them to the default sampling.

This doesn’t hurt our query, but it certainly is annoying to see.

That’s why newer versions of SQL Server allow you to persist the sampling rate.
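As a rough sketch of what that looks like: the option is PERSIST_SAMPLE_PERCENT, which as far as I know showed up in SQL Server 2016 SP1 CU4 and SQL Server 2017 CU1, so check your build before counting on it. The statistics name is the s_AccountId object created above.

```sql
-- Update with a full scan and ask SQL Server to keep using that
-- sampling rate for later updates that don't specify one.
UPDATE STATISTICS dbo.UserStats s_AccountId
WITH FULLSCAN, PERSIST_SAMPLE_PERCENT = ON;

-- persisted_sample_percent shows whether a rate has been persisted
-- (0 means it hasn't).
SELECT s.name,
       sp.rows,
       sp.rows_sampled,
       sp.persisted_sample_percent
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID(N'dbo.UserStats');
```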

Latest and Greatest

A lot of the statistics options that people call “rocket science”, like auto create and auto update stats, are there for a reason.

When you let SQL Server make choices, they’re not always the best ones.

Tracking this stuff down and understanding when and if it’s a problem is hard work, though. Don’t flip those switches lightly, my friends.
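Before flipping anything, it helps to know where the switches are currently set. A minimal check for the current database might look like this:

```sql
-- Current automatic statistics settings for the database you're connected to.
SELECT d.name,
       d.is_auto_create_stats_on,
       d.is_auto_update_stats_on,
       d.is_auto_update_stats_async_on
FROM sys.databases AS d
WHERE d.database_id = DB_ID();
```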

Thanks for reading!

Why Does My Serial Query Plan Have Parallel Wait Stats Like CXPACKET And CXCONSUMER?

Posted on September 23, 2019 (updated May 16, 2022) by Erik Darling

Harkening

In dramatic fashion, I’m revisiting something from this post about stats updates.

It’s a quick post, because uh… Well. Pick a reason.

Get In Gear

Follow along as I repeat all the steps in the linked post to:

Load > 2 billion rows into a table
Create a stats object on every column
Load enough new data to trigger a stats refresh
Query the table to trigger the stats refresh

Except this time, I’m adding a mAxDoP 1 hint to it:

SELECT COUNT(*)
FROM dbo.Vetos
WHERE UserId = 138
AND   PostId = 138
AND   BountyAmount = 138
AND   VoteTypeId = 138
AND   CreationDate = 138
OPTION(MAXDOP 1);

Here’s Where Things Get Interesting

Our MaXdOp 1 query registers nearly the same amount of time on stats updates and parallelism.

But our plan is indeed serial. Because we told it to be.

By setting maxDOP to 1.
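If you want to look for the same thing on your own system, one option is checking the waits for your session right after running the query. This is only a sketch, and it assumes SQL Server 2016 or later, where sys.dm_exec_session_wait_stats exists. It isn’t necessarily how the numbers above were collected.

```sql
-- Run in the same session as the MAXDOP 1 query, right after it finishes.
-- These counters are cumulative for the session, so a fresh connection
-- makes the numbers easier to read.
SELECT dsws.wait_type,
       dsws.waiting_tasks_count,
       dsws.wait_time_ms
FROM sys.dm_exec_session_wait_stats AS dsws
WHERE dsws.session_id = @@SPID
AND   dsws.wait_type IN (N'CXPACKET', N'CXCONSUMER')
ORDER BY dsws.wait_time_ms DESC;
```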

Not Alone

So, if you’re out there in the world wondering why this crazy kinda thing goes down, here’s one explanation.

Are there others? Probably.

But you’ll have to find out by setting MAXdop to 1 on your own.

Thanks for reading!

SQL Saturday NYC: Two Weeks To Go!

Posted on September 20, 2019 (updated May 16, 2022) by Erik Darling

I’m In Your Area

If you’re in the New York area and looking for some great free SQL Server training, head to SQL Saturday NYC.

We’ve got a great lineup of speakers covering a wide variety of topics that can help you learn no matter what your job function is.

And of course, if you’re looking for a full day of training, we’ve got a precon with Joe Obbish the Friday before the event (October 4th).

Joe Obbish – Clustered Columnstore For Performance

Buy Tickets Here!

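Here’s what I’ll be talking about:

– How columnstore compression works and tips for picking the right data types
– Loading columnstore data quickly, especially on large servers
– Improving query performance on columnstore tables
– Maintaining your columnstore tables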

Buy Tickets Here!

Location:

Courtyard Times Square – 114 West 40th Street, New York, NY 10018
Meeting Room – Lower Level – Meeting Room A & B

Live SQL Server Q&A!

Posted on September 19, 2019 (updated May 16, 2022) by Erik Darling

ICYMI

Last week’s thrilling, stunning, flawless episode of whatever-you-wanna-call-it.