Parameter Sniffing Archives | Page 2 of 5

SQL Server 2022 Parameter Sensitive Plan Optimization: The Problem With Sniffed Parameter Sensitivity

Posted on June 27, 2022June 27, 2022 by Erik Darling

Long Time Coming

When Microsoft first started coming up with these Intelligent Query Processing features, I think everyone who cares about That Sort Of Thing© wondered when parameter sensitivity would get fixed.

Let’s take a brief moment to talk about terminology here, so you don’t go getting yourself all tied up in knots:

Parameter Sniffing: When the optimizer creates and caches a plan based on a set of parameter(s) for reuse
Parameter Sensitivity: When a cached plan for one set of parameter(s) is not a good plan for other sets of parameter(s)

The first one is a usually-good thing, because your SQL Server won’t spend a lot of time compiling plans constantly. This is obviously more important for OLTP workloads than for data warehouses.

This can pose problems in either type of environment when data is skewed towards one or more values, because queries that need to process a lot of rows typically need a different execution plan strategy than queries processing a small number of rows.

This seems a good fit for the Intelligent Query Processing family of SQL Server features, because fixing it sometimes requires a certain level of dynamism.

Choice 2 Choice

The reason this sort of thing can happen often comes down to indexing. That’s obviously not the only thing. Even a perfect index won’t make nested loops more efficient than a hash join (and vice versa) under the right circumstances.

Probably the most classic parameter sensitivity issue, and why folks spend a long time trying to fix them, is the also-much-maligned Lookup.

But consider the many other things that might happen in a query plan that will hamper performance.

Join type
Join order
Memory grants
Parallelism
Aggregate type
Sort/Sort Placement
Batch Mode

The mind boggles at all the possibilities. This doesn’t even get into all the wacky and wild things that can mess SQL Server’s cost-based optimizer up a long the way.

Table variables
Local variables
Optimize for unknown
Non-SARGable predicates
Wrong cardinality estimation model
Row Goals
Out of date statistics

The mind also boggles here. Anyway, I’ve written quite a bit about parameter sensitivity in the past, so I’m going to link you to the relevant post tag for those.

Unlearn

With SQL Server 2022, we’ve finally got a starting point for resolving this issue.

In tomorrow’s post, we’ll talk a bit about how this new feature works to help with your parameter sensitivity issues, which are issues.

Not your parameter sniffing issues, which are not issues.

For the rest of the week, I’m going to dig deeper into some of the stuff that the documentation glosses over, where it helps, and show you a situation where it should kick in and help but doesn’t.

Keep in mind that these are early thoughts, and I expect things to evolve both as RTM season approaches, and as Cumulative Updates are released for SQL Server 2022.

Remember scalar UDF inlining? That thing morphed quite a bit.

Can’t wait for all of you to get on SQL Server 2019 and experience it.

Thanks for reading!

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

SQL Server 2022 Is Going To Mess Up Your Query Monitoring Scripts

Posted on June 14, 2022August 1, 2022 by Erik Darling

At Least For Now

SQL Server 2022 has a new feature in it to help with parameter sensitive query plans.

That is great. Parameter sensitivity, sometimes just called parameter sniffing, can be a real bear to track down, reproduce, and fix.

In a lot of the client work I do, I end up using dynamic SQL like this to get things to behave:

But with this new feature, you get some of the same fixes without having to interfere with the query at all.

How It Works

You can read the full documentation here. But you don’t read the documentation, and the docs are missing some details at the moment anyway.

It only works on equality predicates right now
It only works on one predicate per query
It only gives you three query plan choices, based on stats buckets

There’s also some additional notes in the docs that I’m going to reproduce here, because this is where you’re gonna get tripped up, if your scripts associate statements in the case with calling stored procedures, or using object identifiers from Query Store.

For each query variant mapping to a given dispatcher:

The query_plan_hash is unique. This column is available in sys.dm_exec_query_stats, and other Dynamic Management Views and catalog tables.
The plan_handle is unique. This column is available in sys.dm_exec_query_stats, sys.dm_exec_sql_text, sys.dm_exec_cached_plans, and in other Dynamic Management Views and Functions, and catalog tables.
The query_hash is common to other variants mapping to the same dispatcher, so it’s possible to determine aggregate resource usage for queries that differ only by input parameter values. This column is available in sys.dm_exec_query_stats, sys.query_store_query, and other Dynamic Management Views and catalog tables.
The sql_handle is unique due to special PSP optimization identifiers being added to the query text during compilation. This column is available in sys.dm_exec_query_stats, sys.dm_exec_sql_text, sys.dm_exec_cached_plans, and in other Dynamic Management Views and Functions, and catalog tables. The same handle information is available in the Query Store as the last_compile_batch_sql_handle column in the sys.query_store_query catalog table.
The query_id is unique in the Query Store. This column is available in sys.query_store_query, and other Query Store catalog tables.

The problem is that, sort of like dynamic SQL, this makes each different plan/statement impossible to tie back to the procedure.

What I’ve Tried

Here’s a procedure that is eligible for parameter sensitivity training:

CREATE OR ALTER PROCEDURE

dbo.SQL2022

(

@ParentId int

)

BEGIN

SET NOCOUNT, XACT_ABORT ON;

SELECT TOP (10)

u.DisplayName,

p.*

FROM dbo.Posts AS p

JOIN dbo.Users AS u

ON p.OwnerUserId = u.Id

WHERE p.ParentId = @ParentId

ORDER BY u.Reputation DESC;

END;

CREATE OR ALTER PROCEDURE dbo.SQL2022 ( @ParentId int ) AS BEGIN SET NOCOUNT, XACT_ABORT ON; SELECT TOP (10) u.DisplayName, p.* FROM dbo.Posts AS p JOIN dbo.Users AS u ON p.OwnerUserId = u.Id WHERE p.ParentId = @ParentId ORDER BY u.Reputation DESC; END; GO

CREATE OR ALTER PROCEDURE 
    dbo.SQL2022
(
    @ParentId int
)
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;

    SELECT TOP (10) 
        u.DisplayName, 
        p.*
    FROM dbo.Posts AS p
    JOIN dbo.Users AS u
        ON p.OwnerUserId = u.Id
    WHERE p.ParentId = @ParentId
    ORDER BY u.Reputation DESC;

END;
GO

Here’s the cool part! If I run this stored procedure back to back like so, I’ll get two different query plans without recompiling or writing dynamic SQL, or anything else:

EXEC dbo.SQL2022

@ParentId = 184618;

EXEC dbo.SQL2022

@ParentId = 0;

EXEC dbo.SQL2022 @ParentId = 184618; GO EXEC dbo.SQL2022 @ParentId = 0; GO

EXEC dbo.SQL2022
    @ParentId = 184618;
GO 

EXEC dbo.SQL2022 
    @ParentId = 0;
GO

It happens because the queries look like this under the covers:

SELECT TOP (10)

u.DisplayName,

p.*

FROM dbo.Posts AS p

JOIN dbo.Users AS u

ON p.OwnerUserId = u.Id

WHERE p.ParentId = @ParentId

ORDER BY u.Reputation DESC

OPTION (PLAN PER VALUE(QueryVariantID = 1, predicate_range([StackOverflow2010].[dbo].[Posts].[ParentId] = @ParentId, 100.0, 1000000.0)))

SELECT TOP (10)

u.DisplayName,

p.*

FROM dbo.Posts AS p

JOIN dbo.Users AS u

ON p.OwnerUserId = u.Id

WHERE p.ParentId = @ParentId

ORDER BY u.Reputation DESC

OPTION (PLAN PER VALUE(QueryVariantID = 3, predicate_range([StackOverflow2010].[dbo].[Posts].[ParentId] = @ParentId, 100.0, 1000000.0)))

SELECT TOP (10) u.DisplayName, p.* FROM dbo.Posts AS p JOIN dbo.Users AS u ON p.OwnerUserId = u.Id WHERE p.ParentId = @ParentId ORDER BY u.Reputation DESC OPTION (PLAN PER VALUE(QueryVariantID = 1, predicate_range([StackOverflow2010].[dbo].[Posts].[ParentId] = @ParentId, 100.0, 1000000.0))) SELECT TOP (10) u.DisplayName, p.* FROM dbo.Posts AS p JOIN dbo.Users AS u ON p.OwnerUserId = u.Id WHERE p.ParentId = @ParentId ORDER BY u.Reputation DESC OPTION (PLAN PER VALUE(QueryVariantID = 3, predicate_range([StackOverflow2010].[dbo].[Posts].[ParentId] = @ParentId, 100.0, 1000000.0)))

SELECT TOP (10) 
    u.DisplayName, 
    p.*
FROM dbo.Posts AS p
JOIN dbo.Users AS u
    ON p.OwnerUserId = u.Id
WHERE p.ParentId = @ParentId
ORDER BY u.Reputation DESC 
OPTION (PLAN PER VALUE(QueryVariantID = 1, predicate_range([StackOverflow2010].[dbo].[Posts].[ParentId] = @ParentId, 100.0, 1000000.0)))

SELECT TOP (10) 
    u.DisplayName, 
    p.*
FROM dbo.Posts AS p
JOIN dbo.Users AS u
    ON p.OwnerUserId = u.Id
WHERE p.ParentId = @ParentId
ORDER BY u.Reputation DESC 
OPTION (PLAN PER VALUE(QueryVariantID = 3, predicate_range([StackOverflow2010].[dbo].[Posts].[ParentId] = @ParentId, 100.0, 1000000.0)))

Where Things Break Down

Normally, sp_BlitzCache will go through whatever statements it picks up and associate them with the parent object:

EXEC sp_BlitzCache

@DatabaseName = 'StackOverflow2010';

EXEC sp_BlitzCache @DatabaseName = 'StackOverflow2010';

EXEC sp_BlitzCache
    @DatabaseName = 'StackOverflow2010';

But it doesn’t do that here, it just says that they’re regular ol’ statements:

SQL Server Query Results — do i know you?

The way that it attempts to identify queries belonging to objects is like so:

RAISERROR(N'Attempting to get stored procedure name for individual statements', 0, 1) WITH NOWAIT;

UPDATE p

SET QueryType = QueryType + ' (parent ' +

+ QUOTENAME(OBJECT_SCHEMA_NAME(s.object_id, s.database_id))

+ '.'

+ QUOTENAME(OBJECT_NAME(s.object_id, s.database_id)) + ')'

FROM ##BlitzCacheProcs p

JOIN sys.dm_exec_procedure_stats s ON p.SqlHandle = s.sql_handle

WHERE QueryType = 'Statement'

AND SPID = @@SPID

OPTION (RECOMPILE);

RAISERROR(N'Attempting to get stored procedure name for individual statements', 0, 1) WITH NOWAIT; UPDATE p SET QueryType = QueryType + ' (parent ' + + QUOTENAME(OBJECT_SCHEMA_NAME(s.object_id, s.database_id)) + '.' + QUOTENAME(OBJECT_NAME(s.object_id, s.database_id)) + ')' FROM ##BlitzCacheProcs p JOIN sys.dm_exec_procedure_stats s ON p.SqlHandle = s.sql_handle WHERE QueryType = 'Statement' AND SPID = @@SPID OPTION (RECOMPILE);

RAISERROR(N'Attempting to get stored procedure name for individual statements', 0, 1) WITH NOWAIT;
UPDATE  p
SET     QueryType = QueryType + ' (parent ' +
                    + QUOTENAME(OBJECT_SCHEMA_NAME(s.object_id, s.database_id))
                    + '.'
                    + QUOTENAME(OBJECT_NAME(s.object_id, s.database_id)) + ')'
FROM    ##BlitzCacheProcs p
        JOIN sys.dm_exec_procedure_stats s ON p.SqlHandle = s.sql_handle
WHERE   QueryType = 'Statement'
AND SPID = @@SPID
OPTION (RECOMPILE);

Since SQL handles no longer match, we’re screwed. I also looked into doing something like this, but there’s nothing here!

SELECT

p.plan_handle,

pa.attribute,

object_name =

OBJECT_NAME(CONVERT(int, pa.value)),

pa.value

FROM

(

SELECT 0x05000600B7F6C349E0824C498D02000001000000000000000000000000000000000000000000000000000000 --Proc plan handle

UNION ALL

SELECT 0x060006005859A71BB0304D498D02000001000000000000000000000000000000000000000000000000000000 --Query plan handle

UNION ALL

SELECT 0x06000600DCB1FC11A0224D498D02000001000000000000000000000000000000000000000000000000000000 --Query plan handle

) AS p (plan_handle)

CROSS APPLY sys.dm_exec_plan_attributes (p.plan_handle) AS pa

WHERE pa.attribute = 'objectid';

SELECT p.plan_handle, pa.attribute, object_name = OBJECT_NAME(CONVERT(int, pa.value)), pa.value FROM ( SELECT 0x05000600B7F6C349E0824C498D02000001000000000000000000000000000000000000000000000000000000 --Proc plan handle UNION ALL SELECT 0x060006005859A71BB0304D498D02000001000000000000000000000000000000000000000000000000000000 --Query plan handle UNION ALL SELECT 0x06000600DCB1FC11A0224D498D02000001000000000000000000000000000000000000000000000000000000 --Query plan handle ) AS p (plan_handle) CROSS APPLY sys.dm_exec_plan_attributes (p.plan_handle) AS pa WHERE pa.attribute = 'objectid';

SELECT 
    p.plan_handle, 
    pa.attribute, 
    object_name = 
        OBJECT_NAME(CONVERT(int, pa.value)),
    pa.value
FROM
(
    SELECT 0x05000600B7F6C349E0824C498D02000001000000000000000000000000000000000000000000000000000000 --Proc plan handle
    UNION ALL 
    SELECT 0x060006005859A71BB0304D498D02000001000000000000000000000000000000000000000000000000000000 --Query plan handle
    UNION ALL
    SELECT 0x06000600DCB1FC11A0224D498D02000001000000000000000000000000000000000000000000000000000000 --Query plan handle
) AS p (plan_handle)
CROSS APPLY sys.dm_exec_plan_attributes (p.plan_handle) AS pa
WHERE pa.attribute = 'objectid';

The object identifiers are all amok:

Only the stored procedure has the correct one.

The same thing happens in Query Store, too:

EXEC sp_QuickieStore

@debug = 1;

EXEC sp_QuickieStore @debug = 1;

EXEC sp_QuickieStore
    @debug = 1;

SQL Server Query Result — lost in translation

The object identifiers are 0 for these two queries.

One Giant Leap

This isn’t a complaint as much as it is a warning. If you’re a monitoring tool vendor, script writer, or script relier, this is gonna make things harder for you.

Perhaps it’s something that can or will be fixed in a future build, but I have no idea at all what’s going to happen with it.

Maybe we’ll have to figure out a different way to do the association, but stored procedures don’t get query hashes or query plan hashes, only the queries inside it do.

This is gonna be a tough one!

Thanks for reading!

Going Further

Software Vendor Mistakes With SQL Server: Dealing With Bad Parameter Sniffing

Posted on February 11, 2022May 16, 2022 by Erik Darling

Bad Wrap

When people hear the words “parameter sniffing”, there’s almost a universally bad reaction. They lose their minds and start raving about how to “fix” it.

Recompiling everything
Using optimize for unknown hints
Using local variables
Clearing the plan cache

In today’s post, I’m gonna show you two videos from my paid training:

Intro to parameter sniffing
Parameter sniffing recap

Those explain why parameter sniffing is so tough to deal with, and why all the stuff up there in that list isn’t really the greatest idea.

There’s a whole bunch of stuff in between those two videos where I’ll teach you specifics about fixing parameter sniffing problems.

If that’s the content you’re after, hit the link at the very end of the post for 75% off my entire training catalog.

Intro To Parameter Sniffing

Parameter Sniffing Recap

Thanks for reading!

Going Further

How Avoiding Normalization Contributes To Skewed Data In SQL Server Tables

Posted on October 28, 2021June 17, 2022 by Erik Darling

One Who Overflew

In the Stack Overflow database, the biggest (and probably most important) table is Posts.

The Comments table should be truncated hourly. Comments were a mistake.

But the Posts table suffers from a serious design flaw in the public data dump: Questions and Answers are in the same table.

I’ve heard that it’s worse behind the scenes, but I don’t have any additional details on that.

Aspiring Aspirin

This ends up with some weird data distributions. Certain attributes can only ever be “true” for a question or an answer.

For example, only questions can have a non-zero AcceptedAnswerId, or AnswerCount. Some questions might have a ClosedDate, or a FavoriteCount, too. In the same way, only answers can have a ParentId. This ends up with some really weird patterns in the data.

Was it easier at first to design things this way? Probably. But introducing skew like this only makes dealing with parameter sniffing issues worse.

Even though questions and answers are the most common types of Posts, they’re not the only types of Posts. Even if you make people specify a type along with other things they’re searching for, you can end up with some really different query plans.

Designer Drugs

When you’re designing tables, try to keep this sort of stuff in mind. It might not be a big deal for small tables, but once you realize your data is getting big, it might be too late to make the change. It’s not just a matter of changes to the database, but the application, too.

Late stage redesigns often lead to the LET’S JUST REWRITE THE WHOLE APPLICATION FROM THE GROUND UP projects that take years and usually never make it.

Thanks for reading!

Going Further

How OPTIMIZE FOR UNKNOWN Makes Troubleshooting SQL Server Performance Problems Harder

Posted on October 8, 2021May 16, 2022 by Erik Darling

Detained

Despite the many metric tons of blog posts warning people about this stuff, I still see many local variables and optimize for unknown hints. As a solution to parameter sniffing, it’s probably the best choice 1/1000th of the time. I still end up having to fix the other 999/1000 times, though.

In this post, I want to show you how using either optimize for unknown or local variables makes my job — and the job of anyone trying to fix this stuff — harder than it should be.

Passenger

Like most things, we’re going to start with an index:

CREATE INDEX r ON dbo.Users(Reputation);

CREATE INDEX r ON dbo.Users(Reputation); GO

CREATE INDEX r ON dbo.Users(Reputation);
GO

I’m going to have a stored procedure that uses three different ways to pass a value to a where clause:

CREATE OR ALTER PROCEDURE

dbo.u

(

@r int,

@u int

)

BEGIN

/* Regular parameter */

SELECT

c = COUNT_BIG(*)

FROM dbo.Users AS u

WHERE u.Reputation = @r

AND u.UpVotes = @u;

/* Someone who saw someone else do it at their last job */

DECLARE

@LookMom int = @r,

@IDidItAgain int = @u;

SELECT

c = COUNT_BIG(*)

FROM dbo.Users AS u

WHERE u.Reputation = @LookMom

AND u.UpVotes = @IDidItAgain;

/* Someone who read the blog post URL wrong */

SELECT

c = COUNT_BIG(*)

FROM dbo.Users AS u

WHERE u.Reputation = @r

AND u.UpVotes = @u

OPTION(OPTIMIZE FOR UNKNOWN);

END;

CREATE OR ALTER PROCEDURE dbo.u ( @r int, @u int ) AS BEGIN /* Regular parameter */ SELECT c = COUNT_BIG(*) FROM dbo.Users AS u WHERE u.Reputation = @r AND u.UpVotes = @u; /* Someone who saw someone else do it at their last job */ DECLARE @LookMom int = @r, @IDidItAgain int = @u; SELECT c = COUNT_BIG(*) FROM dbo.Users AS u WHERE u.Reputation = @LookMom AND u.UpVotes = @IDidItAgain; /* Someone who read the blog post URL wrong */ SELECT c = COUNT_BIG(*) FROM dbo.Users AS u WHERE u.Reputation = @r AND u.UpVotes = @u OPTION(OPTIMIZE FOR UNKNOWN); END; GO

CREATE OR ALTER PROCEDURE 
    dbo.u 
(
    @r int, 
    @u int
)
AS
BEGIN

    /* Regular parameter */
    SELECT
        c = COUNT_BIG(*)
    FROM dbo.Users AS u
    WHERE u.Reputation = @r
    AND   u.UpVotes = @u;

    /* Someone who saw someone else do it at their last job */
    DECLARE 
        @LookMom int = @r,
        @IDidItAgain int = @u;
    
    SELECT
        c = COUNT_BIG(*)
    FROM dbo.Users AS u
    WHERE u.Reputation = @LookMom
    AND   u.UpVotes = @IDidItAgain;

    /* Someone who read the blog post URL wrong */
    SELECT
        c = COUNT_BIG(*)
    FROM dbo.Users AS u
    WHERE u.Reputation = @r
    AND   u.UpVotes = @u
    OPTION(OPTIMIZE FOR UNKNOWN);

END;
GO

First Way

The best case is we run this for a small number of rows, and no one really notices. Even though we get bad guesses for the second two queries, it’s not a huge deal.

When you run procedures like this, SQL Server doesn’t cache the compile time values the same way it does when you use parameters. Granted, this is because it technically shouldn’t matter, but if you’re looking for a way to execute the procedure again to reproduce the issue, it’s up to you to go figure out what someone did.

Since I’m getting the actual plans here, I get the runtime values for both, but those don’t show up in the plan cache or query store version of plans.

That’s typically a huge blind spot when you’re trying to fix performance issues of any kind, but it’s up to you to capture that stuff.

Just, you know, good luck doing it in a way that doesn’t squash performance.

Second Way

In this example, our index is only on the Reputation column, but our where clause is also on the UpVotes column.

In nearly every situations, it’s better to have your query do all the filtering it can from one index source — there are obviously exceptions — but the point here is that the optimizer doesn’t bother with a missing index request for the second two queries, only for the first one.

That doesn’t matter a ton if you’re looking at the query and plan right in front of you, but if you’re also using the missing index DMVs to get some idea about how useful overall a new index might be, you’re out of luck.

In this case, the optimizer doesn’t think the second two plans are costly enough to warrant anything, but it does for the first plan.

I’m not saying that queries with local variables or optimize for unknown hints always do this, or that parameterized plans will always ask for (good) indexes. There are many issues with costing and SARGability that can prevent them from showing up, including getting a trivial plan.

This is just a good example of how Doing Goofy Things™ can backfire on you.

Thanks for reading!

Going Further

Parameter Sniffing Is Usually A Good Thing In SQL Server

Posted on February 19, 2021May 16, 2022 by Erik Darling

Tick Tock

I talk to a lot of people about performance tuning. It seems like once someone is close enough to a database for long enough, they’ll have some impression of parameter sniffing. Usually a bad one.

You start to hear some funny stuff over and over again:

We should always recompile
We should always use local variables
We should always recompile and use local variables

Often, even if it means writing unsafe dynamic SQL, people will be afraid to parameterize things.

Between Friends

To some degree, I get it. You’re afraid of incurring some new performance problem.

You’ve had the same mediocre performance for years, and you don’t wanna make something worse.

The thing is, you could be making things a lot better most of the time.

Fewer compiles and recompiles, fewer single-use plans, fewer queries with multiple plans
Avoiding the local variable nonsense is, more often than not, going to get you better performance

A Letter To You

I’m going to tell you something that you’re not going to like, here.

Most of the time when I see a parameter sniffing problem, I see a lot of other problems.

Shabbily written queries, obvious missing indexes, and a whole list of other things.

It’s not that you have a parameter sniffing problem, you have a general negligence problem.

After all, the bad kind of parameter sniffing means that you’ve got variations of a query plan that don’t perform well on variations of parameters.

Once you start taking care of the basics, you’ll find a whole lot less of the problems that keep you up at night.

If that’s the kind of thing you need help with, drop me a line.

Thanks for reading!

Going Further

Recompile And Nested Procedures In SQL Server

Posted on January 28, 2021May 16, 2022 by Erik Darling

Rock Sale

While I was answering a question, I had to revisit what happens when using different flavors of recompile hints with stored procedure when they call inner stored procedures. I like when this happens, because there are so many little details I forget.

Anyway, the TL;DR is that if you have nested stored procedures, recompiling only recompiles the outer one. The inner procedures — really, I should say modules, because it includes other objects that compile query plans — but hey. Now you know what I should have said.

If you want to play around with the tests, you’ll need to grab sp_BlitzCache. I’m too lazy to write plan cache queries from scratch.

Testament

The procedures:

CREATE OR ALTER PROCEDURE dbo.inner_sp

BEGIN

SELECT

COUNT_BIG(*) AS records

FROM sys.master_files AS mf;

END;

CREATE OR ALTER PROCEDURE dbo.outer_sp

--WITH RECOMPILE /*toggle this to see different behavior*/

BEGIN

SELECT

COUNT_BIG(*) AS records

FROM sys.databases AS d;

EXEC dbo.inner_sp;

END;

CREATE OR ALTER PROCEDURE dbo.inner_sp AS BEGIN SELECT COUNT_BIG(*) AS records FROM sys.master_files AS mf; END; GO CREATE OR ALTER PROCEDURE dbo.outer_sp --WITH RECOMPILE /*toggle this to see different behavior*/ AS BEGIN SELECT COUNT_BIG(*) AS records FROM sys.databases AS d; EXEC dbo.inner_sp; END; GO

CREATE OR ALTER PROCEDURE dbo.inner_sp
AS
BEGIN

    SELECT
        COUNT_BIG(*) AS records
    FROM sys.master_files AS mf;
END;
GO 

CREATE OR ALTER PROCEDURE dbo.outer_sp
--WITH RECOMPILE /*toggle this to see different behavior*/
AS
BEGIN

    SELECT 
        COUNT_BIG(*) AS records
    FROM sys.databases AS d;
    
    EXEC dbo.inner_sp;

END;
GO

The tests:

--It's helpful to run this before each test to clear out clutter

DBCC FREEPROCCACHE;

--Look at this with and without

--WITH RECOMPILE in the procedure definition

EXEC dbo.outer_sp;

--Take out the proc-level recompile and run this

EXEC dbo.outer_sp WITH RECOMPILE;

--Take out the proc-level recompile and run this

EXEC sp_recompile 'dbo.outer_sp';

EXEC dbo.outer_sp;

--You should run these between each test to verify behavior

--If you just run them here at the end, you'll be disappointed

EXEC sp_BlitzCache

@DatabaseName = 'Crap',

@QueryFilter = 'procedure',

@SkipAnalysis = 1,

@HideSummary = 1;

EXEC sp_BlitzCache

@DatabaseName = 'Crap',

@QueryFilter = 'statement',

@SkipAnalysis = 1,

@HideSummary = 1;

--It's helpful to run this before each test to clear out clutter DBCC FREEPROCCACHE; --Look at this with and without --WITH RECOMPILE in the procedure definition EXEC dbo.outer_sp; --Take out the proc-level recompile and run this EXEC dbo.outer_sp WITH RECOMPILE; --Take out the proc-level recompile and run this EXEC sp_recompile 'dbo.outer_sp'; EXEC dbo.outer_sp; --You should run these between each test to verify behavior --If you just run them here at the end, you'll be disappointed EXEC sp_BlitzCache @DatabaseName = 'Crap', @QueryFilter = 'procedure', @SkipAnalysis = 1, @HideSummary = 1; EXEC sp_BlitzCache @DatabaseName = 'Crap', @QueryFilter = 'statement', @SkipAnalysis = 1, @HideSummary = 1;

--It's helpful to run this before each test to clear out clutter
DBCC FREEPROCCACHE;

--Look at this with and without 
--WITH RECOMPILE in the procedure definition
EXEC dbo.outer_sp;

--Take out the proc-level recompile and run this
EXEC dbo.outer_sp WITH RECOMPILE;

--Take out the proc-level recompile and run this
EXEC sp_recompile 'dbo.outer_sp';
EXEC dbo.outer_sp;

--You should run these between each test to verify behavior
--If you just run them here at the end, you'll be disappointed
EXEC sp_BlitzCache 
    @DatabaseName = 'Crap', 
    @QueryFilter = 'procedure', 
    @SkipAnalysis = 1, 
    @HideSummary = 1;

EXEC sp_BlitzCache 
    @DatabaseName = 'Crap', 
    @QueryFilter = 'statement', 
    @SkipAnalysis = 1, 
    @HideSummary = 1;

Whatchalookinat?

After each of these where a recompile is applied, you should see the inner proc/statement in the BlitzCache results, but not the outer procedure.

It’s important to understand behavior like this, because recompile hints are most often used to help investigate parameter sniffing issues. If it’s taking place in nested stored procedure calls, you may find yourself with a bunch of extra work to do or needing to re-focus your use of recompile hints.

Of course, this is why I much prefer option recompile hints on problem statements. You get much more reliable behavior.

And, as Paul writes:

For instances running at least SQL Server 2008 build 2746 (Service Pack 1 with Cumulative Update 5), using OPTION (RECOMPILE) has another significant advantage over WITH RECOMPILE: Only OPTION (RECOMPILE) enables the Parameter Embedding Optimization.

Thanks for reading!

Going Further

Defeating Parameter Sniffing With Dynamic SQL In SQL Server

Posted on December 24, 2020May 15, 2022 by Erik Darling

Enjoy!

Thanks for watching!

Going Further

A Suggestion To Make Key Lookups Less Of A Performance Problem In SQL Server

Posted on December 18, 2020May 15, 2022 by Erik Darling

Odor Of Gas

One problem with Lookups, aside from the usual complaints, is that the optimizer has no options for when the lookup happens.

If the optimizer decides to use a nonclustered index to satisfy some part of the query, but the nonclustered index doesn’t have all of the columns needed to cover what the query is asking for, it has to do a lookup.

Whether the lookup is Key or RID depends on if the table has a clustered index, but that’s not entirely the point.

The point is that there’s no way for the optimizer to decide to defer the lookup until later in the plan, when it might be more opportune.

Gastric Acid

Let’s take one index, and two queries.

CREATE INDEX p

ON dbo.Posts(PostTypeId, Score, CreationDate)

INCLUDE(OwnerUserId);

CREATE INDEX p ON dbo.Posts(PostTypeId, Score, CreationDate) INCLUDE(OwnerUserId);

CREATE INDEX p
    ON dbo.Posts(PostTypeId, Score, CreationDate)
    INCLUDE(OwnerUserId);

Stop being gross.

SELECT TOP (1000)

u.DisplayName,

p.*

FROM dbo.Posts AS p

JOIN dbo.Users AS u

ON p.OwnerUserId = u.Id

WHERE p.PostTypeId = 1

AND p.Score > 5

ORDER BY p.CreationDate DESC;

SELECT TOP (1000)

u.DisplayName,

p.*

FROM dbo.Posts AS p

JOIN dbo.Users AS u

ON p.OwnerUserId = u.Id

WHERE p.PostTypeId = 1

AND p.Score > 6

ORDER BY p.CreationDate DESC;

SELECT TOP (1000) u.DisplayName, p.* FROM dbo.Posts AS p JOIN dbo.Users AS u ON p.OwnerUserId = u.Id WHERE p.PostTypeId = 1 AND p.Score > 5 ORDER BY p.CreationDate DESC; SELECT TOP (1000) u.DisplayName, p.* FROM dbo.Posts AS p JOIN dbo.Users AS u ON p.OwnerUserId = u.Id WHERE p.PostTypeId = 1 AND p.Score > 6 ORDER BY p.CreationDate DESC;

SELECT TOP (1000)
    u.DisplayName,
    p.*
FROM dbo.Posts AS p
JOIN dbo.Users AS u
    ON p.OwnerUserId = u.Id
WHERE p.PostTypeId = 1
AND   p.Score > 5
ORDER BY p.CreationDate DESC;

SELECT TOP (1000)
    u.DisplayName,
    p.*
FROM dbo.Posts AS p
JOIN dbo.Users AS u
    ON p.OwnerUserId = u.Id
WHERE p.PostTypeId = 1
AND   p.Score > 6
ORDER BY p.CreationDate DESC;

The main point here is not that the lookup is bad; it’s actually good, and I wish both queries would use one.

If we hint the first query to use the nonclustered index, things turn out better.

SELECT TOP (1000)

u.DisplayName,

p.*

FROM dbo.Posts AS p WITH(INDEX = p)

JOIN dbo.Users AS u

ON p.OwnerUserId = u.Id

WHERE p.PostTypeId = 1

AND p.Score > 5

ORDER BY p.CreationDate DESC;

SELECT TOP (1000) u.DisplayName, p.* FROM dbo.Posts AS p WITH(INDEX = p) JOIN dbo.Users AS u ON p.OwnerUserId = u.Id WHERE p.PostTypeId = 1 AND p.Score > 5 ORDER BY p.CreationDate DESC;

SELECT TOP (1000)
    u.DisplayName,
    p.*
FROM dbo.Posts AS p WITH(INDEX = p)
JOIN dbo.Users AS u
    ON p.OwnerUserId = u.Id
WHERE p.PostTypeId = 1
AND   p.Score > 5
ORDER BY p.CreationDate DESC;

Running a full second faster seems like a good thing to me, but there’s a problem.

Ingest

Whether we use the lookup or scan the clustered index, all of these queries ask for rather large memory grants, between 5.5 and 6.5 GB

SQL Server Query Plan Tool Tips — bigsort4u

The operator asking for memory is the Sort — and while I’d love it if we could index for every sort — it’s just not practical.

So like obviously changing optimizer behavior is way more practical. Ahem.

The reason that the Sort asks for so much memory in each of these cases is that it’s forced to order the entire select output from the Posts table by the CreationDate column.

Detach

If we rewrite the query a bit, we can get the optimizer to sort data long before we go get all the output columns:

SELECT TOP (1000)

u.DisplayName,

p2.*

FROM dbo.Posts AS p

JOIN dbo.Posts AS p2

ON p.Id = p2.Id

JOIN dbo.Users AS u

ON p.OwnerUserId = u.Id

WHERE p.PostTypeId = 1

AND p.Score > 5

ORDER BY p.CreationDate DESC;

SELECT TOP (1000)

u.DisplayName,

p2.*

FROM dbo.Posts AS p

JOIN dbo.Posts AS p2

ON p.Id = p2.Id

JOIN dbo.Users AS u

ON p.OwnerUserId = u.Id

WHERE p.PostTypeId = 1

AND p.Score > 6

ORDER BY p.CreationDate DESC;

SELECT TOP (1000) u.DisplayName, p2.* FROM dbo.Posts AS p JOIN dbo.Posts AS p2 ON p.Id = p2.Id JOIN dbo.Users AS u ON p.OwnerUserId = u.Id WHERE p.PostTypeId = 1 AND p.Score > 5 ORDER BY p.CreationDate DESC; SELECT TOP (1000) u.DisplayName, p2.* FROM dbo.Posts AS p JOIN dbo.Posts AS p2 ON p.Id = p2.Id JOIN dbo.Users AS u ON p.OwnerUserId = u.Id WHERE p.PostTypeId = 1 AND p.Score > 6 ORDER BY p.CreationDate DESC;

SELECT TOP (1000)
    u.DisplayName,
    p2.*
FROM dbo.Posts AS p
JOIN dbo.Posts AS p2
    ON p.Id = p2.Id
JOIN dbo.Users AS u
    ON p.OwnerUserId = u.Id
WHERE p.PostTypeId = 1
AND   p.Score > 5
ORDER BY p.CreationDate DESC;

SELECT TOP (1000)
    u.DisplayName,
    p2.*
FROM dbo.Posts AS p
JOIN dbo.Posts AS p2
    ON p.Id = p2.Id
JOIN dbo.Users AS u
    ON p.OwnerUserId = u.Id
WHERE p.PostTypeId = 1
AND   p.Score > 6
ORDER BY p.CreationDate DESC;

In both cases, we get the same query plan shape, which is what we’re after:

Seek into the nonclustered index on Posts
Sort data by CreationDate
Join Posts to Users first
Join back to Posts for the select list columns

Because the Sort happens far earlier on in the plan, there’s less of a memory grant needed, and by quite a stretch from the 5+ GB before.

Thanks for reading!

Going Further

A Parameterization Puzzle With TOP Follow-Up

Posted on December 16, 2020May 15, 2022 by Erik Darling

Spell It Out

Back in October, I had written a couple posts about how parameterizing TOP can cause performance issues:

Anyway, I got back to thinking about it recently because a couple things had jogged in my foggy brain around table valued functions and parameter sniffing.

Go figure.

Reading Rainbow

One technique you could use to avoid this would be to use an inline table valued function, like so:

CREATE OR ALTER FUNCTION dbo.TopParam(@Top bigint)

RETURNS TABLE

WITH SCHEMABINDING

RETURN

SELECT TOP (@Top)

u.DisplayName,

b.Name

FROM dbo.Users AS u

CROSS APPLY

(

SELECT TOP (1)

b.Name

FROM dbo.Badges AS b

WHERE b.UserId = u.Id

ORDER BY b.Date DESC

) AS b

WHERE u.Reputation > 10000

ORDER BY u.Reputation DESC;

CREATE OR ALTER FUNCTION dbo.TopParam(@Top bigint) RETURNS TABLE WITH SCHEMABINDING AS RETURN SELECT TOP (@Top) u.DisplayName, b.Name FROM dbo.Users AS u CROSS APPLY ( SELECT TOP (1) b.Name FROM dbo.Badges AS b WHERE b.UserId = u.Id ORDER BY b.Date DESC ) AS b WHERE u.Reputation > 10000 ORDER BY u.Reputation DESC; GO

CREATE OR ALTER FUNCTION dbo.TopParam(@Top bigint)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
SELECT TOP (@Top)
    u.DisplayName,
    b.Name
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT TOP (1)
        b.Name
    FROM dbo.Badges AS b
    WHERE b.UserId = u.Id
    ORDER BY b.Date DESC
) AS b
WHERE u.Reputation > 10000
ORDER BY u.Reputation DESC;
GO

When we select from the function, the top parameter is interpreted as a literal.

SELECT

tp.*

FROM dbo.TopParam(1) AS tp;

SELECT

tp.*

FROM dbo.TopParam(38) AS tp;

SELECT tp.* FROM dbo.TopParam(1) AS tp; SELECT tp.* FROM dbo.TopParam(38) AS tp;

SELECT 
    tp.*
FROM dbo.TopParam(1) AS tp;

SELECT 
    tp.*
FROM dbo.TopParam(38) AS tp;

Performance is “fine” for both in that neither one takes over a minute to run. Good good.

Departures

This is, of course, not what happens in a stored procedure or parameterized dynamic SQL.

EXEC dbo.ParameterTop @Top = 1;

EXEC dbo.ParameterTop @Top = 1;

Keen observers will note that this query runs for 1.2 seconds, just like the plan for the function above.

That is, of course, because this is the stored procedure’s first execution. The @Top parameter has been sniffed, and things have been optimized for the sniffed value.

If we turn around and execute it for 38 rows right after, we’ll get the “fine” performance noted above.

EXEC dbo.ParameterTop @Top = 38;

EXEC dbo.ParameterTop @Top = 38;

Looking at the plan in a slightly different way, here’s what the Top operator is telling us, along with what the compile and runtime values in the plan are:

It may make sense to make an effort to cache a plan with @Top = 1 initially to get the “fine” performance. That estimate is good enough to get us back to sending the buffers quickly.

Buggers

Unfortunately, putting the inline table valued function inside the stored procedure doesn’t offer us any benefit.

Without belaboring the point too much:

CREATE PROCEDURE dbo.ParameterTopItvf(@Top BIGINT)

BEGIN

SET NOCOUNT, XACT_ABORT ON;

SELECT

tp.*

FROM dbo.TopParam(@Top) AS tp;

END;

EXEC dbo.ParameterTopItvf @Top = 1;

EXEC dbo.ParameterTopItvf @Top = 38;

EXEC sp_recompile 'dbo.ParameterTopItvf';

EXEC dbo.ParameterTopItvf @Top = 38;

EXEC dbo.ParameterTopItvf @Top = 1;

CREATE PROCEDURE dbo.ParameterTopItvf(@Top BIGINT) AS BEGIN SET NOCOUNT, XACT_ABORT ON; SELECT tp.* FROM dbo.TopParam(@Top) AS tp; END; GO EXEC dbo.ParameterTopItvf @Top = 1; EXEC dbo.ParameterTopItvf @Top = 38; EXEC sp_recompile 'dbo.ParameterTopItvf'; EXEC dbo.ParameterTopItvf @Top = 38; EXEC dbo.ParameterTopItvf @Top = 1;

CREATE PROCEDURE dbo.ParameterTopItvf(@Top BIGINT)  
AS  
BEGIN  
    SET NOCOUNT, XACT_ABORT ON;  
  
    SELECT   
        tp.*  
    FROM dbo.TopParam(@Top) AS tp;  
  
END;  
GO 

EXEC dbo.ParameterTopItvf @Top = 1;

EXEC dbo.ParameterTopItvf @Top = 38;

EXEC sp_recompile 'dbo.ParameterTopItvf';

EXEC dbo.ParameterTopItvf @Top = 38;

EXEC dbo.ParameterTopItvf @Top = 1;

If we do this, running for 1 first gives us “fine” performance, but running for 38 first gives us the much worse performance.

Thanks for reading!