Yet Another Post About Local Variables In SQL Server

If I Ruled The World


Normally, I link people to this post by Kendra and this post by Paul when I need to point them to information about what goes wrong with local variables. They’re both quite good, but I wanted something a little more specific to the situations I normally see in client code, along with some fixes.

First, some background:

In a stored procedure (and even in ad hoc queries or within dynamic SQL, like in the examples linked above), if you declare a variable within that code block and use it as a predicate later, you will get either a fixed guess for cardinality, or a less-confidence-inspiring estimate than when the histogram is used.

The local variable effect discussed in the rest of this post produces the same behavior as the OPTIMIZE FOR UNKNOWN hint, or executing queries with sp_prepare. I have that emphasized here because I don’t want to keep qualifying it throughout the post.

For a single equality predicate, that estimate is the number of rows in the table multiplied by the column’s “All Density”. The process for multiple predicates depends on which cardinality estimation model you’re using.

That can be viewed by using DBCC SHOW_STATISTICS.

CREATE INDEX flubber
    ON dbo.Posts(OwnerUserId);

DBCC SHOW_STATISTICS(Posts, flubber);
[Image: DBCC SHOW_STATISTICS output ("Injury")]

For example, this query using a single local variable with a single equality:

DECLARE @oui INT = 22656;
SELECT COUNT(*) FROM dbo.Posts AS p WHERE p.OwnerUserId = @oui;

Will get an estimate of 11.9-ish, despite 27,901 rows matching over here in reality.

[Image: query plan tool tip showing the estimate ("Poo")]

Which can be replicated like so, using the numbers from the screenshot up yonder.

SELECT (6.968291E-07 * 17142169) AS [?]; --All Density * table rows = 11.9-ish

Several Different Levels

You can replicate the “All Density” calculation by doing this:

SELECT (1 / 
         CONVERT(FLOAT, COUNT(DISTINCT p.OwnerUserId))
       ) AS [All Density]
FROM dbo.Posts AS p;
GO
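
Putting those two pieces together, you can reproduce the whole estimate in one query. This is just a sketch of the math, and it assumes the statistics perfectly reflect the table, which they do here because the index was freshly created:

SELECT COUNT_BIG(*) * 
       (1 / CONVERT(FLOAT, COUNT(DISTINCT p.OwnerUserId))) AS [Estimated Rows]
FROM dbo.Posts AS p;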

Notice I didn’t call the estimate “bad”. Even though it often is quite bad, there are some columns where the distribution of values will be close enough to this estimate for it not to matter terribly for plan shape, index choice, and overall performance.

Don’t take this as carte blanche to use this technique; quite the opposite. If you’re going to use it, it needs careful testing across a variety of inputs.

Why? Because confidence in estimates decreases as they become based on less precise information.

In these estimates we can see a couple optimizer rules in action:

  • Inclusion: We assume the value is there — the alternative is ghastly
  • Uniformity: The data will have an even distribution of unique values

For ranges (>, >=, <, <=), LIKE, BETWEEN, and <>, there are different fixed guesses.

[Image: query plans showing the fixed guesses for each predicate type ("Destined for Lateness")]

These numbers may change in the future, but up through 2019 this is what my testing resulted in.

Heck, maybe this behavior will be alterable in the future :^)
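
If you want to see those guesses on your own build, here’s a minimal sketch, using the same table and variable as before, with one query per predicate type. Check the estimated number of rows on each plan:

DECLARE @oui INT = 22656;
DECLARE @dn NVARCHAR(40) = N'%Skeet%';

SELECT COUNT(*) FROM dbo.Posts AS p WHERE p.OwnerUserId > @oui; --range
SELECT COUNT(*) FROM dbo.Posts AS p WHERE p.OwnerUserId <> @oui; --inequality
SELECT COUNT(*) FROM dbo.Posts AS p WHERE p.OwnerUserId BETWEEN @oui AND @oui + 1; --BETWEEN
SELECT COUNT(*) FROM dbo.Users AS u WHERE u.DisplayName LIKE @dn; --LIKE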

No Vector, No Estimate

A lot of people (myself included) will freely interchange “estimate” and “guess” when talking about this process. To the optimizer, there’s a big difference.

  • An estimate represents a process where math formulas with strange fonts that I don’t understand are used to calculate cardinality.
  • A guess represents a breakdown in that process, where the optimizer gives up, and a fixed number is used.

Say there’s no “density vector” available for the column used in an equality predicate. Maybe you have auto-create stats turned off, or asynchronous stats creation is turned on, so the stats aren’t there yet for the first compilation.

You get a guess, not an estimate.

ALTER DATABASE StackOverflow2013 SET AUTO_CREATE_STATISTICS OFF;
GO 

DECLARE @oui INT = 22656;
SELECT COUNT(*) FROM dbo.Posts AS p WHERE p.OwnerUserId = @oui;
SELECT COUNT(*) FROM dbo.Posts AS p WHERE p.OwnerUserId = @oui OPTION(USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION'));
GO 

ALTER DATABASE StackOverflow2013 SET AUTO_CREATE_STATISTICS ON;
GO

Using the new cardinality estimator (CE), which Microsoft has quite presumptuously started calling the Default CE, I get a guess of 4,140.

Using the legacy CE, which maybe I’ll start referring to as the Best CE, to match the presumptuousness of Microsoft, I get a guess of 266,409.

Though neither one is particularly close to the reality of 27,901 rows, we can’t expect a good guess because we’re effectively poking the optimizer in the eyeball by not allowing it to create statistics, and by using a local variable in our where clause.

These things would be our fault, regardless of the default-ness, or best-ness, of the estimation model.

If you’re keen on calculating these things yourself, you can do the following:

SELECT POWER(CONVERT(FLOAT, 17142169), 0.75) AS BEST_CE; --rows ^ 0.75 = 266,409-ish
SELECT SQRT(CONVERT(FLOAT, 17142169)) AS default_ce_blah_whatever; --square root of rows = 4,140-ish

Other Guesses?

There may be additional circumstances where you’ll get a 10% cardinality estimate for equality predicates too; this post by the lovely and talented Fabiano Amorim covers them.

Take these two queries:

SELECT COUNT_BIG(*) 
FROM dbo.Posts AS p 
WHERE p.CreationDate = p.CommunityOwnedDate;

SELECT COUNT_BIG(*) 
FROM dbo.Posts AS p 
WHERE p.CreationDate = p.CommunityOwnedDate 
OPTION(USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION'));

The so-called “default” CE thinks 1,714,220 rows will match for a column-equals-column comparison, and the “legacy” CE thinks 6.44248 rows will match, assuming that histograms are available for both of these queries.

How many actually match? 59,216.

I never said this was easy, HOWEVER!

Ahem.

The “legacy” CE estimate comes from advanced maths that only people who vape understand, while the so-called “default” CE just guesses ten percent, in true lazybones fashion. “You treat your stepmother with respect, Pantera!”, as a wise man once said.
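
The ten percent guess, at least, is easy to replicate on your own. Take a tenth of the table’s row count:

SELECT CONVERT(FLOAT, 17142169) * 0.10 AS [ten_percent_guess];

That lands within rounding distance of the 1,714,220 row estimate above.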

Second, what we want to happen:

Code that uses literals, parameters, and other sniff-able forms of predicates uses the statistics histogram, which typically has far more valuable information about data distribution for a column. No, histograms aren’t always perfect, and sure, estimates can still be off, but that’s a chance I’m willing to take.

Even if they’re out of date. Maybe. Maybe not.

Look, just update those statistics.

[Image: statistics histogram ("American Histogram X")]

Like I mentioned before, these estimates typically have higher confidence levels because they’re often based on more precise details about the data.

If I had to rank them:

  1. Direct histogram step hits for an equality
  2. Intra-step hits for an equality
  3. Direct histogram step hits for a range
  4. Intra-step hits for a range
  5. Inequalities (not equals to)
  6. Joins
  7. 1000 other things
  8. All the goofy stuff you people do to make this more difficult, like wrapping columns in functions, mismatching data types, using local variables, etc.

Of course, parameterized code does open us up to parameter sniffing issues, which I’m not addressing in this post. My only goal here is to teach people how to get out of performance jams caused by local variables giving you bad-enough estimates. Ha ha ha.

Plus, there’s a lot of negativity out there already about parameter sniffing. A lot of the time it does pretty well, and we want it to happen.

Over-Under


The main issues with the local variable/density vector estimates are that they most often don’t align well with reality, and that using local variables this way is almost certainly a knee-jerk reaction to a parameter sniffing problem, or done out of ignorance of the repercussions. It would be tedious to walk through all of the potential plan quality issues that could arise from doing this, though I did record a video about one of them here.

Instead of doing all that stuff, I’d rather walk through what works and what doesn’t when it comes to fixing the problem.

But first, what doesn’t work!

Temporary Objects Don’t Usually Work

If you put the value of the local variable in a #temp table, you can fall victim to statistics caching. If you use a @table variable, you don’t get any column-level statistics on what values go in there (even with a recompile hint or trace flag 2453, you only get table cardinality).

There may be some circumstances where a #temp table can help, or can get you a better plan, but they’re probably not my first stop on the list of fixes.

  • The #temp table will require a uniqueness constraint to work
  • This becomes more and more difficult if we have multiple local variables to account for
  • And if they have different data types, we need multiple #temp tables, or wide tables with a column and constraint per parameter

From there, we end up with difficulties linking those values in our query. Extra joins, subqueries, etc. all have potential consequences.
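Here’s a rough sketch of what that pattern looks like with just one variable (the #fix table is made up for illustration). Imagine scaling this out to several variables with mixed data types:

DECLARE @oui INT = 22656;

CREATE TABLE #fix (oui INT NOT NULL PRIMARY KEY);
INSERT #fix (oui) VALUES (@oui);

SELECT COUNT(*)
FROM dbo.Posts AS p
JOIN #fix AS f
    ON f.oui = p.OwnerUserId;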

Inline Table Valued Functions Don’t Work

They’re a little too inline here, and they use the density vector estimate. See this gist for a demo.

Recompile Can Work, But Only Do It For Problem Statements

It has to be a statement-level recompile, using OPTION(RECOMPILE). Using WITH RECOMPILE as a stored procedure creation option will not allow for parameter embedding optimizations.
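
In other words, something like this, hanging off the end of the problem statement only. A sketch, using the same query from earlier:

DECLARE @oui INT = 22656;
SELECT COUNT(*) 
FROM dbo.Posts AS p 
WHERE p.OwnerUserId = @oui
OPTION(RECOMPILE);

With that in place, the optimizer can embed the runtime value of the variable as if it were a literal, and go use the histogram.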

[Image: two query plan tool tips, side by side ("One of these things is not like the other.")]

The tool tip on the left is from a plan with a statement-level recompile. On the right is from a plan with a procedure-level recompile. In the statement-level recompile plan, we can see the scalar operator is a literal value. In the procedure-level recompile, we still see @ParentId passed in.

The difference is subtle, but it’s there. I prefer statement-level recompiles, because it’s unlikely that every statement in a procedure should or needs to be recompiled, unless it’s a monitoring proc or something else with no value to the plan cache.

Targeting specific statements is smarterer.

Erer.

A more detailed examination of this behavior is in Paul’s post, linked above.

Dynamic SQL Can Work

Depending on complexity, it may be more straightforward to use dynamic SQL as a receptacle for your variables-turned-parameters.

CREATE PROCEDURE dbo.game_time(@id INT) 
AS BEGIN 
    DECLARE @id_fix INT; 
    SET @id_fix = CASE WHEN @id < 0 THEN 1 ELSE @id END; 
    
    DECLARE @sql NVARCHAR(MAX) = N''; 
    SET @sql += N'SELECT COUNT(*) FROM dbo.Posts AS p WHERE p.OwnerUserId = @id;'; 
    
    EXEC sys.sp_executesql @sql, N'@id INT', @id_fix; 
END;

Separate Stored Procedures Can Work

If you need to declare variables internally and run some queries to assign values to them, passing the results on to separate stored procedures can avoid the density vector estimates. The inner stored procedure executes in a separate context, so all it sees are the values passed in as parameters, not their origins as variables.

In other words, parameters can be sniffed; variables can’t.

CREATE PROCEDURE dbo.game_time(@id INT)
AS
BEGIN
    DECLARE @id_fix INT;
    SET @id_fix = CASE WHEN @id < 0 THEN 1 ELSE @id END;
    
    EXEC dbo.some_new_proc @id_fix;
END;

Just pretend the dynamic SQL from above occupies the stored procedure dbo.some_new_proc here.

Optimizing For A Value Can Work

But choosing that value is hard. If one is feeling ambitious, one could take the local parameter value, compare it to the histogram on one’s own, then choose a value on one’s own that, one, on their own, could use to determine if a specific, common, or nearby value would be best to optimize for, using dynamic SQL that one has written on one’s own.

Ahem.

CREATE PROCEDURE dbo.game_time(@id INT) 
AS BEGIN 
    DECLARE @id_fix INT; 
    SET @id_fix = CASE WHEN @id < 0 THEN 1 ELSE @id END; 
    
    DECLARE @a_really_good_choice INT;
    SET @a_really_good_choice = 2147483647; --The result of some v. professional code IRL.

    DECLARE @sql NVARCHAR(MAX) = N''; 
    SET @sql += N'SELECT COUNT(*) FROM dbo.Posts AS p WHERE p.OwnerUserId = @id OPTION(OPTIMIZE FOR(@id = [a_really_good_choice]));'; 
    SET @sql = REPLACE(@sql, N'[a_really_good_choice]', CONVERT(NVARCHAR(11), @a_really_good_choice)); --REPLACE needs a string, not an INT

    EXEC sys.sp_executesql @sql, N'@id INT', @id_fix;
END; 
GO


Wrapping Up


This post aimed to give you some ways to avoid getting bad density vector estimates with local variables. If you’re getting good guesses, well, sorry you had to read all this.

When I see this pattern in client code, it’s often accompanied by comments about fixing parameter sniffing. While technically accurate, it’s more like plugging SQL Server’s nose with cotton balls and Lego heads.

Sometimes there will be several predicate filters that diminish the impact of estimates not using the histogram. Often a fairly selective predicate evaluated first is enough to make this not suck too badly. However, it’s worth learning about, and learning how to fix correctly.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.


When Should You Index Temp Tables In SQL Server?

What I Mean Is


You already know that your temp table needs an index. Let’s say there’s some query plan ouchie from not adding one. You’ve already realized that you should probably use a clustered index rather than a nonclustered index, because adding only a nonclustered index leaves you with a heap and an index, and there are a lot of times when nonclustered indexes won’t get used because they don’t cover the query’s columns well enough.

Good. We’ve fixed you.

But, like, when should you create the index?

Options


You can do one of these things:

  • Inline, when you create the table
  • After you create the table
  • After you load data into the table

This requires a bit of testing to get right.

Inline


In many cases, this is the best option, for reasons outlined by Pam Lahoud:

  • Do not explicitly drop temp tables at the end of a stored procedure; they will get cleaned up when the session that created them ends.
  • Do not alter temp tables after they have been created.
  • Do not truncate temp tables.
  • Move index creation statements on temp tables to the new inline index creation syntax that was introduced in SQL Server 2014.

Where it can be a bad option is:

  • If you can’t get a parallel insert even with a TABLOCK hint
  • Sorting the data to match index order on insert could result in some discomfort

After Creation


This is almost always not ideal, unless you want to avoid caching the temp table, and want a recompilation to occur for whatever reason.

It’s not that I’d ever rule this out as an option, but I’d wanna have a good reason for it.

Probably even several.

After Insert


This can sometimes be a good option if the query plan you get from inserting into the index is deficient in some way.

Like I mentioned up above, maybe you lose parallel insert, or maybe the DML Request Sort is a thorn in your side.

This can be awesome! Except on Standard Edition, where you can’t create indexes in parallel. Which picks off one of the reasons for doing this in the first place, and also potentially causes you headaches with not caching temp tables, and statement level recompiles.

One upside here: if you insert data into a temp table that already has an index, and then run a query that causes statistics generation, you’ll almost certainly get the default sampling rate, which could potentially cause other annoyances. Creating the index after loading the data means you get full scan stats instead.

Hooray, I guess.

This may not ever be the end of the world, but here’s a quick example:

DROP TABLE IF EXISTS #t;
GO 

--Create a table with an index already on it
CREATE TABLE #t(id INT, INDEX c CLUSTERED(id));

--Load data
INSERT #t WITH(TABLOCK)
SELECT p.OwnerUserId
FROM dbo.Posts AS p;

--Run a query to generate statistics
SELECT COUNT(*)
FROM #t AS t
WHERE t.id BETWEEN 1 AND 10000
GO 

--See what's poppin'
SELECT hist.step_number, hist.range_high_key, hist.range_rows, 
    hist.equal_rows, hist.distinct_range_rows, hist.average_range_rows
FROM tempdb.sys.stats AS s
CROSS APPLY tempdb.sys.dm_db_stats_histogram(s.[object_id], s.stats_id) AS hist
WHERE OBJECT_NAME(s.object_id, 2) LIKE '#t%'
GO 
DROP TABLE #t;


--Create a table with no index
CREATE TABLE #t(id INT NOT NULL);

--Load data
INSERT #t WITH(TABLOCK)
SELECT p.OwnerUserId
FROM dbo.Posts AS p;

--Create the index
CREATE CLUSTERED INDEX c ON #t(id);

--Run a query to generate statistics
SELECT COUNT(*)
FROM #t AS t
WHERE t.id BETWEEN 1 AND 10000

--See what's poppin'
SELECT hist.step_number, hist.range_high_key, hist.range_rows, 
    hist.equal_rows, hist.distinct_range_rows, hist.average_range_rows
FROM tempdb.sys.stats AS s
CROSS APPLY tempdb.sys.dm_db_stats_histogram(s.[object_id], s.stats_id) AS hist
WHERE OBJECT_NAME(s.object_id, 2) LIKE '#t%'
GO 
DROP TABLE #t;
[Image: the two histograms, side by side ("Neckin’ Neck")]

On the left are the first 20 steps from the first histogram, and on the right are the first 20 from the second one.

You can see some big differences — whether or not they end up helping or hurting performance would take a lot of different tests. Quite frankly, it’s probably not where I’d start a performance investigation, but I’d be lying if I told you it never ended up there.

All Things Considerateded


In general, I’d stick to using the inline index creation syntax. If I had to work around issues with that, I’d create the index after loading data, but being on Standard Edition brings some additional considerations around parallel index creation.

Thanks for reading!


Temporal Tables and Table Partitioning In SQL Server: Not If You Like Switching Partitions

These Are A Few Of Not My Favorite Things


This is the part of SQL Server I often find myself rolling my eyes at: poor feature interoperability, and that V1 smell that… never seems to turn into that V2 smell.

The full script is hosted here. I don’t want a tedious post full of setting up partitioning, etc.

I wanna get to the stuff that you might care about later.

If You Wanna…


The utility of Partitioning is being able to quickly switch partitions in and out. Data management. Not query performance (unless you’re using columnstore).

If you want to do that with temporal tables, your staging and “out” tables need to match exactly, down to the temporal-ness.

For example, this won’t work:

CREATE TABLE dbo.Votes_Deletes_Stage
(
    Id INT NOT NULL,
    PostId INT NOT NULL,
    UserId INT NULL,
    BountyAmount INT NULL,
    VoteTypeId INT NOT NULL,
    CreationDate DATETIME NOT NULL,
    SysStartTime DATETIME2(7) NOT NULL, --Versioning column
    SysEndTime DATETIME2(7) NOT NULL, --Versioning column
    CONSTRAINT dr_rockzo_Stage PRIMARY KEY CLUSTERED (CreationDate, Id) ON [PRIMARY]
) ON [PRIMARY];

You gotta have all the same stuff you used to get your partitioned table set up for temporal-ness.

CREATE TABLE dbo.Votes_Deletes_Stage
(
    Id INT NOT NULL,
    PostId INT NOT NULL,
    UserId INT NULL,
    BountyAmount INT NULL,
    VoteTypeId INT NOT NULL,
    CreationDate DATETIME NOT NULL,
    SysStartTime DATETIME2(7) GENERATED ALWAYS AS ROW START HIDDEN
        CONSTRAINT df_VotesDeletes_Stage_SysStartTime
            DEFAULT SYSDATETIME(),
    SysEndTime DATETIME2(7) GENERATED ALWAYS AS ROW END HIDDEN
        CONSTRAINT df_VotesDeletes_Stage_SysEndTime
            DEFAULT CONVERT(DATETIME2(7), '9999-12-31 23:59:59.9999999'),
    CONSTRAINT dr_rockzo_Stage PRIMARY KEY CLUSTERED (CreationDate, Id) ON [PRIMARY],
    PERIOD FOR SYSTEM_TIME([SysStartTime], [SysEndTime])
) ON [PRIMARY];

Then If You Wanna…


Switch data in or out, you have to turn off the temporal-ness.

Msg 13546, Level 16, State 1, Line 97
Switching out partition failed on table ‘DeletesDemo.dbo.Votes_Deletes’ because it is not a supported operation on system-versioned tables. Consider setting SYSTEM_VERSIONING to OFF and trying again.

“Consider turning off the feature that takes forever to turn back on with large tables so you can do the thing partitioning does quickly”

Don’t worry, the color red you’re seeing is totally natural.
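
For reference, turning it off is the easy part, at least syntactically:

ALTER TABLE dbo.Votes_Deletes SET (SYSTEM_VERSIONING = OFF);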

And hey, once you’ve turned it off, you can swap a partition in or out.
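
The switch-out itself is a one-liner, too. A sketch using the partition number from the error message below; the full table setup lives in the linked script:

ALTER TABLE dbo.Votes_Deletes
SWITCH PARTITION 8
TO dbo.Votes_Deletes_Stage;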

A Normal Partitioning Problem


The partition you’re going to switch in needs to have a constraint on it that tells the whatever that the data you’re switching in is valid for the partition you’re swapping it into.

Msg 4982, Level 16, State 1, Line 105
ALTER TABLE SWITCH statement failed. Check constraints of source table ‘DeletesDemo.dbo.Votes_Deletes_Stage’ allow values that are not allowed by range defined by partition 8 on target table ‘DeletesDemo.dbo.Votes_Deletes’.

The thing is, this error message sucks. It sucks all the suck. Especially when dealing with temporal tables, you might think something odd happened with the constraints on your versioning columns. They both have constraints on them. WHICH CONSTRAINT IS THE PROBLEM?

If you’re new to Partitioning, you may not have ever switched data into or out of a table before. This error message can be a real head-scratcher.

The fix is to add a check constraint to your staging table — the one you’re swapping in — that tells Partitioning about what’s in the table. In my case, I have the Votes_Deletes table partitioned by CreationDate, in one-year ranges. For me, Partition 8 contains values for the year 2013. To make sure it’s safe to swap my staging table into the partition for that year, it needs a constraint:

ALTER TABLE dbo.Votes_Deletes_Stage
ADD CONSTRAINT ck_yrself 
    CHECK (CreationDate >= '20130101' AND CreationDate < '20140101' 
         AND CreationDate IS NOT NULL);

And You Should Probably


Turn the temporal-ness back on. When you do that, you have an option. Do you want to make sure your data is consistent?

ALTER TABLE dbo.Votes_Deletes SET (SYSTEM_VERSIONING = ON
    (HISTORY_TABLE = dbo.Votes_Deletes_History,
     DATA_CONSISTENCY_CHECK = ON));

If you don’t, re-enabling is instant. Buuuuuuut you take the chance that some data in your table may have changed while you were tooting around trying to get partitions swapped in and out. I have no idea what the ramifications of skipping the check might be. In the context of this post, probably nothing. If you’ve got a full history table and the specter of changes during this whole shebang…

This is what the query plan for turning it back on looks like.

[Image: query plan for re-enabling SYSTEM_VERSIONING ("Two minutes for what?")]

There’s nothing in the history table. If there were, this could get really bad (trust me, ha ha ha). What checks do we do when the history table is empty?

Le Shrug, as they say in the land of protest.

But a 53 million row assert sure does take a couple minutes.

Thanks for reading!



What Kind Of Indexes Can You Create On SQL Server Temporary Objects?

That, There


Intermediate result materialization is one of the most successful tuning methods that I regularly use.

Separating complexity is second only to eliminating complexity. Of course, sometimes those intermediate results need indexing to finish the job.

Not always. But you know. I deal with some weird stuff.

Like anime weird.

Listings!


Regular ol’ #temp tables can have just about any kind of index plopped on them. Clustered, nonclustered, filtered, columnstore (unless you’re using in-memory tempdb with SQL Server 2019). That’s nice parity with regular tables.

Of course, lots of indexes on temp tables have the same problem as lots of indexes on regular tables. They can slow down loading data in, especially if there’s a lot. That’s why I usually tell people load first, create indexes later.

There are a couple ways to create indexes on #temp tables:

Create, then add

/*Create, then add*/
CREATE TABLE #t (id INT NOT NULL);
/*insert data*/
CREATE CLUSTERED INDEX c ON #t(id);

Create inline

/*Create inline*/
CREATE TABLE #t(id INT NOT NULL,
                INDEX c CLUSTERED (id));

It depends on what problem you’re trying to solve:

  • Recompiles caused by #temp tables: Create Inline
  • Slow data loads: Create, then add

Another option to help with #temp table recompiles is the KEEPFIXED PLAN hint, though the only place I’ve ever seen it used is in sp_WhoIsActive.
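
The hint hangs off individual statements that touch the #temp table, and tells SQL Server not to recompile the statement because of changes in statistics. A quick sketch, with a placeholder table and query:

SELECT COUNT(*) 
FROM #t AS t 
WHERE t.id = 1
OPTION(KEEPFIXED PLAN);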

Forgotten


Often forgotten is that table variables can be indexed in many of the same ways (at least post SQL Server 2014, when the inline index creation syntax came about). The only kinds of indexes that I care about that you can’t create on a table variable are columnstore and filtered (columnstore generally, filtered pre-2019).

Other than that, it’s all fair game.

DECLARE @t TABLE( id INT NOT NULL,
                  INDEX c CLUSTERED (id),
                  INDEX n NONCLUSTERED (id) );

You can create clustered and nonclustered indexes on them, they can be unique, you can add primary keys.
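
A quick sketch of those variations, since constraint syntax works here too (the columns are just for illustration):

DECLARE @t TABLE
(
    id INT NOT NULL PRIMARY KEY CLUSTERED,
    other_id INT NOT NULL UNIQUE
);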

It’s a whole thing.

Futuristic


In SQL Server 2019, we can also create indexes with included columns and filtered indexes with the inline syntax.

CREATE TABLE #t( id INT, 
                 more_id INT, 
                 INDEX c CLUSTERED (id),
                 INDEX n NONCLUSTERED (more_id) INCLUDE(id),
                 INDEX f NONCLUSTERED (more_id) WHERE more_id > 1 );


DECLARE @t TABLE ( id INT, 
                   more_id INT, 
                   INDEX c CLUSTERED (id),
                   INDEX n NONCLUSTERED (more_id) INCLUDE(id),
                   INDEX f NONCLUSTERED (more_id) WHERE more_id > 1 );


Missing Persons


Notice that I’m not talking about CTEs here. You can’t create indexes on those.

Perhaps that’s why they’re called “common”.

Yes, you can index the underlying tables in your query, but the results of CTEs don’t get physically stored anywhere that would allow you to create an index on them.

Thanks for reading!


Function Plan Caching In SQL Server: Multi Statement Table Valued Functions

Too Clean


In yesterday’s post, we looked at simple scalar function plan caching.

Today, we’ll look at MSTVFs. If you’re not sure what that means, look at the title of the post real quick.

Yeah, up there.

On we go.

Dirty Version


The function will do the same thing as before, just rewritten to be an MSTVF.

CREATE OR ALTER FUNCTION dbo.CommentsAreHorribleMulti(@Id INT)
RETURNS @Tally TABLE(Tally BIGINT)
WITH SCHEMABINDING
AS
BEGIN
    INSERT @Tally ( Tally )
    SELECT (SELECT SUM(Score) FROM dbo.Posts AS p WHERE p.OwnerUserId <= @Id)
           -
           (SELECT COUNT_BIG(*) FROM dbo.Comments AS c WHERE c.UserId <= @Id);
    RETURN;
END;
GO

Now, where these differ immediately from SVFs (scalar valued functions) is that they don’t show up in the plan cache by name.

[Image: plan cache query results ("Wrong Hook")]
Note that these are both “statements”.

Also, unlike SVFs, they don’t show up in dm_exec_function_stats. This is documented behavior, but whatever.

And even though they’re called a “Proc” in dm_exec_cached_plans, they only show up in dm_exec_query_stats, not dm_exec_procedure_stats (which is why BlitzCache calls them a Statement).
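
Since they only show up in sys.dm_exec_query_stats, one way to dig them out of the cache is by searching the statement text. A rough sketch; note that the search query will happily match itself, too:

SELECT TOP (10)
       st.text,
       qs.execution_count,
       qs.total_worker_time
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE st.text LIKE N'%CommentsAreHorribleMulti%';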

Different Sniffint


Unlike SVFs, which don’t have a restriction on the function body using parallelism, all table variable modifications are forced to run serially (unless you’re sneaky).

That means both insert queries will be serialized, with the main difference being index access.

[Image: the serialized insert query plans ("arrowed")]

Like before, if we cache either plan, it will get reused. And just like before, the clustered index scan plan is significantly slower.

SELECT TOP (5) u.DisplayName,
       (SELECT * FROM dbo.CommentsAreHorribleMulti(u.Id))
FROM dbo.Users AS u;

[Image: query plans comparing the two executions ("Old Dress")]

Plan Noir


Just like scalar functions, these can have different plans cached and reused, and may fall victim to parameter sniffing.

Again, this depends a lot on how the function is called and used. It’s just something to be aware of when tuning queries that call functions.

Execution times may vary greatly depending on… well…

Parameters.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Creating #Temp Tables Dynamically In SQL Server

Tinker Toy


Let’s say you have dynamic SQL that selects different data based on some conditions.

Let’s also say that data needs to end up in a temp table.

Your options officially suck.

If you create the table outside dynamic SQL, you need to know which columns to use, and how many, to insert into the table.

You can’t do SELECT…INTO with an EXEC.

If you create the table inside dynamic SQL, you can’t use it outside the dynamic SQL.
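
That last one is a scoping thing, and it’s quick to see for yourself. A sketch, with a made-up table name:

EXEC (N'CREATE TABLE #inside (id INT);');
SELECT * FROM #inside; --Fails: Invalid object name '#inside'

The #temp table only lives for the duration of the EXEC’s scope, so it’s gone by the time the outer batch goes looking for it.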

But…

Altered Images


There’s a fun function in SQL Server 2012+, dm_exec_describe_first_result_set.

People mainly use it for stored procedures (I think?), but it can also work like this:

DECLARE @sql1 NVARCHAR(MAX) = N'SELECT TOP 10 * FROM dbo.Users AS u WHERE u.Reputation > @i';
DECLARE @sql2 NVARCHAR(MAX) = N'SELECT TOP 10 * FROM dbo.Posts AS p WHERE p.Score > @i';

SELECT column_ordinal, name, system_type_name
FROM   sys.dm_exec_describe_first_result_set(@sql1, NULL, 0);

SELECT column_ordinal, name, system_type_name
FROM   sys.dm_exec_describe_first_result_set(@sql2, NULL, 0);

The results for the Users table look like this:

[Image: dm_exec_describe_first_result_set output for the Users query ("For you must")]

Don’t Judge Me


The best way I’ve found to do this is to use that output to generate an ALTER TABLE to add the correct columns and data types.

Here’s a dummy stored procedure that does it:

CREATE OR ALTER PROCEDURE dbo.dynamic_temp ( @TableName NVARCHAR(128))
AS
    BEGIN
        SET NOCOUNT ON;

        CREATE TABLE #t ( Id INT );
        DECLARE @sql NVARCHAR(MAX) = N'';

        IF @TableName = N'Users'
            BEGIN
                SET @sql = @sql + N'SELECT TOP 10 * FROM dbo.Users AS u WHERE u.Reputation > @i';
            END;

        IF @TableName = N'Posts'
            BEGIN
                SET @sql = @sql + N'SELECT TOP 10 * FROM dbo.Posts AS p WHERE p.Score > @i';
            END;

        SELECT   column_ordinal, name, system_type_name
        INTO     #dfr
        FROM     sys.dm_exec_describe_first_result_set(@sql, NULL, 0)
        ORDER BY column_ordinal;

        DECLARE @alter NVARCHAR(MAX) = N'ALTER TABLE #t ADD ';

        SET @alter += STUFF((   SELECT   NCHAR(10) + d.name + N' ' + d.system_type_name + N','
                                FROM     #dfr AS d
                                WHERE    d.name <> N'Id'
                                ORDER BY d.column_ordinal
                                FOR XML PATH(N''), TYPE ).value(N'.[1]', N'NVARCHAR(4000)'), 1, 1, N'');

        SET @alter = LEFT(@alter, LEN(@alter) - 1);

        EXEC ( @alter );

        INSERT #t
        EXEC sys.sp_executesql @sql, N'@i INT', @i = 10000;

        SELECT *
        FROM   #t;

    END;
GO

I can execute it for either Users or Posts, and get back the results I want.

EXEC dbo.dynamic_temp @TableName = 'Users';
EXEC dbo.dynamic_temp @TableName = 'Posts';

So yeah, this is generally a pretty weird requirement.

It might even qualify as Bad Idea Spandex™

Thanks for reading!


A Trick For Dealing With Table Variable Modification Performance Problems In SQL Server

Don’t Want None


One of the many current downsides of @table variables is that modifying them inhibits parallelism, which is a problem #temp tables don’t have.

While updating and deleting from @table variables is fairly rare (I’ve seen it, but not too often), you at minimum need an insert to put some data in there.

No matter how big, bad, ugly, or costly your insert statement is, SQL Server can’t parallelize it.

Dimmo


Here’s our select statement.

SELECT DISTINCT
       u.Id
FROM   dbo.Users AS u
JOIN   dbo.Posts AS p
    ON p.OwnerUserId = u.Id
JOIN   dbo.Comments AS c
    ON  c.PostId = p.Id
    AND c.UserId = u.Id
WHERE  c.Score >= 5;

This goes parallel and runs for about 3 seconds.

[Image: parallel select query plan ("Now my heart is full~")]

But if we try to insert that into a @table variable, the plan will no longer go parallel, and will run for ~6 seconds.

DECLARE @icko TABLE (id INT);
INSERT @icko ( id )
SELECT DISTINCT
       u.Id
FROM   dbo.Users AS u
JOIN   dbo.Posts AS p
    ON p.OwnerUserId = u.Id
JOIN   dbo.Comments AS c
    ON  c.PostId = p.Id
    AND c.UserId = u.Id
WHERE  c.Score >= 5;
[Image: serial insert query plan ("Boy Racer")]

If we hit F4 to get the properties of the INSERT, well…

[Image: properties of the INSERT operator ("Chocolate Nonpareils Reason?")]

Let’s Say For Some Crazy Reason


You need to keep using a table variable.

Let’s say, I dunno, the crappy 1 row estimate gets you a better plan.

Or like… I dunno. Temp tables recompile too much.

I’m reaching, I know. But hey, that’s what consultants do. Have you read any blog posts lately?

If we change our insert to this, we get parallelism back:

DECLARE @icko TABLE (id INT);
INSERT @icko ( id )
EXEC(N'SELECT DISTINCT
              u.Id
       FROM   dbo.Users AS u
       JOIN   dbo.Posts AS p
           ON p.OwnerUserId = u.Id
       JOIN   dbo.Comments AS c
           ON  c.PostId = p.Id
           AND c.UserId = u.Id
       WHERE  c.Score >= 5;')
[Image: parallel insert query plan ("Glamorous Glue")]

The dynamic SQL executes in a separate context, and the insert happens in… idk some magickal place.

But THAT’S PRETTY COOL, HUH?

This would also work if we put the query inside a stored procedure, or stuck the statement inside a variable and used sp_executesql.
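
For the sp_executesql flavor, that looks something like this. Same query, with a @score parameter added just to show how you’d pass values in:

DECLARE @icko TABLE (id INT);

DECLARE @sql NVARCHAR(MAX) = N'
SELECT DISTINCT
       u.Id
FROM   dbo.Users AS u
JOIN   dbo.Posts AS p
    ON p.OwnerUserId = u.Id
JOIN   dbo.Comments AS c
    ON  c.PostId = p.Id
    AND c.UserId = u.Id
WHERE  c.Score >= @score;';

INSERT @icko ( id )
EXEC sys.sp_executesql @sql, N'@score INT', @score = 5;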

In either case, the INSERT…EXEC pattern is your friend here.

Of course, you could just use a #temp table.

Sigh.
