When Should You Index Temp Tables In SQL Server?

What I Mean Is


You already know that your temp table needs an index. Let’s say there’s some query plan ouchie from not adding one. You’ve already realized that you should probably use a clustered index rather than a nonclustered index. Adding a nonclustered index leaves you with a heap and an index, and a lot of the time nonclustered indexes won’t get used, because they don’t cover enough of the columns your query needs.

Good. We’ve fixed you.

But, like, when should you create the index?

Options


You can do one of these things:

  • Inline, when you create the table
  • After you create the table
  • After you load data into the table

This requires a bit of testing to get right.

Inline


In many cases, this is the best option, for reasons outlined by Pam Lahoud.

  • Do not explicitly drop temp tables at the end of a stored procedure; they will get cleaned up when the session that created them ends.
  • Do not alter temp tables after they have been created.
  • Do not truncate temp tables.
  • Move index creation statements on temp tables to the new inline index creation syntax that was introduced in SQL Server 2014.
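For reference, here’s the inline syntax next to the old two-step pattern it replaces. Table and index names here are just for illustration:

```sql
--Old pattern: create, then add the index separately.
--The CREATE INDEX counts as altering the table, which prevents temp table caching.
CREATE TABLE #old (id INT NOT NULL);
CREATE CLUSTERED INDEX c ON #old (id);

--Inline pattern (SQL Server 2014+): the index is part of the CREATE TABLE statement
CREATE TABLE #new
(
    id INT NOT NULL,
    INDEX c CLUSTERED (id)
);
```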

Where it can be a bad option is:

  • If you can’t get a parallel insert even with a TABLOCK hint
  • Sorting the data to match index order on insert could result in some discomfort

After Creation


This is almost never ideal, unless you specifically want to avoid caching the temp table, and to get a statement-level recompile, for whatever reason.

It’s not that I’d ever rule this out as an option, but I’d wanna have a good reason for it.

Probably even several.
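For completeness, this is the pattern I mean, with names assumed for the example:

```sql
--Create the table
CREATE TABLE #t (id INT NOT NULL);

--Creating the index here counts as altering the temp table,
--which prevents it from being cached and triggers recompiles
CREATE CLUSTERED INDEX c ON #t (id);

--Then load data
INSERT #t WITH (TABLOCK)
SELECT p.OwnerUserId
FROM dbo.Posts AS p;
```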

After Insert


This can sometimes be a good option if the query plan you get from inserting into the index is deficient in some way.

Like I mentioned up above, maybe you lose parallel insert, or maybe the DML Request Sort is a thorn in your side.

This can be awesome! Except on Standard Edition, where you can’t create indexes in parallel. Which picks off one of the reasons for doing this in the first place, and also potentially causes you headaches with not caching temp tables, and statement level recompiles.

One upside of waiting until after the load: if you insert data into a temp table that already has an index, and then run a query that triggers statistics generation, you’ll almost certainly get the default sampling rate. That could potentially cause other annoyances. Creating the index after loading the data means you get full scan statistics.

Hooray, I guess.

This may not ever be the end of the world, but here’s a quick example:

DROP TABLE IF EXISTS #t;
GO 

--Create a table with an index already on it
CREATE TABLE #t(id INT, INDEX c CLUSTERED(id));

--Load data
INSERT #t WITH(TABLOCK)
SELECT p.OwnerUserId
FROM dbo.Posts AS p;

--Run a query to generate statistics
SELECT COUNT(*)
FROM #t AS t
WHERE t.id BETWEEN 1 AND 10000
GO 

--See what's poppin'
SELECT hist.step_number, hist.range_high_key, hist.range_rows, 
    hist.equal_rows, hist.distinct_range_rows, hist.average_range_rows
FROM tempdb.sys.stats AS s
CROSS APPLY tempdb.sys.dm_db_stats_histogram(s.[object_id], s.stats_id) AS hist
WHERE OBJECT_NAME(s.object_id, 2) LIKE '#t%'
GO 
DROP TABLE #t;


--Create a table with no index
CREATE TABLE #t(id INT NOT NULL);

--Load data
INSERT #t WITH(TABLOCK)
SELECT p.OwnerUserId
FROM dbo.Posts AS p;

--Create the index
CREATE CLUSTERED INDEX c ON #t(id);

--Run a query to generate statistics
SELECT COUNT(*)
FROM #t AS t
WHERE t.id BETWEEN 1 AND 10000

--See what's poppin'
SELECT hist.step_number, hist.range_high_key, hist.range_rows, 
    hist.equal_rows, hist.distinct_range_rows, hist.average_range_rows
FROM tempdb.sys.stats AS s
CROSS APPLY tempdb.sys.dm_db_stats_histogram(s.[object_id], s.stats_id) AS hist
WHERE OBJECT_NAME(s.object_id, 2) LIKE '#t%'
GO 
DROP TABLE #t;

Neckin’ Neck

On the left is the first 20 steps from the first histogram, and on the right is the first 20 from the second one.

You can see some big differences — whether or not they end up helping or hurting performance would take a lot of different tests. Quite frankly, it’s probably not where I’d start a performance investigation, but I’d be lying if I told you it never ended up there.

All Things Considerateded


In general, I’d stick to using the inline index creation syntax. If I had to work around issues with that, I’d create the index after loading data, but being on Standard Edition brings some additional considerations around parallel index creation.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Temporal Tables and Table Partitioning In SQL Server: Not If You Like Switching Partitions

These Are A Few Of Not My Favorite Things


This is the part of SQL Server I often find myself rolling my eyes at: poor feature interoperability, and that V1 smell that… never seems to turn into that V2 smell.

The full script is hosted here. I don’t want a tedious post full of setting up partitioning, etc.

I wanna get to the stuff that you might care about later.

If You Wanna…


The utility of Partitioning is being able to quickly switch partitions in and out. Data management. Not query performance (unless you’re using columnstore).

If you want to do that with temporal tables, your staging and “out” tables need to match exactly, down to the temporal-ness.

For example, this won’t work:

CREATE TABLE dbo.Votes_Deletes_Stage
(
    Id INT NOT NULL,
    PostId INT NOT NULL,
    UserId INT NULL,
    BountyAmount INT NULL,
    VoteTypeId INT NOT NULL,
    CreationDate DATETIME NOT NULL,
    SysStartTime DATETIME2(7) NOT NULL, --Versioning column
    SysEndTime DATETIME2(7) NOT NULL, --Versioning column
    CONSTRAINT dr_rockzo_Stage PRIMARY KEY CLUSTERED (CreationDate, Id) ON [PRIMARY]
) ON [PRIMARY];

You gotta have all the same stuff you used to get your partitioned table set up for temporal-ness.

CREATE TABLE dbo.Votes_Deletes_Stage
(
    Id INT NOT NULL,
    PostId INT NOT NULL,
    UserId INT NULL,
    BountyAmount INT NULL,
    VoteTypeId INT NOT NULL,
    CreationDate DATETIME NOT NULL,
    SysStartTime DATETIME2(7) GENERATED ALWAYS AS ROW START HIDDEN
        CONSTRAINT df_VotesDeletes_Stage_SysStartTime
            DEFAULT SYSDATETIME(),
    SysEndTime DATETIME2(7) GENERATED ALWAYS AS ROW END HIDDEN
        CONSTRAINT df_VotesDeletes_Stage_SysEndTime
            DEFAULT CONVERT(DATETIME2(7), '9999-12-31 23:59:59.9999999'),
    CONSTRAINT dr_rockzo_Stage PRIMARY KEY CLUSTERED (CreationDate, Id) ON [PRIMARY],
  PERIOD FOR SYSTEM_TIME([SysStartTime], [SysEndTime])
) ON [PRIMARY];

Then If You Wanna…


Switch data in or out, you have to turn off the temporal-ness.

Msg 13546, Level 16, State 1, Line 97
Switching out partition failed on table ‘DeletesDemo.dbo.Votes_Deletes’ because it is not a supported operation on system-versioned tables. Consider setting SYSTEM_VERSIONING to OFF and trying again.

“Consider turning off the feature that takes forever to turn back on with large tables so you can do the thing partitioning does quickly”

Don’t worry, the color red you’re seeing is totally natural.

And hey, once you’ve turned it off, you can swap a partition in or out.
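The dance ends up looking something like this, using the partition number and staging table from the demo:

```sql
--Turn off system versioning so the switch is allowed
ALTER TABLE dbo.Votes_Deletes SET (SYSTEM_VERSIONING = OFF);

--Now switching out works
ALTER TABLE dbo.Votes_Deletes
SWITCH PARTITION 8 TO dbo.Votes_Deletes_Stage;
```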

A Normal Partitioning Problem


The partition you’re going to switch in needs to have a constraint on it that tells the whatever that the data you’re switching in is valid for the partition you’re swapping it into.

Msg 4982, Level 16, State 1, Line 105
ALTER TABLE SWITCH statement failed. Check constraints of source table ‘DeletesDemo.dbo.Votes_Deletes_Stage’ allow values that are not allowed by range defined by partition 8 on target table ‘DeletesDemo.dbo.Votes_Deletes’.

The thing is, this error message sucks. It sucks all the suck. Especially when dealing with temporal tables, you might think something odd happened with the constraints on your versioning columns. They both have constraints on them. WHICH CONSTRAINT IS THE PROBLEM?

If you’re new to Partitioning, you may not have ever switched data into or out of a table before. This error message can be a real head-scratcher.

The fix is to add a check constraint to your staging table — the one you’re swapping in — that tells Partitioning about what’s in the table. In my case, I have the Votes_Deletes table partitioned by CreationDate, by one year ranges. For me, Partition 8 contains values for the year 2013. To make sure it’s safe to swap my staging partition into the partition for that year, it needs a constraint:

ALTER TABLE dbo.Votes_Deletes_Stage
ADD CONSTRAINT ck_yrself 
    CHECK (CreationDate >= '20130101' AND CreationDate < '20140101' 
         AND CreationDate IS NOT NULL);

And You Should Probably


Turn the temporal-ness back on. When you do that, you have an option. Do you want to make sure your data is consistent?

ALTER TABLE dbo.Votes_Deletes SET 
(
    SYSTEM_VERSIONING = ON 
    (
        HISTORY_TABLE = dbo.Votes_Deletes_History,
        DATA_CONSISTENCY_CHECK = ON
    )
);

If you don’t, re-enabling is instant. Buuuuuuut you take the chance that some data in your table may have changed while you were tooting around trying to get partitions swapped in and out. I have no idea what the ramifications of skipping the check might be. In the context of this post, probably nothing. If you’ve got a full history table and the specter of changes during this whole shebang…
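If you decide to roll the dice, the no-check version is the same statement with the flag flipped:

```sql
--Re-enable versioning without the consistency check: instant, but trusting
ALTER TABLE dbo.Votes_Deletes SET 
(
    SYSTEM_VERSIONING = ON 
    (
        HISTORY_TABLE = dbo.Votes_Deletes_History,
        DATA_CONSISTENCY_CHECK = OFF
    )
);
```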

This is what the query plan for turning it back on looks like.

SQL Server Query Plan
Two minutes for what?

There’s nothing in the history table. If there were, this could get really bad (trust me, ha ha ha). What checks do we do when the history table is empty?

Le Shrug, as they say in the land of protest.

But a 53 million row assert sure does take a couple minutes.

Thanks for reading!


How Functions Can Make Performance Tuning SQL Server Queries Harder

Sensational


I’ve posted quite a bit about how cached plans can be misleading.

I’m gonna switch that up and talk about how an actual plan can be misleading, too.

In plans that include calling a multi-statement table valued function, no operator logs the time spent in the function.

Here’s an example:

SELECT TOP (100)
     p.Id AS [Post Link],
     vs.up,
     vs.down
FROM dbo.VoteStats() AS vs --The function
JOIN dbo.Posts AS p
    ON vs.postid = p.Id
WHERE vs.down > vs.up_multiplier
AND   p.CommunityOwnedDate IS NULL
AND   p.ClosedDate IS NULL
ORDER BY vs.up DESC

When I run the query, it drags on for 30-ish seconds, but the plan says that it only ran for about 2.7 seconds.

SQL Server Query Plan
As we proceed

But there it is in Query Time Stats! 29 seconds. What gives?

SQL Server Query Times From Execution Plan
Hi there!

Estimations


If we look at the estimated plan for the function, we can see quite a thick arrow pointing to the table variable we populate for our results.

SQL Server Query Plan
Meatballs

That process is all part of the query, but it doesn’t show up in any of the operators. It really should.

More specifically, I think it should show up right here.

SQL Server Query Plan

Thanks for reading!


SQL Server 2019: A Heuristic Evening With The Optimizer

The Land Of Do-Overs


Of the new things in SQL Server 2019 that I plan on presenting about, the Batch Mode on Row Store (BMOR, from here) enhancements are probably the most interesting from a query tuning point of view.

Things like Accelerated Database Recovery, Optimize For Sequential Key, and In-Memory Tempdb Metadata are cool, but they’re server tuning. I love ’em, but they’re more helpful for tuning an entire workload than a specific query.

The thing with BMOR is that it’s not just one thing. Getting Batch Mode also allows Adaptive Joins and Memory Grant Feedback to kick in.

But they’re all separate heuristics.

Getting Batch Mode


To get Batch Mode to kick in for a Row Store query, it has to pass a certain set of heuristics, which can be viewed in Extended Events.

SELECT dxoc.object_name, dxoc.name, dxoc.description
FROM sys.dm_xe_object_columns AS dxoc
JOIN sys.dm_xe_objects AS dxo
    ON  dxo.package_guid = dxoc.object_package_guid
    AND dxo.name = dxoc.object_name
WHERE dxo.name = 'batch_mode_heuristics' AND dxoc.column_type = 'data'
ORDER BY dxoc.name;

SQL Server Extended Events

Once we’ve got Batch Mode, we can use the other stuff. But they have their own jim-jams.

Getting Adaptive Joins


To get Adaptive Joins, you need to pass these heuristics.

SELECT dxmv.name,
       dxmv.map_value,
       dxo.description
FROM sys.dm_xe_map_values AS dxmv
    JOIN sys.dm_xe_objects AS dxo
        ON dxmv.object_package_guid = dxo.package_guid
           AND dxmv.name = dxo.name
WHERE dxmv.name = 'adaptive_join_skipped_reason';

SQL Server Extended Events
Try refreshing

No, those aren’t plain English, but you can decode most of them. They mostly deal with index matching, and cardinality making sense to go down this route.

Getting Memory Grant Feedback


There isn’t a lot in Extended Events to tell you when this will happen, but it is documented. And written about.

For excessive grants, if the granted memory is more than two times the size of the actual used memory, memory grant feedback will recalculate the memory grant and update the cached plan.  Plans with memory grants under 1MB will not be recalculated for overages.

For insufficiently sized memory grants that result in a spill to disk for batch mode operators, memory grant feedback will trigger a recalculation of the memory grant. Spill events are reported to memory grant feedback and can be surfaced via the spilling_report_to_memory_grant_feedback XEvent event. This event returns the node id from the plan and spilled data size of that node.

We can still see some stuff, though.

SELECT dxoc.object_name,
       dxoc.name,
       dxoc.description
FROM sys.dm_xe_object_columns AS dxoc
JOIN sys.dm_xe_objects AS dxo
ON dxo.package_guid = dxoc.object_package_guid
AND dxo.name = dxoc.object_name
WHERE dxo.name = 'memory_grant_feedback_loop_disabled'
AND dxoc.column_type = 'data'

UNION ALL 

SELECT dxoc.object_name,
       dxoc.name,
       dxoc.description
FROM sys.dm_xe_object_columns AS dxoc
JOIN sys.dm_xe_objects AS dxo
ON dxo.package_guid = dxoc.object_package_guid
AND dxo.name = dxoc.object_name
WHERE dxo.name = 'memory_grant_updated_by_feedback'
AND dxoc.column_type = 'data'
ORDER BY dxoc.name;

SQL Server Extended Events
Leeches

Getting All Three


In SQL Server 2019, you may see plans with Batch Mode operators happening for Row Store indexes, but you may not get an Adaptive Join, or Memory Grant Feedback. If you have a lot of single-use plans, you’ll never see them getting Memory Grant Feedback (I mean, they might, but it won’t matter because there won’t be a second execution, ha ha ha).

It’s important to remember that this isn’t all just one feature, but a family of them for improving query performance for specific scenarios.

On Enterprise Edition.

In compat level 150.

Say, where’d I put that scotch…

Thanks for reading!


My Favorite SQL Server Cursor Options: Local Static

Cursory


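The short version: when I have to write a cursor, I reach for LOCAL and STATIC. A minimal sketch, assuming a Users table like the one in the Stack Overflow database:

```sql
DECLARE @Id INT;

DECLARE c CURSOR
    LOCAL  --scoped to this batch, procedure, or trigger, not the whole connection
    STATIC --materializes the result set in tempdb once, up front
FOR
SELECT u.Id
FROM dbo.Users AS u;

OPEN c;

FETCH NEXT FROM c INTO @Id;

WHILE @@FETCH_STATUS = 0
BEGIN
    --do per-row work here
    FETCH NEXT FROM c INTO @Id;
END;

CLOSE c;
DEALLOCATE c;
```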

SQL Server Table Columns That Share Ordering

Somebody Set Up Us The Column


This is an okay trick to keep in mind when you need to use order by on a large table.

Of course, we care about order by for many very good reasons, especially when we don’t have an index to support the ordering.

Sorting data requires memory, and Sort operators particularly may ask for quite a bit of memory.

Why? Because you need to sort all the columns you’re selecting by the column you’re ordering by.

Sorts aren’t just for the column(s) in your order by. If you SELECT *, you need to sort all the columns in the * by all the columns in the order by.

I know I basically repeated myself. That’s for emphasis. It’s something professional writers do.

Dig it.

Butheywhatabout


Let’s say, just for kicks, that we have a table in our database. And maybe it has a column called something like “Id” in it.

Pushing this tale further into glory, let’s also assume that this legendary “Id” column is the primary key and clustered index.

That means we have the entire table sorted by this one column. Cool.

Tighten those wood screws, because we’re about to go cat 5 here. Ready?

There’s a date or date time column in the table — let’s say it defines when the row was first inserted into the table.

It could be a creation date, or an order date. Doesn’t matter.

What does matter? That the “Id” and the “*Date” column increment at the same time, which means that they’re in the same order.

It may suit your queries better to order by the clustered index key column rather than by another column in the table that isn’t in an index that’s helpfully ordered for your query.

Too Sort


Take these two queries:

SELECT TOP (1000) *
FROM dbo.Posts AS p 
ORDER BY p.Id;

SELECT TOP (1000) *
FROM dbo.Posts AS p 
ORDER BY p.CreationDate;

I know, they’re terribly unrealistic. No one would ever. Not even close. Fine.

SQL Server Query Plan
smh

Though both queries present the same data in the same order, the query that orders by the CreationDate column takes uh.

Considerably longer.

For reasons that should be apparent.

Of course, we could add an index to help. Just add all the indexes. What could go wrong?

If you have the type of application that lets users, say, dynamically filter and order by whatever columns they want, you’ve got a whole lot of index to create.

Better get started.

Thanks for reading!


Indexed View Maintenance Is Only As Bad As Your Indexes In SQL Server

Wah Wah Wah


Yes, indexed view maintenance can be quite rough. I don’t mean like, rebuilding them. I will never talk about that.

I mean that, in some cases locks are serializable, and that if you don’t mind your indexes you may find run-of-the-mill modifications taking quite a long time.

Let’s go look!

Mill Town


Let’s update a small chunk of the Posts table.

BEGIN TRAN
UPDATE p
SET p.Score += 100
FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656;
ROLLBACK

Let’s all digress from the main point of this post for a moment!

It’s generally useful to give modifications an easy path to find data they need to update. For example:

SQL Server Query Plan
Uh no

This update takes 1.6 seconds because we have no useful index on OwnerUserId. But we get a daft missing index request, because it wants to include Score, which would mean we’d need to then update that index as well as read from it. Locking leads to NOLOCK hints. I tend to want to introduce as little of it as possible.

With an index on just OwnerUserId, our situation improves dramatically.
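That index is a one-liner; the name here is my own:

```sql
--Narrow index: gives the update an easy path to the rows it needs,
--without including Score, which the update would then also have to maintain
CREATE INDEX ix_OwnerUserId ON dbo.Posts (OwnerUserId);
```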

SQL Server Query Plan
100000X IMPROVEMENT

Allow Me To Reintroduce Myself


Let’s see what happens to our update with an indexed view in place.

CREATE OR ALTER VIEW dbo.PostScoresVotes
WITH SCHEMABINDING
AS 
SELECT   p.Id, 
         SUM(p.Score * 1.0) AS ScoreSum, 
		 COUNT_BIG(v.Id) AS VoteCount,
		 COUNT_BIG(*) AS OkayThen 
FROM     dbo.Posts AS p
JOIN     dbo.Votes AS v
    ON p.Id = v.PostId
WHERE p.PostTypeId = 2
AND   p.CommunityOwnedDate IS NULL
GROUP BY p.Id;
GO

CREATE UNIQUE CLUSTERED INDEX c_Id 
    ON dbo.PostScoresVotes(Id);

Our update query now takes about 10 seconds…

SQL Server Query Plan
Oof dawg

With the majority of the time being spent assembling the indexed view for maintenance.

SQL Server Query Plan
Yikes dawg

The Problem Of Course


Is that our indexes are bad. We’ve got no helpful index between Posts and Votes to help with the assembly.

Our first clue may have been when creating the indexed view took a long time, but hey.

Let’s fix it.

CREATE INDEX v ON dbo.Votes(PostId);

Now our update finishes in about a second!

SQL Server Query Plan
Cleant Up

Thanks for reading!


Signs Your SQL Server Execution Plan Is Lying To You

Polygraph


Unless you’re looking at an actual execution plan, one must never ever ever ever ever believe what one is seeing.

What one is seeing is a series of estimates that may have very little to do with reality. I don’t only mean within the execution plan; I also mean the runtime issues a query may encounter.

With all that in mind, here are some of the many ways a query plan can hide work:

Pants On Fire:

Over the next few weeks, I’m going to show you how all of these things can be dreadfully wrong. The links will come alive here over the next few weeks, but you can see the full playlist here.

Anyway.

Thanks for reading!
