Be Careful With SET ANSI_DEFAULTS ON In SQL Server

Secret Agent Plan


My dislike for implicit transactions is well documented. Recently, while working with a client, I noticed that they had a bunch of them causing blocking.

Digging in a little bit further, I noticed they were all coming from an Agent Job. Of course, by default, Agent runs with a bunch of whacked-out ANSI options.

To get the job to perform better — which it did — it had to make use of a filtered index on an archival task. If you scroll way down in the filtered index documentation, you’ll see a note:

Review the required SET options for filtered index creation in CREATE INDEX (Transact-SQL) syntax

To create a filtered index, or to have queries use one, your session needs some very specific options set correctly.

Baggage


Rather than setting only the required options, which was apparently too much typing, someone had just turned all the ANSI defaults on.

SET ANSI_DEFAULTS ON;

But this comes with some additional baggage, in the form of implicit transactions. If you run DBCC USEROPTIONS; with that turned on:

Set Option              Value
----------------------- --------------
textsize                2147483647
language                us_english
dateformat              mdy
datefirst               7
statistics XML          SET
lock_timeout            -1
quoted_identifier       SET
arithabort              SET
ansi_null_dflt_on       SET
ansi_defaults           SET
ansi_warnings           SET
ansi_padding            SET
ansi_nulls              SET
concat_null_yields_null SET
cursor_close_on_commit  SET
implicit_transactions   SET <---- UwU what's this
isolation level         read committed

It sets all the things you actually need, plus a couple of extras: implicit transactions and cursor close on commit.
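
If, for some reason, you’re stuck with existing code that leans on ANSI_DEFAULTS, one workaround (just a sketch; the explicit options below are the better fix) is to flip the two stowaways back off right after it:

SET ANSI_DEFAULTS ON;

/* ANSI_DEFAULTS drags these two along for the ride; turn them back off */
SET IMPLICIT_TRANSACTIONS OFF;
SET CURSOR_CLOSE_ON_COMMIT OFF;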

Baggage


Of course, had someone just done a bit more typing, all would have been well and good.

SET ANSI_NULLS ON;
SET ANSI_PADDING ON;
SET ANSI_WARNINGS ON;
SET ARITHABORT ON;
SET CONCAT_NULL_YIELDS_NULL ON;
SET QUOTED_IDENTIFIER ON;

Using SET ANSI_DEFAULTS OFF; is equally disappointing, sort of.

Set Option              Value
----------------------- --------------
textsize                2147483647
language                us_english
dateformat              mdy
datefirst               7
lock_timeout            -1
arithabort              SET
concat_null_yields_null SET
isolation level         read committed

It really does just flip everything off. Not that I’m saying it shouldn’t — but maybe we need a command in between?

SET ANSI_DEFAULTS BACK_TO_NORMAL; or something.

Whatever “normal” means.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Documentation for dm_db_missing_index_group_stats_query

No, It’s New


When I was checking out early builds of SQL Server 2019, I noticed a new DMV called dm_db_missing_index_group_stats_query, which I thought was pretty cool.

It helped you tie missing index requests to the queries that requested them. Previously, that took a whole lot of heroic effort, or luck.

With this new DMV, it’s possible to combine queries that look for missing indexes with queries that look for tuning opportunities in the plan cache or in Query Store.

It seems to tie back to dm_db_missing_index_groups, with the index_group_handle column there joining to the group_handle column in the new DMV.

If you’re wondering why I’m not giving you any code samples here, it’s because I’m going to get some stuff built into sp_BlitzIndex to take advantage of it, now that it’s documented.

Special thanks to William Assaf (b|t) for helping to get this done.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Parameter Sniffing Is Usually A Good Thing In SQL Server

Tick Tock


I talk to a lot of people about performance tuning. It seems like once someone is close enough to a database for long enough, they’ll have some impression of parameter sniffing. Usually a bad one.

You start to hear some funny stuff over and over again:

  • We should always recompile
  • We should always use local variables
  • We should always recompile and use local variables

Often, even if it means writing unsafe dynamic SQL, people will be afraid to parameterize things.
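
To put a sketch behind that (using the Stack Overflow Users table and a made-up @Reputation variable), here’s the difference between gluing a value into a string and passing it as a real parameter:

DECLARE @Reputation int = 1;
DECLARE @sql nvarchar(MAX);

/* Unsafe: the value is glued into the string, so every distinct value
   gets its own plan, and injection is a real concern */
SET @sql = N'
SELECT
    u.Id,
    u.DisplayName
FROM dbo.Users AS u
WHERE u.Reputation = ' + CONVERT(nvarchar(11), @Reputation) + N';';

EXEC (@sql);

/* Parameterized: one cached plan, sniffed like any stored procedure parameter */
SET @sql = N'
SELECT
    u.Id,
    u.DisplayName
FROM dbo.Users AS u
WHERE u.Reputation = @Reputation;';

EXEC sys.sp_executesql
    @sql,
    N'@Reputation int',
    @Reputation;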

Between Friends


To some degree, I get it. You’re afraid of incurring some new performance problem.

You’ve had the same mediocre performance for years, and you don’t wanna make something worse.

The thing is, you could be making things a lot better most of the time.

  • Fewer compiles and recompiles, fewer single-use plans, fewer queries with multiple plans
  • Avoiding the local variable nonsense is, more often than not, going to get you better performance

A Letter To You


I’m going to tell you something that you’re not going to like, here.

Most of the time when I see a parameter sniffing problem, I see a lot of other problems.

Shabbily written queries, obvious missing indexes, and a whole list of other things.

It’s not that you have a parameter sniffing problem; you have a general negligence problem.

After all, the bad kind of parameter sniffing means that you’ve got variations of a query plan that don’t perform well on variations of parameters.

Once you start taking care of the basics, you’ll find a whole lot less of the problems that keep you up at night.

If that’s the kind of thing you need help with, drop me a line.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

A General Indexing Strategy For Normal Queries In SQL Server

Find Your Data First


Most queries will have a where clause. I’ve seen plenty that don’t. Some of ’em have surprised the people who developed them far more than they surprised me.

But let’s start there, because it’s a pretty important factor in how you design your indexes. There are all sorts of things that indexes can help, but the first thing we want indexes to do in general is help us locate data.

Why? Because the easier we can locate data, the easier we can eliminate rows early on in the query plan. I’m not saying we always need to have an index seek, but we generally want to filter out rows we don’t care about when we’re touching the table they’re in.

Burdens


When we carry excess rows throughout the query plan, all sorts of things get impacted and can become less efficient. This goes hand in hand with cardinality estimation.

At the most severe, rows can’t be filtered when we touch tables, or even join them together, and we have to filter them out later.

I wrote about that here and here.

When that happens, it’s probably not your indexes that are the problem — it’s you.

You, specifically. You and your awful query.

We can take a page from the missing index request feature here: helping queries find the rows we care about should be a priority.

Sweet N’ Low


When people talk about the order predicates are evaluated in, the easiest way to influence that is with the order of columns in the key of your index.

Since that defines the sort order of the index, if you want a particular column to be evaluated first, put it first in the key of the index.

Selectivity is a decent attribute to consider, but not the end all be all of index design.

Equality predicates preserve ordering of other key columns in the index, which may or may not become important depending on what your query needs to accomplish.
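
As a rough illustration, with a made-up index name on the Stack Overflow Posts table: leading with the equality column lets the optimizer seek on it and get rows back already ordered by the second key column.

CREATE INDEX p_type_score
    ON dbo.Posts (PostTypeId, Score);

/* Seek on the equality predicate; within PostTypeId = 1, rows are
   already ordered by Score, so the ORDER BY needs no Sort operator */
SELECT TOP (100)
    p.Id,
    p.Score
FROM dbo.Posts AS p
WHERE p.PostTypeId = 1
ORDER BY p.Score;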

Post Where


After the where clause, there are some rather uncontroversial things that indexes can help with:

  • Joins
  • Grouping
  • Ordering

Of course, they help with this because indexes put data in order.

Having rows in a deterministic order makes the above things either much easier (joining and grouping), or free (ordering).

How we decide on key column order necessarily has to take each part of the query involved into account.

If a query is so complicated that creating one index to help it would mean a dozen key columns, you probably need to break things down further.

Minnow


When you’re trying to figure out a good index for one query, you usually want to start with the where clause.

Not always, but it makes sense in most cases because it’s where you can find gains in efficiency.

If your index doesn’t support your where clause, you’re gonna see an index scan and freak out and go in search of your local seppuku parlor.

After that, look to other parts of your query that could help you eliminate rows. Joins are an obvious choice, and typically make good candidates for index key columns.
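
A rough sketch of that, with a hypothetical index name: the where clause column leads, the join column follows, so rows get trimmed early and the join has a useful key to work with.

/* Filter column first, join column second: the seek on UserId trims
   rows early, and PostId is right there to drive the join */
CREATE INDEX c_user_post
    ON dbo.Comments (UserId, PostId);

SELECT
    c.PostId,
    p.Score
FROM dbo.Comments AS c
JOIN dbo.Posts AS p
    ON p.Id = c.PostId
WHERE c.UserId = 22656;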

At this point, your query might be in good enough shape, and you can leave other things alone.

If so, great! You can make the check out to cache. I mean cash.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Reconsidering SQL Server’s Missing Index Requests

Milk Carton


Part of reviewing any server necessarily includes reviewing indexes. When you’re working through things that matter, like unused indexes, duplicative indexes, heaps, etc., it’s pretty clear-cut what you should do to fix them.

Missing indexes are a different animal though. You have three general metrics to consider with them:

  • Uses: the number of times a query could have used the index
  • Impact: how much the optimizer thinks it can reduce the cost of the query by
  • Query cost: how much the optimizer estimates the query will cost to run
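
If you want to eyeball those three metrics yourself, a rough sketch against the missing index DMVs looks something like this (roughly, uses maps to seeks plus scans; how you weigh the rest is up to you):

SELECT
    mid.statement AS table_name,
    migs.user_seeks + migs.user_scans AS uses,
    migs.avg_user_impact AS impact_percent,
    migs.avg_total_user_cost AS avg_query_cost
FROM sys.dm_db_missing_index_groups AS mig
JOIN sys.dm_db_missing_index_group_stats AS migs
    ON migs.group_handle = mig.index_group_handle
JOIN sys.dm_db_missing_index_details AS mid
    ON mid.index_handle = mig.index_handle
ORDER BY uses DESC;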

Of those metrics, impact and query cost are entirely theoretical. I’ve written quite a bit about query costing and how it can be misleading. If you really wanna get into it, you can watch the whole series here.

In short: you might have very expensive queries that finish very quickly, and you might have very low cost queries that finish very slowly.

Especially in cases of parameter sniffing, a query plan with a very low cost might get compiled and generate a missing index request. What happens if every other execution of that query re-uses the cheaply-costed plan and runs for a very long time?

You might have a missing index request that looks insignificant.

Likewise, impact is how much the optimizer thinks it can reduce the cost of the current plan by. Often, you’ll create a new index and get a totally different plan. That plan may be more or less expensive than the previous plan. It’s all a duck hunt.

The most reliable of those three metrics is uses. I’m not saying it’s perfect, but there’s a bit less Urkeling there.

When you’re looking at missing index requests, don’t discount those with lots of uses for low cost queries. Often, they’re more important than they look.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

What’s Really Different About In-Memory Table Variables In SQL Server?

Kendra, Kendra, Kendra


My dear friend Kendra asked… Okay, look, I might have dreamed this. But I maybe dreamed that she asked what people’s Cost Threshold For Blogging™ is. Meaning, how many times do you have to get asked a question before you write about it.

I have now heard people talking and asking about in-memory table variables half a dozen times, so I guess here we are.

Talking about table variables.

In memory.

Yes, Have Some


First, yes, they do help relieve tempdb contention if you have code that executes with both high concurrency and high frequency. And by high, I mean REALLY HIGH.

Like, Snoop Dogg high.

Because you can’t get rid of in-memory stuff once it’s in a database, I’m creating a separate database to test in.

Here’s how I’m doing it!

CREATE DATABASE trash;

ALTER DATABASE trash 
ADD FILEGROUP trashy 
    CONTAINS MEMORY_OPTIMIZED_DATA ;
     
ALTER DATABASE trash 
ADD FILE 
(
    NAME=trashcan, 
    FILENAME='D:\SQL2019\maggots'
) 
TO FILEGROUP trashy;

USE trash;

CREATE TYPE PostThing 
AS TABLE
(
    OwnerUserId int,
    Score int,
    INDEX o HASH(OwnerUserId)
    WITH(BUCKET_COUNT = 100)
) WITH
(
    MEMORY_OPTIMIZED = ON
);
GO

Here’s how I’m testing things:

CREATE OR ALTER PROCEDURE dbo.TableVariableTest(@Id INT)
AS
BEGIN

    SET NOCOUNT, XACT_ABORT ON;
    
    DECLARE @t AS PostThing;
    DECLARE @i INT;

    INSERT @t 
        ( OwnerUserId, Score )
    SELECT 
        p.OwnerUserId,
        p.Score
    FROM Crap.dbo.Posts AS p
    WHERE p.OwnerUserId = @Id;

    SELECT 
        @i = SUM(t.Score)
    FROM @t AS t
    WHERE t.OwnerUserId = 22656
    GROUP BY t.OwnerUserId;

    SELECT 
        @i = SUM(t.Score)
    FROM @t AS t
    GROUP BY t.OwnerUserId;

END;
GO

Hot Suet


So like, the first thing I did was use SQLQueryStress to run this on a bunch of threads, and I didn’t see any tempdb contention.

So that’s cool. But now you have a bunch of stuff taking up space in memory. Precious memory. Do you have enough memory for all this?

Marinate on that.

Well, okay. Surely they must improve on all of the issues with table variables in some other way:

  • Modifications can’t go parallel
  • Bad estimates
  • No column level stats

But, nope. No they don’t. It’s the same crap.

Minus the tempdb contention.

Plus taking up space in memory.

But 2019


SQL Server 2019 does offer the same table-level cardinality estimate for in-memory table variables as it does for regular table variables.

If we flip database compatibility levels to 150, deferred compilation kicks in. Great. Are you on SQL Server 2019? Are you using compatibility level 150?

Don’t get too excited.
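
For reference, flipping the test database between the two behaviors is just a compatibility level change (the trash database is the one created above):

/* Pre-2019 behavior: fixed guess for table variable cardinality */
ALTER DATABASE trash SET COMPATIBILITY_LEVEL = 140;

/* SQL Server 2019 behavior: deferred compilation uses the real row count */
ALTER DATABASE trash SET COMPATIBILITY_LEVEL = 150;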

Let’s give this a test run in compat level 140:

DECLARE @i INT = 22656;
EXEC dbo.TableVariableTest @Id = @i;
SQL Server Query Plan
everything counts in large amounts

Switching over to compat level 150:

SQL Server Query Plan
yeaaahhhhh

Candy Girl


So what do memory optimized table variables solve?

Not the problem that table variables in general cause.

They do help you avoid tempdb contention, but you trade that off for them taking up space in memory.

Precious memory.

Do you have enough memory?

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Improving The Performance Of Modifications In SQL Server

Pssst!


If you landed here from Brent’s weekly links, use this link to get my training for 90% off.

The access is for life, but this coupon code isn’t! Get it while it lasts.

Discount applies at checkout, and you have to purchase everything for it to apply.

Obviously


There are going to be situations where it’s smarter to change different aspects of code like this.

But I know how it is out there! Sometimes it’s hard to get in and change a bunch of logic and tinker with things.

In some cases, you can improve performance by wrapping chunks of code in transactions.

Fear Of Commitment


In this example, there’s an automatic commit every time the update completes. That means every time we step through the loop, we send a record to the transaction log.

This can result in very chatty behavior, which even good storage can have a tough time with. There are likely other aspects of transaction logging impacted by this, but I only have so much time before this call starts.

SET NOCOUNT ON;

DECLARE 
    @cur_user int = 0,
    @max_user int = 0;

SELECT 
    @cur_user = MIN(u.Id), 
    @max_user = MAX(u.Id)
FROM dbo.Users AS u
WHERE u.Age IS NULL;

WHILE @cur_user <= @max_user
BEGIN

    UPDATE u
        SET u.Age = DATEDIFF(YEAR, u.CreationDate, u.LastAccessDate)
    FROM dbo.Users AS u
    WHERE u.Id = @cur_user
    AND   u.Age IS NULL;

    SET @cur_user = (SELECT MIN(u.Id) FROM dbo.Users AS u WHERE u.Id > @cur_user);

END;

This code runs for nearly 5 minutes before completing. Looking at a ~60 second sample turns up some gnarly gnumbers.

barkley

Batched Commit


Without changing the logic of the update, we can get things in better shape by using transactions and periodically committing them.

SET NOCOUNT ON;

DECLARE 
    @rows bigint = 0,
    @cur_user int = 0,
    @max_user int = 0;

SELECT 
    @cur_user = MIN(u.Id), 
    @max_user = MAX(u.Id)
FROM dbo.Users AS u
WHERE u.Age IS NULL;

BEGIN TRANSACTION;

WHILE @cur_user <= @max_user
BEGIN

    UPDATE u
        SET u.Age = DATEDIFF(YEAR, u.CreationDate, u.LastAccessDate)
    FROM dbo.Users AS u
    WHERE u.Id = @cur_user
    AND   u.Age IS NULL;

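    /* @@ROWCOUNT still holds the row count from the UPDATE above;
       if it didn't touch anything, commit whatever is open and bail out */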
    IF @rows = (@rows + @@ROWCOUNT)
    BEGIN
        COMMIT TRANSACTION;
        RETURN;
    END;
    ELSE
    BEGIN
        SET @rows = (@rows + @@ROWCOUNT);
        SET @cur_user = (SELECT MIN(u.Id) FROM dbo.Users AS u WHERE u.Id > @cur_user AND u.Age IS NULL);
    END;

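    /* Every time we accumulate ~50k modified rows, commit the open
       transaction and immediately start a new one */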
    IF @rows >= 50000
    BEGIN
        RAISERROR('Restarting', 0, 1) WITH NOWAIT;
        SET @rows = 0;
        COMMIT TRANSACTION;
        BEGIN TRANSACTION;
    END;

END;

IF @@TRANCOUNT > 0
COMMIT

The first thing we’ll notice is that the code finishes in about 1 minute rather than 5 minutes.

How nice! I love when things move along. The metrics look a bit better, too.

scalp issues

We have almost no waits on WRITELOG, and we write far less to the transaction log (35MB vs 13MB).

We also got to do some snazzy stuff with @@ROWCOUNT. Good job, us.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

When Index Sort Direction Matters For Query Performance In SQL Server

Ever Helpful


I got a mailbag question recently about some advice that floats freely around the internet regarding indexing for windowing functions.

But even after following all the best advice that Google could find, their query was still behaving poorly.

Why, why, why?

Ten Toes Going Up


Let’s say we have a query that looks something like this:

SELECT
    u.DisplayName,
    u.Reputation,
    p.Score, 
    p.PostTypeId
FROM dbo.Users AS u
JOIN
(
    SELECT
        p.Id,
        p.OwnerUserId,
        p.Score,
        p.PostTypeId,
        ROW_NUMBER() OVER
        (
            PARTITION BY
                p.OwnerUserId,
                p.PostTypeId
            ORDER BY
                p.Score DESC
        ) AS n
    FROM dbo.Posts AS p
) AS p
    ON  p.OwnerUserId = u.Id
    AND p.n = 1
WHERE u.Reputation >= 500000
ORDER BY u.Reputation DESC,
         p.Score DESC;

Without an index, this’ll drag on forever. Or about a minute.

But with a magical index that we heard about, we can fix everything!

Ten Toes Going Down


And so we create this mythical, magical index.

CREATE INDEX bubble_hard_in_the_double_r
ON dbo.Posts
(
    OwnerUserId ASC, 
    PostTypeId ASC, 
    Score ASC
);

But there’s still something odd in our query plan. Our Sort operator is… Well, it’s still there.

SQL Server Query Plan
grinch

Oddly, we need to sort all three columns involved in our Windowing Function, even though the first two of them are in proper index order.

OwnerUserId and PostTypeId are both in ascending order. The only one that we didn’t stick to the script on is Score, which is asked for in descending order.

Dram Team


This is a somewhat foolish situation, all around. One column being out of order causing a three column sort is… eh.

We really need this index, instead:

CREATE INDEX bubble_hard_in_the_double_r
ON dbo.Posts
(
    OwnerUserId ASC, 
    PostTypeId ASC, 
    Score DESC
)
WITH (DROP_EXISTING = ON);
SQL Server Query Plan
mama mia

Granted, I don’t know that I like this plan at all without parallelism and batch mode, but we’ve been there before.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Signs You Need Dynamic SQL To Fix Query Performance Problems In SQL Server

Nothing Works


There are things that queries just weren’t meant to do all at once. Multi-purpose queries are often just a confused jumble with crappy query plans.

If you have a Swiss Army Knife, pull it out. Open up all the doodads. Now try to do one thing with it.

If you didn’t end up with a corkscrew in your eye, I’m impressed.

En Masse


The easiest way to think of this is conditionals. If what happens within a stored procedure or query depends on something that is decided based on user input or some other state of data, you’ve introduced an element of uncertainty to the query optimization process.

Of course, this also depends on if performance is of some importance to you.

Since you’re here, I’m assuming it is. It’s not like I spend a lot of time talking about backups and crap.

There are a lot of forms this can take, but none of them lead to you winning an award for Best Query Writer.

IFTTT


Let’s say a stored procedure will execute a different query based on some prior logic, or an input parameter.

Here’s a simple example:

IF @i = 1
BEGIN
    SELECT
        u.*
    FROM dbo.Users AS u
    WHERE u.Reputation = @i;
END;

IF @i = 2
BEGIN
    SELECT
        p.*
    FROM dbo.Posts AS p
    WHERE p.PostTypeId = @i;
END;

If the stored procedure runs for @i = 1 first, the second query will get optimized for that value too.

Using parameterized dynamic SQL can get you the type of optimization separation you want, to avoid cross-optimization contamination.

I made half of that sentence up.
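
Here’s a rough sketch of what that might look like for the example above, assuming @i is still the stored procedure’s parameter; each branch builds its own statement text, so each one gets its own plan:

DECLARE @sql nvarchar(MAX);

IF @i = 1
BEGIN
    SET @sql = N'
    SELECT
        u.*
    FROM dbo.Users AS u
    WHERE u.Reputation = @i;';
END;

IF @i = 2
BEGIN
    SET @sql = N'
    SELECT
        p.*
    FROM dbo.Posts AS p
    WHERE p.PostTypeId = @i;';
END;

IF @sql IS NOT NULL
BEGIN
    EXEC sys.sp_executesql
        @sql,
        N'@i int',
        @i;
END;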

For more information, read this article.

Act Locally


Local variables are another great use of dynamic SQL, because one query’s local variable is another query’s parameter.

DECLARE @i int = 2;
SELECT
    v.*
FROM dbo.Votes AS v
WHERE v.VoteTypeId = @i;

Doing this will get you weird estimates, and you won’t be happy.

You’ll never be happy.
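
A minimal sketch of the fix: inside the dynamic SQL, @i becomes a real parameter again, so the optimizer gets to use the histogram instead of a blind average:

DECLARE @i int = 2;

DECLARE @sql nvarchar(MAX) = N'
SELECT
    v.*
FROM dbo.Votes AS v
WHERE v.VoteTypeId = @i;';

EXEC sys.sp_executesql
    @sql,
    N'@i int',
    @i;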

For more information, read this article.

This Or That


You can replace or reorder the where clause with lots of different attempts at humor, but none of them will be funny.

SELECT
    c.*
FROM dbo.Comments AS c
WHERE (c.Score >= @i OR @i IS NULL);

The optimizer does not consider this SARGable, and it will take things out on you in the long run.

Maybe you’re into that, though. I won’t shame you.

We can still be friends.
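
One dynamic SQL sketch for this pattern: only append the predicate when there’s actually a value to filter on, so the optimizer never has to reason about the OR:

DECLARE @i int = 10;

DECLARE @sql nvarchar(MAX) = N'
SELECT
    c.*
FROM dbo.Comments AS c
WHERE 1 = 1';

IF @i IS NOT NULL
BEGIN
    SET @sql += N'
    AND c.Score >= @i';
END;

EXEC sys.sp_executesql
    @sql,
    N'@i int',
    @i;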

For more information, watch this video.

Snortables


Dynamic SQL is so good at helping you with parameter sniffing issues that I have an entire session about it.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Mind Your OUTPUT Targets In SQL Server, Some Of Them Hurt Query Performance

Browser History


I’ve blogged about OUTPUT a couple of times, and those posts are Still Accurate™.

But it’s worth noting that, for the second post, OUTPUT forced the query to run serially with no target, just returning data back to SSMS.

Depending on the query behind the putting of the out, parallelism could be quite important.

That’s why in the first post, the put out into a real table didn’t cause performance to suffer.

Of course, if you OUTPUT into a table variable, you still have to deal with table variables being crappy about modifications.

Samesies


If you compare the performance of a query that outputs into a @table variable vs one that outputs into a #temp table, you’ll see a difference:

SQL Server Query Plan
bang bang bang
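
For reference, the shape of that comparison is roughly this (a sketch, not the exact queries from those posts); the only thing that changes is where the OUTPUT lands:

/* Table variable target: modifying @ids keeps the plan serial,
   so the Votes scan runs single-threaded */
DECLARE @ids table (Id int NOT NULL);

UPDATE v
    SET v.BountyAmount = 0
OUTPUT Inserted.Id INTO @ids (Id)
FROM dbo.Votes AS v
WHERE v.BountyAmount IS NULL;

/* Temp table target: the same statement keeps its parallel zone,
   and the Votes scan can go parallel */
CREATE TABLE #ids (Id int NOT NULL);

UPDATE v
    SET v.BountyAmount = 0
OUTPUT Inserted.Id INTO #ids (Id)
FROM dbo.Votes AS v
WHERE v.BountyAmount IS NULL;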

Even though the parallel zone is limited here, there’s a big difference in overall query time: scanning the Votes table single-threaded vs. in parallel.

When you’re designing processes to be as efficient as possible, paying attention to details like this can make a big difference.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.