Blog – Page 82 – Darling Data

Software Vendor Mistakes With SQL Server: Writing Functions That Already Exist

Posted on January 27, 2022May 16, 2022 by Erik Darling

Reinventing The Sequel

In my time consulting, I’ve seen dbo-prefixed functions that have internal-function names on a number of occasions, acting like a wrapper to the existing functions that come with SQL Server.

While those are fairly ridiculous, and easy to point out the ridiculousness of, there are some other functions that I see pretty often that are maybe a little less obvious. I’m not talking about functions that have been added more recently that might not be supported by all customer installations.

All Supported Versions

One of the more popular versions of this malady that I see looks something like this:

CREATE OR ALTER FUNCTION 
    dbo.ufn_FmtDt
(
    @d datetime
)
RETURNS char(11)
AS
BEGIN
    RETURN CONVERT(char(11), @d)
END;
GO

There are two big reasons this is bad:

SQL Server doesn’t store dates as strings, at all, ever
You can generate the same internal representation by converting to a date

You gain nothing by encapsulating code like this, at least not since SQL Server 2000 or so.

Do It Again

Using some lessons we learned from earlier in this series, we can keep reasonable up to date with SQL Server’s progress by rewriting the function to something like this:

CREATE OR ALTER FUNCTION 
    dbo.ifn_FmtDt
(
    @d datetime
)
RETURNS TABLE
AS
RETURN 
   
   SELECT d = 
       CONVERT(date, @d);
GO

And now when you need to chop the times off all those dates in your reporting queries, they won’t be artificially held back.

Thanks for reading!

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Not Using Inline Table Valued Functions

Posted on January 26, 2022May 16, 2022 by Erik Darling

Swiss Army

In the last couple posts, I’ve talked about the problems caused by two types of T-SQL functions: Scalar User Defined Functions, and Multi-Statement Table Valued Functions.

I’ve also hinted around about a third type of function, called an Inline Table Valued Function. These are different because there are no built-in issues with them as a feature.

They’re only as bad as the query you put in them, and often rewriting T-SQL Scalar User Defined Functions, and Multi-Statement Table Valued Functions as Inline Table Valued Functions can fix a lot of query performance issues.

In the videos below, I’m going to show you how to rewrite T-SQL Scalar User Defined Functions, and a really cool thing they can do on top of just replacing the bad kinds of functions.

Thanks for reading!

Going Further

Software Vendor Mistakes With SQL Server: Multi Statement Table Valued Functions

Posted on January 25, 2022May 16, 2022 by Erik Darling

Available Units

In T-SQL, there are two kinds of “table valued functions”. One of them returns a select as a derived table (Inline Table Valued Function), and the other returns a @table variable as a result (Multi-Statement Table Valued Functions).

I’ve written before about the problems you’ll run into with T-SQL’s @table variables, so I’m not going to go back over that here.

I will take this time to point out that they don’t behave any better when used in a function than they do when used independent of a function. If you need to stop here to watch that video, go ahead. I’ll be waiting for you.

To learn more about how T-SQL’s Multi-Statement Table Valued Functions can mess up SQL Server query performance, check out the video below. And as always, if this is the kind of thing you love learning about, hit the link below to get 75% off all of my training material.

Thanks for reading!

Going Further

Software Vendor Mistakes With SQL Server: Scalar User Defined Functions

Posted on January 24, 2022May 16, 2022 by Erik Darling

The Very Worst Idea

Scalar UDFs answered a very good question: How do I ruin query performance so that two generations of consultants can make a living off of SQL Server?

In the videos below, which are part of my paid training, I’m going to show you how T-SQL Scalar UDFs ruin query performance, and one way of fixing them. If you’ve got lots of these little devils hanging around your codebase, you’ll wanna pay close attention, here to see how:

T-SQL Scalar UDFs force queries to run single threaded
T-SQL Scalar UDFs run once per row that they process
T-SQL Scalar UDFs hide all the work they actually do in query plans and other metrics

There’s a ton more available in the full paid training courses, so hit the link below to get 75% off the whole thing.

Thanks for reading!

Going Further

Software Vendor Mistakes With SQL Server: Misusing Common Table Expressions

Posted on January 21, 2022May 16, 2022 by Erik Darling

Stinko

In this post, I’m gonna show you how stringing together a bunch of CTEs can cause performance problems with one of my paid training videos. If you like it, hit the link below to get 75% off the entire bundle.

Thanks for reading!

Going Further

Software Vendor Mistakes With SQL Server: Indexing #Temp Tables Incorrectly

Posted on January 20, 2022May 16, 2022 by Erik Darling

Understanding

I’ve covered this sort of information in previous posts, like:

But some points should be made together, so I’m going to combine them a bit, and expand on a few points too.

I know that it’s probably an overly-lofty goal to expect people who don’t seem to have the hang of indexing regular tables down to not repeat those errors with #temp tables.

But hey, hope burns eternal. Like American Spirits (the cigarettes, not some weird metaphorical thing that Americans possess, or ghosts).

Nonclustered Index Follies: Creating Them Before You Insert Data

I’m not saying that you should never add a nonclustered index to a #temp table, but I am saying that they shouldn’t be your first choice. Make sure you have a good clustered index on there first, if you find one useful. Test it. Test it again. Wait a day and test it again.

But more importantly, don’t do this:

CREATE TABLE #bad_idea
(
    a_number int,
    a_date datetime,
    a_string varchar(10),
    a_bit bit
);

CREATE INDEX anu ON #bad_idea(a_number);
CREATE INDEX ada ON #bad_idea(a_date);
CREATE INDEX ast ON #bad_idea(a_string);
CREATE INDEX abi ON #bad_idea(a_bit);

Forget for a minute that these are a bunch of single-column indexes, which I’m naturally and correctly opposed to.

Look what happens when we try to insert data into that #temp table:

You have to insert into the heap (that’s the base table here, since we don’t have a clustered index), and then each of the nonclustered indexes. In general, if you want nonclustered indexes on your #temp tables, you should create them after you insert data, to not mess with parallel inserts and to establish statistics with a full scan of the data.

Nonclustered Index Follies: If You Need Them, Create Them Inline

If for some insane reason you decide that you need indexes on your #temp table up front, you should create everything in a single statement to avoid recompilations.

CREATE TABLE #bad_idea
(
    a_number int,
        INDEX anu (a_number),
    a_date datetime,
        INDEX ada (a_date),
    a_string varchar(10),
        INDEX ast (a_string),
    a_bit bit,
        INDEX abi (a_bit)
);

I don’t have a fantastic demo for that, but I can quote a Great Post™ about #temp tables:

Do not explicitly drop temp tables at the end of a stored procedure, they will get cleaned up when the session that created them ends.
Do not alter temp tables after they have been created.
Do not truncate temp tables
Move index creation statements on temp tables to the new inline index creation syntax that was introduced in SQL Server 2014.

There are some other good points there, too. Pay attention to those as well.

Of course, there is one interesting reason for dropping #temp tables: running out of space in tempdb. I tend to work with clients who need help tuning code and processes that hit many millions of rows or more.

If you’re constantly creating large #temp tables, you may want to clean them up when you’re done with them rather than letting self-cleanup happen at the end of a procedure.

This applies to portions of workloads that have almost nothing in common with OLTP, so you’re unlikely to experience the type of contention that the performance features which apply there also apply here. Reporting queries rarely do.

Thanks for reading!

Going Further

Software Vendor Mistakes With SQL Server: Not Getting Parallel Inserts Into #Temp Tables

Posted on January 19, 2022May 16, 2022 by Erik Darling

Do That, But Faster

Serial zones in parallel plans can leave a lot of performance on the table. One of the best ways to see that is with insert queries that do a lot of work. A big contrast between @table variables and #temp tables is that the former fully disallows parallelism in modification queries, and the latter do not.

The story gets a little more complicated when inserting into a columnstore index, but this post isn’t really about that.

Let’s look at some scenarios where a fully parallel insert is allowed, and then not allowed.

The thing to keep an eye out for is the insert operator being in the serial zone. For the purposes of this thread:

Works: SELECT INTO

As long as you don’t do anything too goofy here, a fully parallel insert will “always” be allowed, here.

Goofy things will be explained later in the post.

--This will "always" work, as long as you don't do 
--anything listed below in the "broken" select
SELECT
    C.UserId,
    SumScore = 
        SUM(C.Score)
INTO
    #AvgComments_SelectInto
FROM 
    dbo.Comments AS C
GROUP BY 
    C.UserId
HAVING 
    SUM(C.Score) > 200;

DROP TABLE #AvgComments_SelectInto;

Works: INSERT, with TABLOCK

Rather than selecting directly into a table, here we’re gonna create the table and issue an insert statement with the tablock hint.

--This *will* get you a fully parallel insert, unless goofiness is involved.
CREATE TABLE
    #AvgComments_Tablock
(
    UserId int,
    SumScore int
);

INSERT 
    #AvgComments_Tablock WITH (TABLOCK)
(
    UserId, 
    SumScore
)
SELECT 
    C.UserId,
    AvgScore = 
        SUM(C.Score)
FROM 
    dbo.Comments AS C
GROUP BY 
    C.UserId
HAVING 
    SUM(C.Score) > 200

DROP TABLE #AvgComments_Tablock

Doesn’t Work: INSERT, without TABLOCK

Without the tablock hint, this will get you the plan we don’t want, where the insert operator is outside the parallel zone.

--This will not get you a fully parallel insert
CREATE TABLE
    #AvgComments_NoTablock
(
    UserId int,
    SumScore int
);

INSERT 
    #AvgComments_NoTablock 
(
    UserId, 
    SumScore
)
SELECT
    C.UserId,
    SumScore = 
        SUM(C.Score)
FROM 
    dbo.Comments AS C
GROUP BY 
    C.UserId
HAVING 
    SUM(C.Score) > 200;

DROP TABLE #AvgComments_NoTablock;

Doesn’t Work: A Whole Laundry List Of Stuff

Basically any one thing quoted out has the ability to deny the parallel insert that we’re after.

If you’re doing any of this stuff, like, bye.

--SET ROWCOUNT Any_Number;
--ALTER DATABASE StackOverflow2013 
--    SET COMPATIBILITY_LEVEL = Anything_Less_Than_130;
CREATE TABLE
    #AvgComments_BrokenTablock
(
    --Id int IDENTITY,
    UserId int,
    SumScore int,
    --INDEX c CLUSTERED(UserId)
    --INDEX n NONCLUSTERED(UserId)
);

--Also, if there's a trigger or indexed view on the target table
--But that's not gonna be the case with #temp tables
INSERT 
    #AvgComments_BrokenTablock WITH (TABLOCK)
(
    UserId, 
    SumScore
)
--The rules here are a little weird, so
--be prepared to see weird things if you use OUTPUT
--OUTPUT Inserted.*
--To the client or
--INTO dbo.some_table
--INTO @table_varible
SELECT
    --Id = IDENTITY(bigint, 1, 1),
    --dbo.A_Scalar_UDF_Not_Inlined_By_Froid
    C.UserId,
    SumScore = 
        SUM(C.Score)
FROM 
    dbo.Comments AS C
--Any reference to the table you're inserting into
--Not exists is just an easy example of that
--WHERE NOT EXISTS
--(
--    SELECT
--	    1/0
--	FROM #AvgComments_BrokenTablock AS A
--	WHERE A.UserId = C.UserId
--)
GROUP BY 
    C.UserId
HAVING 
    SUM(C.Score) > 200;

DROP TABLE #AvgComments_BrokenTablock;

Explainer

There are many good reasons to want a fully parallel insert, but you need to make sure that the bottleneck isn’t earlier in the plan.

If it is, you may not see the full performance gains from getting it.

In general, it’s a good strategy when building larger #temp tables, but at this point I add a tablock hint to every #temp table insert at first to test things out.

Thanks for reading!

Going Further

Software Vendor Mistakes With SQL Server: Not Using #Temp Tables

Posted on January 18, 2022May 16, 2022 by Erik Darling

Wind Charms

After yesterday’s post about when to use or not use @table variables in SQL Server, you can probably make choices more confidently.

Most of the time, you want to be using #temp tables, when plan choices and statistics matter to overall performance, and @table variables when code executes at a high frequency over a small-ish number of rows, where plan choices and statistics don’t matter to overall performance.

In case you didn’t pick that up, or something.

Let’s move on.

Use Cases For #Temp Tables

The best use cases for #temp tables are for materializing things like:

Non-SARGable expressions
Complicated CTEs or Derived Tables
Parameter-sensitive portions of queries
CTEs that need to be referenced multiple times

Just to name a few-plus-one things that can generally be improved.

There are many more, of course. But getting overly-encyclopedic in blog posts tends to be over-productive. Plus, no one reads them, anyway.

What I think the real value of breaking queries up into more atomic pieces is, though, is that it’s a bit easier to isolate exactly which parts are the slowest, and work on them independently.

When you’ve got one gigantic query, it can be difficult to tune or figure out how all the different pieces interact. What’s slow for one execution might be fast for another, and vice-versa.

Chomper

Of course, temporary objects aren’t always strictly necessary. Sometimes it’s enough to break disjunctive predicates up into UNION-ed clauses. Sometimes having the right index or using batch mode can get you where you need to go.

Choosing to use a temporary object comes with choices:

Can I afford to take up this much space in tempdb?
Can I afford to execute this under high concurrency?
Have I exhausted other options for tuning this query?

You don’t necessarily need to answer all of those things immediately, but you should exercise some domain knowledge during tuning efforts.

Thanks for reading!

Going Further

Software Vendor Mistakes With SQL Server: Misusing @Table Variables

Posted on January 17, 2022May 16, 2022 by Erik Darling

The Miseducation Of Everyone

Temporary objects are one of those tricky things. You probably know you should be using them for certain things, but which one to use is a constant source of trial, error, and coin-tosses.

In these videos from my training, I’m going to go through the downsides of table variables. There’s one free video from YouTube at the end about when you should use them, too.

Going Further

Software Vendor Mistakes With SQL Server: Not Taking Advantage Of Index Compression

Posted on January 14, 2022May 16, 2022 by Erik Darling

Greater Than Many

This post is rather short, because it’s more of a link round-up than anything.

I think that index compression is so generally useful that I’d start off any new system with it enabled, just to avoid issues with needing to apply it later. Where it’s particularly useful is on systems where data is plenty, and memory is scarce.

Having index structures that are much smaller both on disk and in memory is quite useful. It’s also nice when you’re on Standard Edition, and you need to make the most of out the 128GB cap on the buffer pool.

For some great information and detail on index compression, check out My Friend Andy™ who has very helpful blog posts about it.

And More

There are some obvious bits here, like being extra careful with choosing string length.

LOB data can lead to weird locking, and mess with memory grants.

And of course, overly-wide, non-normalized tables can also lead to issues.

If you’re running an OLTP workload, you may also want to make sure that your critical tables aren’t heaps.