Software Vendor Mistakes With SQL Server: Using Date Functions On Columns Or Local Variables

Reusable Logic


In most programming languages, it’s quite sensible to create a variable or parameter, use some predefined logic to assign it a value, and then keep reusing it to prevent having to execute the same code over and over again.

But those languages are all procedural, and have a bit of a different set of rules and whatnot. In SQL Server, there are certainly somewhat procedural elements.

  • Functions
  • Control-flow logic
  • Cursors
  • While loops
  • Maybe the inner side of Nested Loops joins

You may be able to name some more, if you really get to thinking about it. That should be a common enough list, though.

Reusable Problems


SQL Server has a wonderful optimizer. It’s capable of many things, but it also has some problems.

Many of those problems exist today for “backwards compatibility”. In other words: play legacy games, win legacy performance.

Lots of people have found “workarounds” that rely on exploiting product behavior, and taking that away or changing it would result in… something else.

That’s why so many changes (improvements?) are hidden behind trace flags, compatibility levels, hints, batch mode, and other “fences” that you have to specifically hop to see if the grass is greener.

One of those things is the use of local variables. The linked post details how lousy those can be.

In this post, I show how you’re better off using the date math expressions instead.

And in this post, I show how you’re better off doing date math on parameters rather than on columns.
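To make the contrast concrete, here’s a quick sketch against the Comments table: the first query filters through a local variable, which gets a generic estimate, while the second does the date math inline, where the optimizer can evaluate the expression for a real estimate:

DECLARE 
    @StartDate datetime = DATEADD(DAY, -30, GETDATE());

--Local variable: the optimizer has to guess at the row count
SELECT
    C.PostId
FROM dbo.Comments AS C
WHERE C.CreationDate >= @StartDate;

--Inline expression: the optimizer can use it for a real estimate
SELECT
    C.PostId
FROM dbo.Comments AS C
WHERE C.CreationDate >= DATEADD(DAY, -30, GETDATE());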

Let’s bring all that together!

Reusable Solutions


In SQL Server, context is everything. By context, I mean the way different methods of query execution are able to accept arguments from others.

You’ll sometimes hear this referred to as scope, too. Usually people will say inner context/scope and outer context/scope, or something similar.

What that means is something like this, if we’re talking about stored procedures:

CREATE PROCEDURE
    dbo.InnerContext
(
    @StartDate datetime,
    @EndDate datetime
)
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;

    SELECT
        C.PostId,
        Score = 
            SUM(C.Score)
    FROM dbo.Comments AS C
    JOIN dbo.Votes AS V
        ON C.PostId = V.PostId
    WHERE C.CreationDate >= @StartDate
    AND   C.CreationDate <  @EndDate
    GROUP BY C.PostId;

END;
GO 

CREATE PROCEDURE
    dbo.OuterContext
(
    @StartDate datetime,
    @EndDate datetime
)
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;

IF @StartDate IS NULL
BEGIN
   SELECT
       @StartDate = GETDATE();
END;

IF @EndDate IS NULL
BEGIN
   SELECT
       @EndDate = DATEADD(DAY, 30, GETDATE());
END;

EXEC dbo.InnerContext
    @StartDate = @StartDate,
    @EndDate = @EndDate;

END;

If you’re okay using dynamic SQL, and really, you should be because it’s awesome when you’re not bad at it, you can do something like this:

CREATE PROCEDURE
    dbo.OuterContext
(
    @StartDate datetime,
    @EndDate datetime
)
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;

IF @StartDate IS NULL
BEGIN
   SELECT
       @StartDate = GETDATE();
END;

IF @EndDate IS NULL
BEGIN
   SELECT
       @EndDate = DATEADD(DAY, 30, GETDATE());
END;

DECLARE 
    @sql nvarchar(MAX) = N'
    /*dbo.OuterContext*/
    SELECT
        C.PostId,
        Score = 
            SUM(C.Score)
    FROM dbo.Comments AS C
    JOIN dbo.Votes AS V
        ON C.PostId = V.PostId
    WHERE C.CreationDate >= @StartDate
    AND   C.CreationDate <  @EndDate
    GROUP BY C.PostId;
    ';

EXEC sys.sp_executesql
    @sql,
  N'@StartDate datetime, 
    @EndDate datetime',
    @StartDate,
    @EndDate;

END;

Which will achieve the same thing.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Using Functions In Join Or Where Clauses

Easy Thinking


To start with, let’s classify functions into two varietals:

  • Ones built-in to SQL Server
  • Ones that developers write

Of those, built-in functions are pretty much harmless when they’re in the select list. Classifying things a bit further for the ones users write, we have:

  • Scalar User Defined Functions
  • Multi-Statement Table Valued Functions
  • Inline Table Valued Functions
  • CLR functions

Out of the four up there, only the last one doesn’t have a reference link. Why? Because I don’t write C# — I’m not that smart — if you need someone smart about that, go read my friend Josh’s blog. He’s quite capable.

If you’re too lazy to go read the three reference links:

  • Scalar User Defined Functions generally wreck performance
  • Multi-Statement Table Valued Functions stand a high chance of generally wrecking performance
  • Inline Table Valued Functions are okay as long as you don’t do anything awful in them
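To make the shapes concrete, here’s the same little aggregation written as a Scalar User Defined Function and as an Inline Table Valued Function. The names are hypothetical, and this is just a sketch:

--Scalar UDF: executes per row, and hobbles the calling query
CREATE OR ALTER FUNCTION
    dbo.PostScore_Scalar
(
    @PostId int
)
RETURNS bigint
AS
BEGIN
    RETURN
    (
        SELECT
            SUM(C.Score)
        FROM dbo.Comments AS C
        WHERE C.PostId = @PostId
    );
END;
GO

--Inline TVF: just a query, expanded into the calling plan
CREATE OR ALTER FUNCTION
    dbo.PostScore_Inline
(
    @PostId int
)
RETURNS TABLE
AS
RETURN
    SELECT
        Score = 
            SUM(C.Score)
    FROM dbo.Comments AS C
    WHERE C.PostId = @PostId;
GO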

Smart Thinking


The important thing to understand is that using any of these functions, let’s call it below the belt, can really mess things up for query performance in new and profound ways compared to what they can do in just the select list.

To be more specific for you, dear developer reader, let’s frame below the belt as anything underneath the from clause. Things here get particularly troublesome, because much of the activity here is considered relational, whereas stuff up above is mostly just informational.

Why is the relational stuff a much bigger deal than the informational stuff? Because that’s where all the math happens in a query plan, and SQL Server’s optimizer decides on all sorts of things at compile-time, like:

  • Which indexes to use
  • Join order
  • Join types
  • Memory grants
  • Parallelism
  • Seeks and Scans
  • Aggregate types
  • Much, much more!

Those things are partially based on how well it’s able to estimate the number of rows that join and where conditions will produce.

Sticking functions in the way of those join and where conditions is a bit like putting a blindfold on SQL Server’s optimization and cardinality estimation process and asking it to swing a razor-sharp saber at a piece of confetti at 100 paces.

In other words, don’t complain when your query plans suck and your queries run slow. You’re doing the hobbling, you dirty bird.
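A common below-the-belt offender, sketched against the Comments table: wrapping the column in a function, versus moving the math to the other side of the comparison where statistics and indexes can still be used:

--Blindfolded: the function on the column blocks good estimates and seeks
SELECT
    C.PostId
FROM dbo.Comments AS C
WHERE DATEADD(DAY, 30, C.CreationDate) >= GETDATE();

--Seeing clearly: same logic, with the column left bare
SELECT
    C.PostId
FROM dbo.Comments AS C
WHERE C.CreationDate >= DATEADD(DAY, -30, GETDATE());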

Future Thinking


If you want your customers, users, or whatever you want to call them, to be reliably happy in the future, even as their database sizes grow beyond your wildest imagination, and your application gets used in ways that would make Caligula blush, you need to start by obeying the first law of database physics: thou shalt not get in the way of the optimizer.

Going back to a couple other laws of database physics that cannot be ignored:

  • Anything that makes your job easier makes the optimizer’s job harder
  • Store data the way you query it, and query data the way you store it

If it makes you feel better, stick a few thou shalls or shalt nots or whences or whenceforths in there. It might make you happier.

It will make your customers, users, or whatever you want to call them happier, if you listen to me.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: String Splitting With Functions

Advent


Since SQL Server 2016, at least for databases in a similarly current compatibility level, the STRING_SPLIT function has been available. There were some problems with it initially that have recently been solved, too, like the lack of an “ordinal position” indicator to show you where items occur in a string.

But there’s still a bigger problem: since it’s sort of a Jack-Of-All-Splitters, you might not get the datatype back that you want. From the docs:

If the ordinal output column is not enabled, STRING_SPLIT returns a single-column table whose rows are the substrings. The name of the column is value. It returns nvarchar if any of the input arguments are either nvarchar or nchar. Otherwise, it returns varchar. The length of the return type is the same as the length of the string argument.

If you’re comparing the output to something that isn’t stringy, you might hit some weird implicit conversion issues.
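One way to sidestep that, sketched here, is to convert the output yourself before comparing it to anything numeric:

SELECT
    UserId = 
        TRY_CAST(SS.value AS int)
FROM STRING_SPLIT(N'1138,22656', N',') AS SS;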

Another significant problem is that SQL Server’s optimizer can’t see inside the string you pass in to generate any meaningful statistics information about the values in there.

This is generally true of many of the unpleasant T-SQL User Defined Functions I’ve seen written to split strings prior to this, too.

Playtime


The big blaring headline here should be that if you’re going to split any string and attempt to filter or join to it, you should put the parsed results into a #temp table first, with whatever indexing is appropriate. If you don’t do that, you’re gonna see all the screwy stuff that I mention below.
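A minimal sketch of that pattern, with hypothetical names, assuming @x holds the list to parse: split once into an indexed #temp table, then join to it:

CREATE TABLE #ParsedIds
(
    UserId int NOT NULL,
    INDEX c CLUSTERED (UserId)
);

INSERT
    #ParsedIds WITH (TABLOCK)
(
    UserId
)
SELECT
    TRY_CAST(SS.value AS int)
FROM STRING_SPLIT(@x, N',') AS SS
WHERE LTRIM(RTRIM(SS.value)) <> '';

SELECT
    P.OwnerUserId,
    TotalScore = 
        SUM(P.Score * 1.)
FROM dbo.Posts AS P
WHERE P.OwnerUserId IN
(
    SELECT
        PI.UserId
    FROM #ParsedIds AS PI
)
GROUP BY P.OwnerUserId;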

First, you only get a 50 row estimate no matter how many separate values are in the string. That might be fine for you, but it might not be fine for everyone.

Let’s start by throwing a real knuckleball to the function.

DECLARE 
    @x nvarchar(MAX) = N'';

SELECT 
    @x += 
        (
               SELECT DISTINCT
                   [text()] = 
                       ',' + RTRIM(P.OwnerUserId)
               FROM dbo.Posts AS P
               FOR XML 
                   PATH(''),
                   TYPE
        ).value
            (
                './text()[1]', 
                'nvarchar(max)'
            );

SELECT
    P.OwnerUserId,
    TotalScore = 
        SUM(P.Score * 1.)
FROM dbo.Posts AS P
WHERE P.OwnerUserId IN
(
    SELECT
        SS.value
    FROM STRING_SPLIT(@x, ',') AS SS
)
GROUP BY P.OwnerUserId
OPTION(RECOMPILE);

The code above takes every unique value in the OwnerUserId column of the Posts table and adds them to a comma separated list. Why didn’t I use the STRING_AGG function? I forgot. And after so many years, I have the XML equivalent memorized.

Even with a recompile hint on board, this is the query plan we get:

[SQL Server query plan]

Spending 8 seconds inside the splitter function feels pretty bad on its own, but then the entire query plan turns out crappy because of the 50 row estimate, and takes 11 minutes to finish in total. I spent 11 minutes of my life waiting for that to finish. Just for you.

Think about that.

Regularly


Under more normal circumstances, the lack of introspection (of the statistical variety) that you get from directly querying the splitter function can lead to a bad time.

I’m intentionally including a value with a fair amount of skew in the list-to-be-parsed, that leads to what might be a bad plan choice, based on that lack of introspection:

SELECT
    P.OwnerUserId,
    TotalScore = 
        SUM(P.Score * 1.)
FROM dbo.Posts AS P
WHERE P.OwnerUserId IN
(
    SELECT
        SS.value
    FROM STRING_SPLIT('1138,22656', ',') AS SS
)
GROUP BY P.OwnerUserId;

[SQL Server query plan: zee problemmen!]

We get a 50 row guess for two values, but one of those values has ~28k rows associated with it in the Posts table. Since there are no stats on the internal elements of the string, we don’t get a good estimate for it at all.

Big Awkward


If you, or one of your users, throws a malformed string into the mix, you might get some unexpected results.

On top of the values that we just passed in, I’m also going to tack a lone space onto the end, after the final comma:

SELECT
    P.OwnerUserId,
    TotalScore = 
        SUM(P.Score * 1.)
FROM dbo.Posts AS P
WHERE P.OwnerUserId IN
(
    SELECT
        SS.value
    FROM STRING_SPLIT('1138,22656, ', ',') AS SS
)
GROUP BY P.OwnerUserId;

This time around, the results are a little different from above. The space at the end is implicitly converted to the number 0, and things look like this:

[SQL Server query plan: played]

Most of the bad guesses from the previous plan are retained, but since the number 0 accounts for a whole mess of rows too, things get a whole lot worse.

We go from 4.5 seconds to 30 seconds, and blah blah blah everything is horrible.

If you want to get around this, sure, you can add a filter to remove empty strings from the results of the splitter, but if you’re reading this, chances are you might not have the most defensive T-SQL coding practices in place to begin with.
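If you do go the filtering route, it’s a one-line addition to the subquery:

SELECT
    SS.value
FROM STRING_SPLIT('1138,22656, ', ',') AS SS
WHERE LTRIM(RTRIM(SS.value)) <> '';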

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Writing Functions That Already Exist

Reinventing The Sequel


In my time consulting, I’ve seen dbo-prefixed functions that have internal-function names on a number of occasions, acting like a wrapper to the existing functions that come with SQL Server.

While those are fairly ridiculous, and easy to point out the ridiculousness of, there are some other functions that I see pretty often that are maybe a little less obvious. I’m not talking about functions that have been added more recently that might not be supported by all customer installations.

All Supported Versions


One of the more popular versions of this malady that I see looks something like this:

CREATE OR ALTER FUNCTION 
    dbo.ufn_FmtDt
(
    @d datetime
)
RETURNS char(11)
AS
BEGIN
    RETURN CONVERT(char(11), @d);
END;
GO

There are two big reasons this is bad:

  • SQL Server doesn’t store dates as strings, at all, ever
  • You can generate the same internal representation by converting to a date

You gain nothing by encapsulating code like this, at least not since SQL Server 2000 or so.

Do It Again


Using some lessons we learned from earlier in this series, we can keep reasonably up to date with SQL Server’s progress by rewriting the function to something like this:

CREATE OR ALTER FUNCTION 
    dbo.ifn_FmtDt
(
    @d datetime
)
RETURNS TABLE
AS
RETURN 
   
   SELECT d = 
       CONVERT(date, @d);
GO

And now when you need to chop the times off all those dates in your reporting queries, they won’t be artificially held back.
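Since the inline version returns a table, calling it looks a little different. A sketch against the Comments table:

SELECT
    D.d,
    CommentCount = 
        COUNT_BIG(*)
FROM dbo.Comments AS C
CROSS APPLY dbo.ifn_FmtDt(C.CreationDate) AS D
GROUP BY D.d;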

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Not Using Inline Table Valued Functions

Swiss Army


In the last couple posts, I’ve talked about the problems caused by two types of T-SQL functions: Scalar User Defined Functions, and Multi-Statement Table Valued Functions.

I’ve also hinted around about a third type of function, called an Inline Table Valued Function. These are different because there are no built-in issues with them as a feature.

They’re only as bad as the query you put in them, and often rewriting T-SQL Scalar User Defined Functions, and Multi-Statement Table Valued Functions as Inline Table Valued Functions can fix a lot of query performance issues.

In the videos below, I’m going to show you how to rewrite T-SQL Scalar User Defined Functions as Inline Table Valued Functions, and a really cool thing Inline Table Valued Functions can do on top of just replacing the bad kinds of functions.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Multi Statement Table Valued Functions

Available Units


In T-SQL, there are two kinds of “table valued functions”. One of them returns a select as a derived table (Inline Table Valued Function), and the other returns a @table variable as a result (Multi-Statement Table Valued Function).
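The structural difference is easy to spot in the definitions. Here’s a bare-bones sketch of each, with hypothetical names:

--Inline: a single RETURN of a select, expanded into the calling query
CREATE OR ALTER FUNCTION
    dbo.CommentIds_Inline()
RETURNS TABLE
AS
RETURN
    SELECT
        C.PostId
    FROM dbo.Comments AS C;
GO

--Multi-statement: fills and returns a @table variable,
--with all the estimate problems @table variables bring along
CREATE OR ALTER FUNCTION
    dbo.CommentIds_MSTVF()
RETURNS @Out TABLE
(
    PostId int
)
AS
BEGIN
    INSERT
        @Out (PostId)
    SELECT
        C.PostId
    FROM dbo.Comments AS C;

    RETURN;
END;
GO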

I’ve written before about the problems you’ll run into with T-SQL’s @table variables, so I’m not going to go back over that here.

I will take this time to point out that they don’t behave any better when used in a function than they do when used independently of a function. If you need to stop here to watch that video, go ahead. I’ll be waiting for you.

To learn more about how T-SQL’s Multi-Statement Table Valued Functions can mess up SQL Server query performance, check out the video below. And as always, if this is the kind of thing you love learning about, hit the link below to get 75% off all of my training material.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Scalar User Defined Functions

The Very Worst Idea


Scalar UDFs answered a very good question: How do I ruin query performance so that two generations of consultants can make a living off of SQL Server?

In the videos below, which are part of my paid training, I’m going to show you how T-SQL Scalar UDFs ruin query performance, and one way of fixing them. If you’ve got lots of these little devils hanging around your codebase, you’ll wanna pay close attention, here to see how:

  • T-SQL Scalar UDFs force queries to run single threaded
  • T-SQL Scalar UDFs run once per row that they process
  • T-SQL Scalar UDFs hide all the work they actually do in query plans and other metrics
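For a sense of the shape of the problem, here’s a hypothetical scalar UDF in a select list. It runs once per row in Posts, and its presence forces the whole plan to run single threaded:

CREATE OR ALTER FUNCTION
    dbo.CommentCount
(
    @PostId int
)
RETURNS bigint
AS
BEGIN
    RETURN
    (
        SELECT
            COUNT_BIG(*)
        FROM dbo.Comments AS C
        WHERE C.PostId = @PostId
    );
END;
GO

--Executes the function once per row, single threaded
SELECT
    P.Id,
    Comments = 
        dbo.CommentCount(P.Id)
FROM dbo.Posts AS P;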

There’s a ton more available in the full paid training courses, so hit the link below to get 75% off the whole thing.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Misusing Common Table Expressions

Stinko


In this post, I’m gonna show you how stringing together a bunch of CTEs can cause performance problems with one of my paid training videos. If you like it, hit the link below to get 75% off the entire bundle.
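The short version of the problem, sketched here: a CTE is not materialized, so a CTE referenced more than once re-executes its underlying query each time, while dumping the results into a #temp table does the work once:

--Each reference to Scores below re-runs the aggregation
WITH
    Scores AS
(
    SELECT
        C.PostId,
        Score = 
            SUM(C.Score)
    FROM dbo.Comments AS C
    GROUP BY C.PostId
)
SELECT
    S1.PostId
FROM Scores AS S1
JOIN Scores AS S2
    ON S1.PostId = S2.PostId;

--Materializing once avoids the repeated work
SELECT
    C.PostId,
    Score = 
        SUM(C.Score)
INTO #Scores
FROM dbo.Comments AS C
GROUP BY C.PostId;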

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Indexing #Temp Tables Incorrectly

Understanding


I’ve covered this sort of information in previous posts, but some points should be made together, so I’m going to combine them here, and expand on a few of them too.

I know that it’s probably an overly-lofty goal to expect people who don’t seem to have the hang of indexing regular tables to not repeat those errors with #temp tables.

But hey, hope burns eternal. Like American Spirits (the cigarettes, not some weird metaphorical thing that Americans possess, or ghosts).

Nonclustered Index Follies: Creating Them Before You Insert Data


I’m not saying that you should never add a nonclustered index to a #temp table, but I am saying that they shouldn’t be your first choice. Make sure you have a good clustered index on there first, if you find one useful. Test it. Test it again. Wait a day and test it again.

But more importantly, don’t do this:

CREATE TABLE #bad_idea
(
    a_number int,
    a_date datetime,
    a_string varchar(10),
    a_bit bit
);

CREATE INDEX anu ON #bad_idea(a_number);
CREATE INDEX ada ON #bad_idea(a_date);
CREATE INDEX ast ON #bad_idea(a_string);
CREATE INDEX abi ON #bad_idea(a_bit);

Forget for a minute that these are a bunch of single-column indexes, which I’m naturally and correctly opposed to.

Look what happens when we try to insert data into that #temp table:

[SQL Server query plan: the bucket]

You have to insert into the heap (that’s the base table here, since we don’t have a clustered index), and then into each of the nonclustered indexes. In general, if you want nonclustered indexes on your #temp tables, you should create them after you insert data, so you don’t interfere with parallel inserts, and so statistics get built with a full scan of the data.
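The order of operations that usually works out better, sketched with hypothetical names: load a bare table first, then index it:

CREATE TABLE #better_idea
(
    a_number int,
    a_date datetime
);

--Load first, so the insert itself can go parallel
INSERT
    #better_idea WITH (TABLOCK)
(
    a_number,
    a_date
)
SELECT
    P.OwnerUserId,
    P.CreationDate
FROM dbo.Posts AS P;

--Then index, which also builds statistics from a full scan
CREATE CLUSTERED INDEX c ON #better_idea (a_number);
CREATE INDEX ada ON #better_idea (a_date);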

Nonclustered Index Follies: If You Need Them, Create Them Inline


If for some insane reason you decide that you need indexes on your #temp table up front, you should create everything in a single statement to avoid recompilations.

CREATE TABLE #bad_idea
(
    a_number int,
        INDEX anu (a_number),
    a_date datetime,
        INDEX ada (a_date),
    a_string varchar(10),
        INDEX ast (a_string),
    a_bit bit,
        INDEX abi (a_bit)
);

I don’t have a fantastic demo for that, but I can quote a Great Post™ about #temp tables:

  1. Do not explicitly drop temp tables at the end of a stored procedure, they will get cleaned up when the session that created them ends.

  2. Do not alter temp tables after they have been created.

  3. Do not truncate temp tables.

  4. Move index creation statements on temp tables to the new inline index creation syntax that was introduced in SQL Server 2014.

There are some other good points there, too. Pay attention to those as well.

Of course, there is one interesting reason for dropping #temp tables: running out of space in tempdb. I tend to work with clients who need help tuning code and processes that hit many millions of rows or more.

If you’re constantly creating large #temp tables, you may want to clean them up when you’re done with them rather than letting self-cleanup happen at the end of a procedure.

This applies to portions of workloads that have almost nothing in common with OLTP, so you’re unlikely to run into the type of contention that the performance features aimed at OLTP workloads are there to address. Reporting queries rarely do.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Not Getting Parallel Inserts Into #Temp Tables

Do That, But Faster


Serial zones in parallel plans can leave a lot of performance on the table. One of the best ways to see that is with insert queries that do a lot of work. A big contrast between @table variables and #temp tables is that the former fully disallows parallelism in modification queries, and the latter do not.

The story gets a little more complicated when inserting into a columnstore index, but this post isn’t really about that.

Let’s look at some scenarios where a fully parallel insert is allowed, and then not allowed.

The thing to keep an eye out for is whether the insert operator sits inside the serial zone. For the purposes of this post:

[SQL Server query plan: attention, please]

Works: SELECT INTO


As long as you don’t do anything too goofy here, a fully parallel insert will “always” be allowed.

Goofy things will be explained later in the post.

--This will "always" work, as long as you don't do 
--anything listed below in the "broken" select
SELECT
    C.UserId,
    SumScore = 
        SUM(C.Score)
INTO
    #AvgComments_SelectInto
FROM 
    dbo.Comments AS C
GROUP BY 
    C.UserId
HAVING 
    SUM(C.Score) > 200;

DROP TABLE #AvgComments_SelectInto;

Works: INSERT, with TABLOCK


Rather than selecting directly into a table, here we’re gonna create the table and issue an insert statement with the tablock hint.

--This *will* get you a fully parallel insert, unless goofiness is involved.
CREATE TABLE
    #AvgComments_Tablock
(
    UserId int,
    SumScore int
);

INSERT 
    #AvgComments_Tablock WITH (TABLOCK)
(
    UserId, 
    SumScore
)
SELECT 
    C.UserId,
    AvgScore = 
        SUM(C.Score)
FROM 
    dbo.Comments AS C
GROUP BY 
    C.UserId
HAVING 
    SUM(C.Score) > 200;

DROP TABLE #AvgComments_Tablock;

Doesn’t Work: INSERT, without TABLOCK


Without the tablock hint, this will get you the plan we don’t want, where the insert operator is outside the parallel zone.

--This will not get you a fully parallel insert
CREATE TABLE
    #AvgComments_NoTablock
(
    UserId int,
    SumScore int
);

INSERT 
    #AvgComments_NoTablock 
(
    UserId, 
    SumScore
)
SELECT
    C.UserId,
    SumScore = 
        SUM(C.Score)
FROM 
    dbo.Comments AS C
GROUP BY 
    C.UserId
HAVING 
    SUM(C.Score) > 200;

DROP TABLE #AvgComments_NoTablock;

Doesn’t Work: A Whole Laundry List Of Stuff


Basically, any one of the commented-out things below can deny the parallel insert that we’re after.

If you’re doing any of this stuff, like, bye.

--SET ROWCOUNT Any_Number;
--ALTER DATABASE StackOverflow2013 
--    SET COMPATIBILITY_LEVEL = Anything_Less_Than_130;
CREATE TABLE
    #AvgComments_BrokenTablock
(
    --Id int IDENTITY,
    UserId int,
    SumScore int
    --,INDEX c CLUSTERED(UserId)
    --,INDEX n NONCLUSTERED(UserId)
);

--Also, if there's a trigger or indexed view on the target table
--But that's not gonna be the case with #temp tables
INSERT 
    #AvgComments_BrokenTablock WITH (TABLOCK)
(
    UserId, 
    SumScore
)
--The rules here are a little weird, so
--be prepared to see weird things if you use OUTPUT
--OUTPUT Inserted.*
--To the client or
--INTO dbo.some_table
--INTO @table_variable
SELECT
    --Id = IDENTITY(bigint, 1, 1),
    --dbo.A_Scalar_UDF_Not_Inlined_By_Froid
    C.UserId,
    SumScore = 
        SUM(C.Score)
FROM 
    dbo.Comments AS C
--Any reference to the table you're inserting into
--Not exists is just an easy example of that
--WHERE NOT EXISTS
--(
--    SELECT
--	    1/0
--	FROM #AvgComments_BrokenTablock AS A
--	WHERE A.UserId = C.UserId
--)
GROUP BY 
    C.UserId
HAVING 
    SUM(C.Score) > 200;

DROP TABLE #AvgComments_BrokenTablock;

Explainer


There are many good reasons to want a fully parallel insert, but you need to make sure that the bottleneck isn’t earlier in the plan.

If it is, you may not see the full performance gains from getting it.

In general, it’s a good strategy when building larger #temp tables, but at this point I add a TABLOCK hint to every #temp table insert first, and test from there.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.