The locking flavor of Read Committed shouldn’t be the default isolation level in a major database. No self-respecting database would do that to itself, or to end users.
Imagine you’re some poor developer with no idea what SQL Server is doing, just trying to get your application to work correctly, and your queries end up in asinine blocking chains because of this dimwitted default.
I’d switch to Postgres, too. Adding in NOLOCK hints and maybe getting wrong data is probably the least of their concerns.
Ahzooray
In Azure SQL DB, the default isolation level is Read Committed Snapshot Isolation (RCSI). Optimism for me but not for thee is the message here.
Imagine a problem so goofy that Microsoft didn’t want to deal with it in its own cloud product. You’ve got it right here.
For the next version of SQL Server, the default isolation level for new databases should also be RCSI.
Especially because databases can have a local version store via Accelerated Database Recovery. Why not make the most of it?
And solve the dumbest problem that most databases I see deal with.
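In the meantime, nothing stops you from flipping the switch yourself. A minimal sketch, with a placeholder database name:

ALTER DATABASE YourDatabase
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE; /* rolls back in-flight transactions so the change doesn't wait forever */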
Thanks for reading!
There are two angles to Eager Spools. One is that spools are just crappy temp tables. The mechanism used for loading data, especially into Eager Spools, is terribly inefficient when compared to the various advances in tempdb efficiency over the years.
The second angle is that Eager Index Spools should provide feedback to users that there’s a potentially useful index that could be created.
There are checks for Eager Table and Index Spools in sp_BlitzCache. Heck, it’ll even unravel them and tell you which index to create in the case of Eager Index Spools.
It still bothers me that there’s no built-in tooling or warning about queries that rely on Spools. Not because the Spools themselves are necessarily bad, but because they’re usually a sign that something is deficient with the query or indexing. Spools in modification queries are often necessary and not as worthy of your scorn, but can often be substituted with manual phase separation.
You may have lots of very fast queries with small Spools in them. You may have them right now.
You may also have lots of very slow queries with Spools in them, and there’s nothing telling you about it.
Spool Building
In a perfect world, when spools are being built, they’d emit a specific wait stat. Having that information would help all of us who do performance tuning work know what to look for and focus on during investigations.
Right now you can get part of the way there by looking at EXECSYNC waits, but even that’s unreliable. They show up in parallel plans with Eager Index Spools, but plenty of other things produce them too. Usually, your best bet is to look at your top resource-consuming queries and examine their plans for Spools.
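Here’s roughly what that plan cache search looks like. It’s a sketch, and shredding plan XML across a big cache can be slow, so treat it as an occasional diagnostic, not a monitoring query:

SELECT TOP (50)
    qs.execution_count,
    qs.total_worker_time,
    qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE qp.query_plan.exist('declare namespace p = "http://schemas.microsoft.com/sqlserver/2004/07/showplan"; //p:RelOp[@LogicalOp = "Eager Spool"]') = 1
ORDER BY qs.total_worker_time DESC;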
Step 1: Give us better waits to identify Spools being built
Index Building
The optimizer will complain about missing indexes all the live-long day. Even in cases where an index would barely help.
I don’t know about you, but in general ~200ms isn’t a performance emergency.
But like, three minutes?
I’d wanna know about this. I’d wanna create an index to help this query.
Step 2: Give us missing index requests when Eager Index Spools get built.
End users need better feedback when new features get turned on and performance gets worse.
Sure, if you read this blog and know what to look for, you can find and fix things quickly. But there are a whole lot of people who don’t have things that easy, and I get a lot of calls from them to fix this sort of issue.
Thanks for reading!
Of course, I’d love it if everyone lived in the same ivory tower as me and always wrote perfect queries with clear predicates that the optimizer understands and lovingly embraces.
But in the real world, out in the nitty gritty, queries are awful. It doesn’t matter if it’s un-or-under-trained developers writing SQL, or that exact same person designing queries in an ORM. They turn out horrible and full of nonsense, like drunk stomachs at IHOP.
One of the most common problems I see is people getting lazy with checking for NULLs, or overly-protective about it. In “””””real””””” programming languages, NULLs get you errors. In databases, they’re just sort of whatever.
Thing Is
When you create a row store index on a column, whether it’s ascending or descending, clustered or nonclustered, the data is put in order. In SQL Server, that means NULLs are sorted together. Despite that, ISNULL still creates problems.
DROP TABLE IF EXISTS #t;

/* 11 rows: one NULL, plus the numbers 1-10 */
SELECT
    x.n
INTO #t
FROM
(
    SELECT
        CONVERT(int, NULL) AS n

    UNION ALL

    SELECT TOP (10)
        ROW_NUMBER() OVER
        (
            ORDER BY 1/0 /* never evaluated; just satisfies the syntax */
        ) AS n
    FROM sys.messages AS m
) AS x;

CREATE UNIQUE CLUSTERED INDEX c
    ON #t (n) WITH (SORT_IN_TEMPDB = ON);
In this table we have 11 rows. One of them is NULL, and the other 10 are the numbers 1-10.
Odor By
If we select an ordered result, we get a simple query plan that scans the clustered index and returns 11 rows with no Sort operator.
However, if we want to replace that NULL with a 0, things get goofy.
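My reconstruction of the two queries, not the exact ones, but enough to reproduce the plans:

/* no Sort: the clustered index already has things in order */
SELECT t.n FROM #t AS t ORDER BY t.n;

/* Sort appears: the optimizer won't reason through ISNULL */
SELECT ISNULL(t.n, 0) AS n FROM #t AS t ORDER BY ISNULL(t.n, 0);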
Wheredo
Something similar occurs when ISNULL is applied to the where clause.
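Again reconstructed, something like this shows it:

/* can seek straight to the NULL */
SELECT COUNT_BIG(*) FROM #t AS t WHERE t.n IS NULL;

/* wrapping the column in a function means scanning all 11 rows */
SELECT COUNT_BIG(*) FROM #t AS t WHERE ISNULL(t.n, 0) = 0;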
There’s one NULL. We know where it is. But we still have to scan 10 other rows. Just in case.
Conversion
The optimizer should be smart enough to figure out simple use of ISNULL, like in both of these cases.
I’m sure wiser people can figure out deeper cases, too, and even apply them to more functions that involve some types of date math, etc.
Thanks for reading!
In my session on using dynamic SQL to defeat parameter sniffing, I walk through how we can use what we know statistically about data to execute slightly different queries to get drastically better query plans. What makes the situation quite frustrating is that SQL Server has the ability to do the same thing.
In fact, it does it every time it freshly compiles a plan based on parameter values. It makes a whole bunch of estimates and comes up with what it thinks is a good plan based on them.
But it sort of stops there.
Why am I talking about this now? Well, since Greatest and Least got announced for Azure SQL, it kind of got my noggin’ joggin’ that perhaps a new build of the box-product might make its way to the CTP phase soon.
I Dream Of Histograms
When SQL Server builds execution plans, the estimates (mostly) come from statistics histograms, and those histograms are generally pretty solid even with low sampling rates. I know that there are times when they miss crucial details, but that’s a different problem than the one I think could be solved here.
You see, when a plan gets generated, the entire histogram is available. It would be neat if there were a process that could, for starters, mark objects with significant skew in them for additional attention.
Take the VoteTypeId column in the Votes table. Lots of these values could safely share a plan. Lots of others have catastrophe in mind when plans are shared, like 2 and… well, most others.
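If you want to stare at the skew yourself, here’s a sketch against the StackOverflow database that the other examples use; sys.dm_db_stats_histogram needs SQL Server 2016 SP1 CU2 or later:

SELECT
    s.name,
    h.range_high_key,
    h.equal_rows
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_histogram(s.object_id, s.stats_id) AS h
WHERE s.object_id = OBJECT_ID(N'dbo.Votes') /* look for the VoteTypeId histogram */
ORDER BY h.equal_rows DESC;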
Sort of like how optimistic isolation levels take care of basic reader/writer blocking that sucks to deal with and leads to all those crappy NOLOCK hints. Save the hard problems for young handsome consultants.
Ahem.
Overboard
I know, this sounds crazy-ambitious, and it could get out of hand quickly. Not to mention confusing! We’re talking about queries having multiple cached and usable plans, here. Who used what and when would be crucial information.
You’d need a lot of metadata about something like this, so you can tweak:
The number of plans
The number of buckets
Which plan is used by which buckets
Which code and parameters should be considered
I’m fine with auto-pilot for this most of the time, just to get folks out of the doldrums. Sort of like how Query Store was “good enough” with a bunch of default options, I think you’d find a lot of preconceived notions about what works best would pretty quickly be relieved of their position.
Anyway, I have a bunch of other posts about similar solvable problems. I have no idea when the next version of SQL Server will come out, or what improvements or features might be in it, but I hear that blog posts are basically wishes that come true. I figure I might as well throw some coins in the fountain.
Thanks for reading!
I’ve written at length about what local variables do to queries, so I’m not going to go into it again here.
What I do want to talk about are better alternatives to what you currently have to do to fix issues:
RECOMPILE the query
Pass the local variable to a stored procedure
Pass the local variable to dynamic SQL
It’s not that I hate those options, they’re just tedious. Sometimes I’d like the benefit of recompiling with local variables without all the other strings that come attached to recompiling.
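For reference, the dynamic SQL route looks something like this, with a table borrowed from the same StackOverflow database as the other examples:

DECLARE @r int = 1;
DECLARE @sql nvarchar(MAX) = N'
SELECT COUNT_BIG(*)
FROM dbo.Users AS u
WHERE u.Reputation = @r;'; /* inside the dynamic SQL, @r is a real parameter */
EXEC sys.sp_executesql @sql, N'@r int', @r;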
Hint Me Baby One More Time
Since I’m told people rely on this behavior to fix certain problems, you would probably need a few different places and ways to alter it:
Database level setting
Query Hint
Variable declaration
Database level settings are great for workloads you can’t alter, either because the queries come out of a black box, or you use an ORM and queries… come out of a nuclear disaster area.
Query hints are great if you want all local variables to be treated like parameters. But you may not want that all the time. I mean, look: you all do wacky things and you’re stuck in your ways. I’m not kink shaming here, but facts are facts.
You have to draw the line somewhere and that somewhere is always “furries”.
And then local variables.
It may also be useful to allow local variables to be declared with a special property that will allow the optimizer to treat them like parameters. Something like this would be easy enough:
DECLARE @p int PARAMETER = 1;
Hog Ground
Given that in SQL Server 2019 table variables got deferred compilation, I think this feature is doable.
Of course, it’s doable today if you’ve got a debugger and don’t mind editing memory space.
Thanks for reading!
One line I see over and over again — I’ve probably said it too when I was young and needed words to fill space — is that CTEs make queries more readable.
Personally, I don’t think they make queries any more readable than derived tables, but whatever. No one cares what I think, anyway.
Working with clients I see a variety of query formatting styles, ranging from quite nice ones that have influenced the way I format things, to completely unformed primordial blobs. Sticking the latter into a CTE does nothing for readability even if it’s commented to heck and back.
nope nope nope
There are plenty of options out there for formatting code. Formatting your code nicely doesn’t just help others read it; it can also help people understand how it works.
Take this example from sp_QuickieStore that uses the STUFF function to build a comma separated list the crappy way.
If STRING_AGG were available in SQL Server 2016, I’d just use that. Darn legacy software.
The labels probably make things less readable, but formatting the code this way helps me make sure I have everything right.
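Here’s the general shape of it, a reconstruction rather than the exact sp_QuickieStore code:

SELECT
    STUFF
    (
        ( /* first input: the XML-generating nonsense */
            SELECT N', ' + c.name
            FROM sys.columns AS c
            WHERE c.object_id = OBJECT_ID(N'sys.objects')
            FOR XML PATH(N''), TYPE
        ).value('./text()[1]', 'nvarchar(max)'),
        1,   /* start */
        2,   /* length */
        N''  /* replacement text */
    ) AS comma_list;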
The opening and closing parens for the STUFF function
The first input to the function is the XML generating nonsense
The last three inputs to the STUFF function that identify the start, length, and replacement text
I’ve seen and used this specific code a million times, but it wasn’t until I formatted it this way that I understood how all the pieces lined up.
Compare that with another time I used the same code fragment in sp_BlitzCache. I wish I had formatted a lot of the stuff I wrote in there better.
With things written this way, it’s really hard to understand where things begin and end, and which arguments belong to which part of the code.
Maybe someday I’ll open an issue to reformat all the FRK code.
Thanks for reading!
Whoever decided to give “memory bank” its moniker was wise beyond their years, or maybe they just made a very apt observation: all memory is on loan.
Even in the context we’ll be talking about, when SQL Server has lock pages in memory enabled, the pages that are locked in memory may not have permanent residency.
If your SQL Server doesn’t have enough memory, or if various workload elements are untuned, you may hit one of these scenarios:
Query Memory Grant contention (RESOURCE_SEMAPHORE)
Buffer Cache contention (PAGEIOLATCH_XX)
A mix of the two, where both are fighting over finite resources
It’s probably fair to note that not all query memory grant contention will result in RESOURCE_SEMAPHORE waits. There are times when you’ll have just enough queries asking for memory grants to knock a significant number of pages out of the buffer pool, causing an over-reliance on disk, without ever hitting the point where you’ve exhausted the amount of memory that SQL Server will loan out to queries.
To help you track down any of these scenarios, you can use my stored procedure sp_PressureDetector to see what’s going on with things.
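If you’d rather eyeball the raw numbers yourself, these are the kinds of DMVs it leans on — a sketch, not the procedure’s actual logic:

/* queries waiting on, or holding, memory grants */
SELECT
    mg.session_id,
    mg.requested_memory_kb,
    mg.granted_memory_kb, /* NULL here means the query is still waiting */
    mg.wait_time_ms
FROM sys.dm_exec_query_memory_grants AS mg
ORDER BY mg.requested_memory_kb DESC;

/* accumulated memory and disk waits since startup */
SELECT
    ws.wait_type,
    ws.waiting_tasks_count,
    ws.wait_time_ms
FROM sys.dm_os_wait_stats AS ws
WHERE ws.wait_type = N'RESOURCE_SEMAPHORE'
OR ws.wait_type LIKE N'PAGEIOLATCH%';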
Black Friday
Most servers I see have a mix of the two issues. Everyone complains about SQL Server being a memory hog without really understanding why. Likewise, many people are very proud of how fast their storage is without really understanding how much faster memory is. It’s quite common to hear someone say they recently got a whole bunch of brand new shiny flash storage, but performance is still terrible on their server with 64GB of RAM and 1TB of data.
I recently had a client migrate some infrastructure to the cloud, and they were complaining about how queries got 3x slower. As it turned out, the queries were accruing 3x more PAGEIOLATCH waits with the same amount of memory assigned to SQL Server. Go figure.
If you’d like to see those waits in action, and how sp_PressureDetector can help you figure out which queries are causing problems, check out this video.
Market Economy
The primary driver of how much memory you need is how much control you have over the database. The less control you have, the more memory you need.
Here’s an example: One thing that steals control from you is using an ORM. When you let one translate code into queries, Really Bad Things™ can happen. Even with Perfect Indexes™ available, you can get some very strange queries and subsequently very strange query plans.
One of the best ways to take some control back isn’t even available in Standard Edition.
there are a lot of bad defaults in sql server. most of them you know about and change. here’s another bad one: pic.twitter.com/555xPmgwRW
— Erik Darling Data (@erikdarlingdata) May 8, 2021
If you do have control, the primary drivers of how much memory you need are how effective your indexes are, and how well your queries are written to take advantage of them. You can get away with less memory in general because your data footprint in the buffer pool will be a lot smaller.
You can watch a video I recorded about that here.
Thanks for reading (and watching)!
“Wow look at all these missing index requests. Where’d they come from?”
In SQL Server 2019, sys.dm_db_missing_index_group_stats_query can tell you. So this is neat! And it’s better than nothing, but there are some quirks.
And what’s a quirk, after all, but a twerk that no one enjoys.
Columnar
The first thing to note about this DMV is that there are two columns purporting to have sql_handles in them. No, not that sql_handle.
One of them can’t be used in the traditional way to retrieve query text. If you try to use last_statement_sql_handle, you’ll get an error.
SELECT
    ddmigsq.group_handle,
    ddmigsq.query_hash,
    ddmigsq.query_plan_hash,
    ddmigsq.avg_total_user_cost,
    ddmigsq.avg_user_impact,
    query_text =
        SUBSTRING
        (
            dest.text,
            (ddmigsq.last_statement_start_offset / 2) + 1,
            (
                (
                    CASE ddmigsq.last_statement_end_offset
                        WHEN -1
                        THEN DATALENGTH(dest.text)
                        ELSE ddmigsq.last_statement_end_offset
                    END
                    - ddmigsq.last_statement_start_offset
                ) / 2
            ) + 1
        )
FROM sys.dm_db_missing_index_group_stats_query AS ddmigsq
CROSS APPLY sys.dm_exec_sql_text(ddmigsq.last_statement_sql_handle) AS dest;
Msg 12413, Level 16, State 1, Line 27
Cannot process statement SQL handle. Try querying the sys.query_store_query_text view instead.
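The workaround it suggests looks something like this, assuming Query Store is enabled in the database:

SELECT
    ddmigsq.group_handle,
    ddmigsq.avg_user_impact,
    qsqt.query_sql_text
FROM sys.dm_db_missing_index_group_stats_query AS ddmigsq
JOIN sys.query_store_query_text AS qsqt
    ON qsqt.statement_sql_handle = ddmigsq.last_statement_sql_handle;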
Is Vic There?
One other “issue” with the view is that entries are evicted from it if they’re evicted from the plan cache. That means that queries with recompile hints may never produce an entry in the table.
Is this the end of the world? No, and it’s not the only index-related DMV that behaves this way: dm_db_index_usage_stats does something similar with regard to cached plans.
SELECT
COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656
AND p.Score < 0;
GO
SELECT
COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656
AND p.Score < 0
OPTION(RECOMPILE);
GO
Italic Stallion
You may have noticed that “may” was italicized when talking about whether or not plans with recompile hints would end up in here.
Some of them may, if they’re part of a larger batch. Here’s an example:
SELECT
COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656
AND p.Score < 0
OPTION(RECOMPILE);
SELECT
COUNT_BIG(*)
FROM dbo.Users AS u
JOIN dbo.Posts AS p
ON p.OwnerUserId = u.Id
JOIN dbo.Comments AS c
ON c.PostId = p.Id
WHERE u.Reputation = 1
AND p.PostTypeId = 3
AND c.Score = 0;
Most curiously, if I run that batch twice, the missing index request for the recompile plan shows two uses.
Multiplicity
You may have noticed something else odd in that output: one query has produced three entries. That’s because the query has three missing index requests.
Another longstanding gripe with SSMS is that it only shows you the first missing index request in green text, and that it might not even be the “most impactful” one.
That’s the case here, just in case you were wondering. Neither the XML, nor the SSMS presentation of it, attempt to order the missing indexes by potential value.
You can use the properties of the execution plan to view all missing index requests, like I blogged about here, but you can’t script them out easily like you can for the green text request at the top of the query plan.
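If you want them in one sortable list, you can join the new DMV back to the index details. A sketch using the classic back-of-napkin impact math, not sp_BlitzIndex’s exact logic:

SELECT
    mid.statement AS table_name,
    mid.equality_columns,
    mid.inequality_columns,
    mid.included_columns,
    ddmigsq.avg_total_user_cost * ddmigsq.avg_user_impact *
        (ddmigsq.user_seeks + ddmigsq.user_scans) AS rough_impact
FROM sys.dm_db_missing_index_group_stats_query AS ddmigsq
JOIN sys.dm_db_missing_index_groups AS mig
    ON mig.index_group_handle = ddmigsq.group_handle
JOIN sys.dm_db_missing_index_details AS mid
    ON mid.index_handle = mig.index_handle
ORDER BY rough_impact DESC;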
At least this way, it’s a whole heck of a lot easier for you to order them in a way that may be more beneficial.
EZPZ
Of course, I don’t expect you to write your own queries to handle this. If you’re the type of person who enjoys Blitzing things, you can find the new 2019 goodness in sp_BlitzIndex, and you can find all the missing index requests for a single query in sp_BlitzCache in a handy-dandy clickable column that scripts out the create statements for you.
Thanks for reading!
I see people do things like this fairly often with UDFs. I don’t know why. It’s almost like they read a list of best practices and decided the opposite was better.
This is a quite simplified function, but it’s enough to show off the buggy behavior.
While writing this, I learned that you can’t create a recursive (self-referencing) scalar UDF with the schemabinding option. I don’t know why that is either.
Please note that this behavior has been reported to Microsoft and will be fixed in a future update, though I’m not sure which one.
Swallowing Flies
Let’s take this thing. Let’s take this thing and throw it directly in the trash where it belongs.
CREATE OR ALTER FUNCTION dbo.how_high
(
    @i int,
    @h int
)
RETURNS int
WITH RETURNS NULL ON NULL INPUT
AS
BEGIN
    SELECT
        @i += 1;

    IF @i < @h
    BEGIN
        SET @i = dbo.how_high(@i, @h);
    END;

    RETURN @i;
END;
GO
Seriously. You’re asking for a bad time. Don’t do things like this.
Unless you want to pay me to fix them later.
Froided
In SQL Server 2019, under compatibility level 150, this is what the behavior looks like currently:
/*
Works
*/
SELECT
dbo.how_high(0, 36) AS how_high;
GO
/*
Fails
*/
SELECT
dbo.how_high(0, 37) AS how_high;
GO
The first execution returns 36 as the final result, and the second query fails with this message:
Msg 217, Level 16, State 1, Line 40
Maximum stored procedure, function, trigger, or view nesting level exceeded (limit 32).
A bit odd that it took 37 loops to exceed the nesting limit of 32.
This is the bug.
Olded
With UDF inlining disabled, a more obvious number of loops is necessary to encounter the error.
/*
Works
*/
SELECT
dbo.how_high(0, 32) AS how_high
OPTION (USE HINT('DISABLE_TSQL_SCALAR_UDF_INLINING'));
GO
/*
Fails
*/
SELECT
dbo.how_high(0, 33) AS how_high
OPTION (USE HINT('DISABLE_TSQL_SCALAR_UDF_INLINING'));
GO
The first run returns 32, and the second run errors out with the same error message as above.
Does It Matter?
It’s a bit hard to imagine someone relying on that behavior, but I found it interesting enough to ask some of the nice folks at Microsoft about, and they confirmed that it shouldn’t happen. Again, it’ll get fixed, but I’m not sure when.
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.