A Little About Fill Factor And Fragmentation In SQL Server

A Little About Fill Factor And Fragmentation In SQL Server



Thanks for watching!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

A Little About Multi-Column Indexes In SQL Server

A Little About Multi-Column Indexes In SQL Server



Thanks for watching!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

A Very Silly Performance Tuning Trick In SQL Server

A Very Silly Performance Tuning Trick In SQL Server



Thanks for watching!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

Some Questions I’ve Answered Recently On Database Administrators Stack Exchange

Fun and No Profit


Normally I don’t have many questions, but here are a couple that I did ask:

Here’s the list of answers on dba.stackexchange.com:

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

In Which I Share A Piece Of Code That I’m Not Proud Of

Requesting


I am sometimes asked to write special bits of code by people to solve a specific problem they’re having.

A recent one was for quite an unreasonable situation brought on by a shamelessly written vendor application, where:

  • Implicit transactions were in use
  • Bizarre locking hints abounds
  • Absolutely zero attention paid to transaction handling

Which lead to scenarios where select queries would run, finish, and never close out their connection. Of course, this was bad, because loads of other queries would get blocked by these things that should have just ended their sessions and released their locks and been on their way.

And so I wrote this thing. A thing that I’d always sort of made fun of the concept of, because I’d seen so many bad implementations of it throughout the years.

Most of them would just look for lead blockers and kill them without any consideration as to how much work they’d done, which would lead to even more blocking during rollback.

This one specifically looks for things that have used zero transaction log space.

Here it is. I don’t love it, but I wanted to share it, because it might make you feel better about some code that you weren’t proud to write, either.

Thanks for reading!

/*
EXEC dbo.sleeper_killer
    @debug = 'true';

SELECT
    sk.*
FROM dbo.killed_sleepers AS sk;

*/

SET ANSI_NULLS ON;
SET ANSI_PADDING ON;
SET ANSI_WARNINGS ON;
SET ARITHABORT ON;
SET CONCAT_NULL_YIELDS_NULL ON;
SET QUOTED_IDENTIFIER ON;
SET NUMERIC_ROUNDABORT OFF;
SET IMPLICIT_TRANSACTIONS OFF;
SET STATISTICS TIME, IO OFF;
GO

CREATE OR ALTER PROCEDURE
    dbo.sleeper_killer
(
    @debug bit = 'false'
)
AS
BEGIN
    SET NOCOUNT ON;
    
    /*Make sure the logging table exists*/
    IF OBJECT_ID('dbo.killed_sleepers') IS NULL
    BEGIN
        CREATE TABLE
            dbo.killed_sleepers
        (
            run_id bigint IDENTITY PRIMARY KEY,
            run_date datetime NOT NULL DEFAULT SYSDATETIME(),
            session_id integer NULL,
            host_name sysname NULL,
            login_name sysname NULL,
            program_name sysname NULL,
            last_request_end_time datetime NULL,
            duration_seconds integer NULL,
            last_executed_query nvarchar(4000) NULL,
            error_number integer NULL,
            severity tinyint NULL,
            state tinyint NULL,
            error_message nvarchar(2048),
            procedure_name sysname NULL,
            error_line integer NULL
        );
    END;

    /*Check for any work to do*/
    IF EXISTS
    (
        SELECT
            1/0
        FROM sys.dm_exec_sessions AS s
        JOIN sys.dm_tran_session_transactions AS tst
          ON tst.session_id = s.session_id
        JOIN sys.dm_tran_database_transactions AS tdt
          ON tdt.transaction_id = tst.transaction_id
        WHERE s.status = N'sleeping'
        AND   s.last_request_end_time <= DATEADD(SECOND, -5, SYSDATETIME())
        AND   tdt.database_transaction_log_bytes_used < 1
    )
    BEGIN   
        IF @debug = 'true' BEGIN RAISERROR('Declaring variables', 0, 1) WITH NOWAIT; END;
        /*Declare variables for the cursor loop*/
        DECLARE
            @session_id integer,
            @host_name sysname,
            @login_name sysname,
            @program_name sysname,
            @last_request_end_time datetime,
            @duration_seconds integer,
            @last_executed_query nvarchar(4000),
            @kill nvarchar(11);
        
        IF @debug = 'true' BEGIN RAISERROR('Declaring cursor', 0, 1) WITH NOWAIT; END;
        /*Declare a cursor that will work off live data*/
        DECLARE
            killer
        CURSOR
            LOCAL
            SCROLL
            READ_ONLY
        FOR
        SELECT
            s.session_id,
            s.host_name,
            s.login_name,
            s.program_name,
            s.last_request_end_time,
            duration_seconds = 
              DATEDIFF(SECOND, s.last_request_end_time, GETDATE()),
            last_executed_query = 
                SUBSTRING(ib.event_info, 1, 4000),
            kill_cmd =
                N'KILL ' + RTRIM(s.session_id) + N';'
        FROM sys.dm_exec_sessions AS s
        JOIN sys.dm_tran_session_transactions AS tst
          ON tst.session_id = s.session_id
        JOIN sys.dm_tran_database_transactions AS tdt
          ON tdt.transaction_id = tst.transaction_id
        OUTER APPLY sys.dm_exec_input_buffer(s.session_id, NULL) AS ib
        WHERE s.status = N'sleeping'
        AND   s.last_request_end_time <= DATEADD(SECOND, -5, SYSDATETIME())
        AND   tdt.database_transaction_log_bytes_used < 1
        ORDER BY
            duration_seconds DESC;                       
        
        IF @debug = 'true' BEGIN RAISERROR('Opening cursor', 0, 1) WITH NOWAIT; END;
        /*Open the cursor*/
        OPEN killer;
        
        IF @debug = 'true' BEGIN RAISERROR('Fetch first from cursor', 0, 1) WITH NOWAIT; END;
        /*Fetch the initial row*/
        FETCH FIRST
        FROM killer
        INTO
            @session_id,
            @host_name,
            @login_name,
            @program_name,
            @last_request_end_time,
            @duration_seconds,
            @last_executed_query,
            @kill;
        
        /*Enter the cursor loop*/
        WHILE @@FETCH_STATUS = 0
        BEGIN
        BEGIN TRY
            IF @debug = 'true' BEGIN RAISERROR('Insert', 0, 1) WITH NOWAIT; END;
            /*Insert session details to the logging table*/
            INSERT
                dbo.killed_sleepers
            (
                session_id,
                host_name,
                login_name,
                program_name,
                last_request_end_time,
                duration_seconds,
                last_executed_query
            )
            VALUES
            (
                @session_id,
                @host_name,
                @login_name,
                @program_name,
                @last_request_end_time,
                @duration_seconds,
                @last_executed_query
            );
        
            IF @debug = 'true' BEGIN RAISERROR('Killing...', 0, 1) WITH NOWAIT; END;
            IF @debug = 'true' BEGIN RAISERROR(@kill, 0, 1) WITH NOWAIT; END;
            
            /*Kill the session*/
            EXEC sys.sp_executesql
                @kill;
        END TRY
        BEGIN CATCH
            IF @debug = 'true' BEGIN RAISERROR('Catch block', 0, 1) WITH NOWAIT; END;
            
            /*Insert this in the event of an error*/
            INSERT
                dbo.killed_sleepers
            (
                session_id,
                host_name,
                login_name,
                program_name,
                last_request_end_time,
                duration_seconds,
                last_executed_query,
                error_number,
                severity,
                state,
                error_message,
                procedure_name,
                error_line
            )
            SELECT
                @session_id,
                @host_name,
                @login_name,
                @program_name,
                @last_request_end_time,
                @duration_seconds,
                @last_executed_query,
                ERROR_NUMBER(),
                ERROR_SEVERITY(),
                ERROR_STATE(),
                ERROR_MESSAGE(),
                ERROR_PROCEDURE(),
                ERROR_LINE();
        END CATCH;
    
        IF @debug = 'true' BEGIN RAISERROR('Fetching next', 0, 1) WITH NOWAIT; END;
        /*Grab the next session to kill*/
        FETCH NEXT
        FROM killer
        INTO
            @session_id,
            @host_name,
            @login_name,
            @program_name,
            @last_request_end_time,
            @duration_seconds,
            @last_executed_query,
            @kill;
        END;
    
    IF @debug = 'true' BEGIN RAISERROR('Closedown time again', 0, 1) WITH NOWAIT; END;
    /*Shut things down*/
    CLOSE killer;
    DEALLOCATE killer;
    END;
END; --Final END

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

Why You Should Avoid People Who Tell You To Avoid Subqueries In SQL Server

Why You Should Avoid People Who Tell You To Avoid Subqueries In SQL Server



Thanks for watching!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

Cursors In Scalar UDFs, and Other Performance Pitfalls In SQL Server

Cursors In Scalar UDFs, and Other Performance Pitfalls In SQL Server


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

A Query Writing And Tuning Exercise: Finding Duplicate Post Titles In Stack Overflow

ErikGPT


I’m going to be totally open and honest with you, dear reader: I’ve been experimenting with… AI.

See, I’m just a lonely independent consultant, and sometimes it’s just nice to have someone to talk to. It’s also kind of fun to take a query idea you have, and ask “someone” else to write it to see what they’d come up with.

ChatGPT (for reference, 4 and 4o) does a rather okay job sometimes. In fact, when I ask it to write a query, it usually comes up with a query that looks a lot like the ones that I have to fix when I’m working with clients.

If I poke and prod it enough about the things that it has done wrongly, it will agree with me and do things the right way, eventually. That is an improvement over your average T-SQL developer.

Your average T-SQL developer will spend a terrible amount of time trying to figure out ways to write queries incorrectly, even when you show them the right way to do something, often under the assumption that they’ve found the one time it’s okay to do it wrong.

For this post, I came up with a query idea, wrote a query that did what I wanted, and then asked the AI to write its own version.

It came pretty close in general, and even added in a little touch that I liked and hadn’t thought of.

Duplicate Post Finder


Here’s the query I wrote, combined with the nice touch that ChatGPT added.

WITH 
    DuplicateTitles AS 
(
    SELECT 
        Title,
        EarliestPostId = MIN(p.Id),
        FirstPostDate = MIN(p.CreationDate),
        LastPostDate = MAX(p.CreationDate),
        DuplicatePostIds = 
            STRING_AGG
                (CONVERT(varchar(MAX), p.Id), ', ') 
            WITHIN GROUP 
                (ORDER BY p.Id),
        TotalDupeScore = SUM(p.Score),
        DuplicateCount = COUNT_BIG(*) - 1
    FROM dbo.Posts AS p
    WHERE p.PostTypeId = 1
    GROUP BY 
        p.Title
    HAVING 
        COUNT_BIG(*) > 1
)
SELECT 
    dt.Title,
    dt.FirstPostDate,
    dt.LastPostDate,
    dt.DuplicatePostIds,
    dt.DuplicateCount,
    TotalDupeScore = 
        dt.TotalDupeScore - p.Score
FROM DuplicateTitles dt
JOIN dbo.Posts p
  ON  dt.EarliestPostId = p.Id
  AND p.PostTypeId = 1
ORDER BY 
    dt.DuplicateCount DESC,
    TotalDupeScore DESC;

If you’re wondering what the nice touch is, it’s the - 1 in DuplicateCount = COUNT_BIG(*) - 1, and I totally didn’t think of doing that, even though it makes total sense.

So, good job there.

Let’s Talk About Tuning


To start, I added this index. Some of these columns could definitely be moved to the includes, but I wanted to see how having as many of the aggregation columns in the key of the index would help with sorting that data.

Those datums? These datas? I think one of those is right, probably.

CREATE INDEX 
    p 
ON dbo.Posts
    (PostTypeId, Title, CreationDate, Score) 
WITH 
    (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

It leads with PostTypeId, since that’s the only column we’re filtering on to find questions, which are the only things that can have titles.

But SQL Server’s cost-based optimizer makes a very odd choice here. Let’s look at that there query plan.

sql server query plan
two filters? in my query plan?

There’s one expected Filter in the query plan, for the COUNT_BIG(*) > 1 predicate, which makes absolute sense. We don’t know what the count will be ahead of time, so we have to calculate and filter it on the fly.

The one that is entirely unexpected is for PostTypeId = 1, because WE HAVE AN INDEX THAT LEADS WITH POSTTYPEID.

¿Por que las hamburguesas, SQL Server?

Costing vs. Limitations


I’ve written in the past about, quite literally not figuratively, how Max Data Type Columns And Predicates Aren’t SARGable.

My first thought was that that, since we’re doing this: (CONVERT(varchar(MAX), p.Id), ', '), that the compute scalar right before the filter was preventing the predicate on PostTypeId from being pushed into an index seek.

Keep in mind that this is quite often necessary when using STRING_AGG, because the implementation is pretty half-assed even by Microsoft standards. And unfortunately, the summer intern who worked on it has since moved on to be a Senior Vice President elsewhere in the organization.

At first I experimented with using smaller byte lengths in the convert. And yeah, somewhere in the 500-600 range, the plan would change to an index seek. But this wasn’t reliable. Different stats samplings and compatibility levels would leave me with different plans (switching between a seek and a scan). The only thing that worked reliably is using a FORCESEEK hint to override the optimizer’s mishandling.

This changes the plan to something quite agreeable, that no longer takes 12 seconds.

sql server query plan
STRING_AGG? more like STRING_GAG! 🥁 🤡

So why the decision to use the first plan, au naturale, instead of the plan that took me forcing things to seek?

  • 12 second plan: 706 query bucks
  • 4 second plan: 8,549 query bucks

The faster plan was estimated to cost nearly 10x the query bucks to execute. Go figure.

For anyone who needed a reminder:

  • High cost doesn’t mean slow
  • Low cost doesn’t mean fast
  • All costs are estimates, with no bearing on the reality of query execution

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

Profiling Query Performance In SQL Server With Extended Events The Easy Way

Profiling Query Performance In SQL Server With Extended Events The Easy Way



Thanks for watching!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

Plan Cache Pollution From Temporary Objects In SQL Server

Plan Cache Pollution From Temporary Objects In SQL Server



Thanks for watching!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.