SQL Server Archives | Page 97 of 161

I/O Bound vs CPU Bound Queries In SQL Server

Posted on September 21, 2021May 16, 2022 by Erik Darling

Handcuffed

When you’re looking for queries to tune, it’s important to understand which part is causing the slowdown.

That’s why Actual Execution plans are so valuable in newer versions of SQL Server and SSMS. Getting to see operator timing and wait stats for a query can tell you a lot about what kind of problem you’re facing.

Let’s take a look at some examples.

Diskord

If you’re looking at a query plan, and all the time is spent way to the right, when you’re reading from indexes, it’s usually a sign of one or two things:

You’re missing a good index for the query
You don’t have enough memory to cache data relevant to the query

If you run the query a second time, and it has the same characteristics — meaning we should now have the data cached in the buffer pool but we don’t — then one or both of those two things is true.

If you run the query a second time and it’s fast because all the data we care about is cached, then it’s more likely that only the second thing is true.

For example, every time I run this query it takes 20 seconds because every time it has to read the clustered index from disk into memory to get a count. That’s because my VM has 96 GB of RAM, and the clustered index of the Posts table is about 120 GB. I can’t fit the whole thing into the buffer pool, so each time I run this query has the gas station sushi effect on the buffer pool.

If I add a nonclustered index — and keep in mind I don’t really condone adding single key column nonclustered indexes like this — the query finishes much faster, because the smaller nonclustered index takes less time to read, and it fits into the buffer pool.

CREATE INDEX pr ON dbo.Posts

(

);

CREATE INDEX pr ON dbo.Posts ( Id );

CREATE INDEX pr ON dbo.Posts
(
    Id
);

If our query had different characteristics, like a where clause, join, group by, order by, or windowing function, I’d consider all of those things for the index definition. Just grabbing a count can still benefit from a smaller index, but there’s nothing relational that we need to account for here.

Proc Rock

Let’s say you already have ideal indexes for a query, but it’s still slow. Then what?

There are lots of possible reasons, but we’re going to examine what a CPU bound query looks like. A good example is one that needs to process a lot of rows, though not necessarily return a lot of rows, like a count or other aggregate.

While this query runs, CPUs are pegged like suburban husbands.

For queries of this stature, inducing batch mode is often the most logical choice. Why? Because CPU instructions are run over batches of rows at once, rather than a single row at a time.

With a small number of rows — like in an OLTP workload — you probably won’t notice any real gains. But for this query that takes many millions of rows and produces an aggregate, it’s Hammer Time™

Rather than ~30 seconds, we can get our query down to ~8 seconds without making a single other change to the indexes or written form.

Under Compression

For truly large data sets, compression indexes is a great choice for further reducing I/O bound portions of queries. In SQL Server, you have row, page, and column store (clustered and nonclustered) compression available to you based on the type of workload you’re running.

When you’re tuning a query, it’s important to keep the type of bottleneck you’re facing in mind. If you don’t, you can end up trying to solve the wrong problem and getting nowhere.

Thanks for reading!

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Forced Parameterization vs Optimize For Ad Hoc Workloads

Posted on September 20, 2021May 16, 2022 by Erik Darling

Shredded Cheese

I often speak with people who are confused about what these settings do, and which one they should be using to fix certain problems.

The first myth is that Optimize For Ad Hoc Workloads has some special effect on queries run outside of stored procedures: it does not. It’s very poorly named in that regard. There are no special optimizations applied because of that setting.

If you hit up the documentation, there’s no mention of it, but because it’s named how it is, people think YEAH IT’LL MAKE MY AD HOC QUERIES FASTER!

But no. It does not.

Add Chalk

Optimize For Ad Hoc Workloads mainly helps servers where the plan cache is unstable because it gets full of plans that don’t ever get reused. You end up with a lot of churn.

Why don’t they get reused? There are a lot of reasons, but often it’s because queries either aren’t parameterized, or because parameters aren’t explicitly defined in the application.

When you turn Optimize For Ad Hoc Workloads on, first-time plans are stored as stubs. That’s all. If they get used a second time, the full plan is stored.

This is great, unless all your plans have a low rate of reuse anyway, like < 10 or so.

Paramesan Cheese

First, the documentation for Forced Parameterization is hard to find.

Second, it’s usually a *wonderful* setting for queries that are fully unparameterized. One of the limitations is that if a query is only partially parameterized, it won’t parameterize the unparameterized bits.

That kinda sucks, but I understand why it doesn’t: Microsoft thinks you’re smart and you know what you’re doing, and there must be a *very good reason* for you to only have partially parameterized a query.

For instance, to get a filtered index used, or to avoid some parameter sniffing issue with skewed data.

Which One Do You Need?

Focus on the problem you’re trying to solve.

If you have a lot of single use plans clogging up your plan cache and forcing a lot of churn, then Optimize For Ad Hoc Workloads can be great
If you have a lot of unparameterized queries creating loads of duplicate plans (maybe even single use), you want Forced Parameterization

Thanks for reading!

Going Further

Replication Deadlocks In SQL Server, And How To Fix Them

Posted on September 17, 2021May 15, 2023 by Erik Darling

As If

Replication is one of my least favorite things, and I go out of my way not to deal with it. I have had a few clients now that have run into problems with deadlocks arising from it doing Replication-y things.

If you use Plan Explorer to look at deadlocks, which you should because SSMS sucks at it, you’ll see stuff that looks like this:

You’ll see deadlocks on things like LockMatchID, sys.sp_MSrepl_changestatus, and sp_MSrepl_addsubscription.

You may also see weird looking ones like this on sp_addsubscription.

If you see deadlocks that involved Database Id 32767, and a negative object ID like -993696157, it’s going to be some weird replication stuff. That’s the Id of the Resource Database, which you can’t really get at without the DAC, or copying and attaching the files for it as a user database.

i too am deadlock

You may also see deadlocks on things like sp_replupdatechema coming from mssqlsystemresource.

Fixing It

The official line from Microsoft Support is that you can usually fix the deadlocks by running these commands:

EXEC sp_changepublication

@publication = N'yourpublication',

@property = N'allow_anonymous',

@value = N'false';

EXEC sp_changepublication

@publication = N'yourpublication',

@property = N'immediate_sync',

@value = N'false';

EXEC sp_changepublication @publication = N'yourpublication', @property = N'allow_anonymous', @value = N'false'; GO EXEC sp_changepublication @publication = N'yourpublication', @property = N'immediate_sync', @value = N'false'; GO

EXEC sp_changepublication
  @publication = N'yourpublication',
  @property = N'allow_anonymous',
  @value = N'false';
GO

EXEC sp_changepublication
  @publication = N'yourpublication',
  @property = N'immediate_sync',
  @value = N'false';
GO

In practice, the clients I’ve had do this have had their Replication Deadlocks resolved. Hopefully if you’re hitting the same problems, you’ll find them useful.

Thanks for reading!

Going Further

Index Key Columns And Data Access Patterns In SQL Server

Posted on September 16, 2021May 16, 2022 by Erik Darling

Financial Advice

When I talk about “data access patterns” in relation to databases, it’s often met with blank stares.

What is a data access pattern? What does it have to do with indexes?

Well, my dear friend, today you’re gonna learn it good and hard.

Medical Doctor

Data access patterns refer to the most common ways that queries filter, join, order, and display data.

The earliest point of a data access pattern is your where clause. Different queries may have different patterns of data to look for.

Giving some generic OLTP-ish examples, you might have queries that look for:

Customer orders
- Within a date range
- Ordered by most recent
Items in a customer order
- With the total price
- Plus shipping
- Plus tax
Items in stock
- Total quantity

Depending on how normalized your data is, getting some of this stuff will likely require 2-3 tables getting joins together in some manner.

But all of these different scenarios define your data access patterns, and this is how you need to gear your indexes.

A lot of people get caught up on the minutiae of indexes without taking care of any of the basics, worrying about foreign keys, GUIDs, fragmentation, and other ridiculous memes.

Personal Trainer

Your data access patterns should define your indexes, because that — along with well-written queries to use those indexes most efficiently — is what’s going to make your application fast.

I’ve blogged about some of the fundamental concepts behind this in the past:

If you still need help with your index design after reading those, drop me a line! That’s the kind of thing I love helping people out with.

Thanks for reading!

Going Further

The Academics Behind FROID

Posted on September 15, 2021May 16, 2022 by Erik Darling

I am a heading

If you’re curious about all the stuff behind FROID, I’d highly suggest checking out this video by Andy Pavlo (b|t). It’s a great talk based on the academic material written about it, and made quite approachable and understandable even if you don’t understand all those fancy symbols that math people tell me are really numbers.

If you like this video, I’d highly suggest following Andy’s classes on YouTube. While they’re pretty database agnostic, they’re still awesome to watch to get a better understanding of how and why databases work, and do what they do. And there are some hot beats, to boot.

Thanks for reading!

Going Further

Trace Flag 3608 Disables Automatic Statistics Creation And Ghost Record Cleanup In SQL Server

Posted on September 14, 2021May 16, 2022 by Erik Darling

Short Bad

According to the docs:

Prevents SQL Server from automatically starting and recovering any database except the master database. If activities that require TempDB are initiated, then model is recovered and TempDB is created. Other databases will be started and recovered when accessed. Some features, such as snapshot isolation and read committed snapshot, might not work. Use for Move System Databases and Move User Databases.

Note: Do not use during normal operation.

Scope: global only

But it turns out it can do quite a bit of harm. It seems that many things stop working when it’s in use, though, including statistics getting automatically created.

Here’s a repro script:

DROP TABLE IF EXISTS

dbo.stats_test,

dbo.stats_test_3608;

CREATE TABLE dbo.stats_test

(

id int NOT NULL

);

INSERT

dbo.stats_test

(

)

VALUES

(

);

SELECT

st.*

FROM dbo.stats_test AS st

WHERE st.id = 1;

DBCC TRACEON(3608, -1);

CREATE TABLE dbo.stats_test_3608

(

id int NOT NULL

);

INSERT

dbo.stats_test_3608

(

)

VALUES

(

);

SELECT

st.*

FROM dbo.stats_test AS st

WHERE st.id = 1;

DBCC TRACEOFF(3608, -1);

DROP TABLE IF EXISTS dbo.stats_test, dbo.stats_test_3608; CREATE TABLE dbo.stats_test ( id int NOT NULL ); GO INSERT dbo.stats_test ( id ) VALUES ( 0 ); GO SELECT st.* FROM dbo.stats_test AS st WHERE st.id = 1; GO DBCC TRACEON(3608, -1); GO CREATE TABLE dbo.stats_test_3608 ( id int NOT NULL ); GO INSERT dbo.stats_test_3608 ( id ) VALUES ( 0 ); GO SELECT st.* FROM dbo.stats_test AS st WHERE st.id = 1; GO DBCC TRACEOFF(3608, -1); GO

DROP TABLE IF EXISTS
    dbo.stats_test,
    dbo.stats_test_3608;


CREATE TABLE dbo.stats_test
(
    id int NOT NULL
);
GO 

INSERT
    dbo.stats_test
(
    id
)
VALUES
(
    0
);
GO

SELECT 
    st.* 
FROM dbo.stats_test AS st 
WHERE st.id = 1;
GO 

DBCC TRACEON(3608, -1);
GO 

CREATE TABLE dbo.stats_test_3608
(
    id int NOT NULL
);
GO 

INSERT
    dbo.stats_test_3608
(
    id
)
VALUES
(
    0
);
GO

SELECT 
    st.* 
FROM dbo.stats_test AS st 
WHERE st.id = 1;
GO 

DBCC TRACEOFF(3608, -1);
GO

And here are the results:

End Times

This post comes from finding the trace flag enabled at a client site, and nearly every query plan having warnings about columns with missing statistics.

Sometimes this happens when statistics exist for a single column, but multi-column statistics don’t.

In this case, disabling the trace flag fixed things.

Ghost Busters

Even forcing ghost record cleanup won’t work with this trace flag on.

DBCC TRACEOFF(3608, -1);

INSERT

dbo.stats_test WITH(TABLOCK)

(

)

SELECT

m.message_id

FROM sys.messages AS m;

DELETE dbo.stats_test;

DBCC FORCEGHOSTCLEANUP;

SELECT

record_count,

ghost_record_count,

version_ghost_record_count

FROM sys.dm_db_index_physical_stats

(

DB_ID(),

OBJECT_ID('stats_test'),

'DETAILED'

);

DBCC TRACEON(3608, -1);

INSERT

dbo.stats_test_3608 WITH(TABLOCK)

(

)

SELECT

m.message_id

FROM sys.messages AS m;

DELETE dbo.stats_test_3608;

DBCC FORCEGHOSTCLEANUP;

SELECT

record_count,

ghost_record_count,

version_ghost_record_count

FROM sys.dm_db_index_physical_stats

(

DB_ID(),

OBJECT_ID('stats_test_3608'),

'DETAILED'

);

DBCC TRACEOFF(3608, -1);

DBCC TRACEOFF(3608, -1); GO INSERT dbo.stats_test WITH(TABLOCK) ( id ) SELECT m.message_id FROM sys.messages AS m; GO DELETE dbo.stats_test; GO DBCC FORCEGHOSTCLEANUP; GO SELECT record_count, ghost_record_count, version_ghost_record_count FROM sys.dm_db_index_physical_stats ( DB_ID(), OBJECT_ID('stats_test'), 0, 0, 'DETAILED' ); GO DBCC TRACEON(3608, -1); GO INSERT dbo.stats_test_3608 WITH(TABLOCK) ( id ) SELECT m.message_id FROM sys.messages AS m; GO DELETE dbo.stats_test_3608; GO DBCC FORCEGHOSTCLEANUP; GO SELECT record_count, ghost_record_count, version_ghost_record_count FROM sys.dm_db_index_physical_stats ( DB_ID(), OBJECT_ID('stats_test_3608'), 0, 0, 'DETAILED' ); GO DBCC TRACEOFF(3608, -1); GO

DBCC TRACEOFF(3608, -1);
GO 

INSERT 
    dbo.stats_test WITH(TABLOCK)
(
    id
)
SELECT
    m.message_id
FROM sys.messages AS m;
GO 

DELETE dbo.stats_test;
GO 

DBCC FORCEGHOSTCLEANUP;
GO 

SELECT
    record_count,
    ghost_record_count,
    version_ghost_record_count
FROM sys.dm_db_index_physical_stats
(
    DB_ID(), 
    OBJECT_ID('stats_test'), 
    0, 
    0, 
    'DETAILED'
);
GO 

DBCC TRACEON(3608, -1);
GO 

INSERT 
    dbo.stats_test_3608 WITH(TABLOCK)
(
    id
)
SELECT
    m.message_id
FROM sys.messages AS m;
GO 

DELETE dbo.stats_test_3608;
GO 

DBCC FORCEGHOSTCLEANUP;
GO 

SELECT
    record_count,
    ghost_record_count,
    version_ghost_record_count
FROM sys.dm_db_index_physical_stats
(
    DB_ID(), 
    OBJECT_ID('stats_test_3608'), 
    0, 
    0, 
    'DETAILED'
);
GO 

DBCC TRACEOFF(3608, -1);
GO

Turn Off

So, uh, if you see this enabled anywhere, you should disable it.

Thanks for reading!

Going Further

Things SQL Server vNext Should Address: Make Dynamic SQL Easily Traceable

Posted on September 13, 2021May 16, 2022 by Erik Darling

An Attempt

One tricky thing about working with dynamic SQL is that it’s rather unaccountable. You have a stored procedure, you build up a string, you execute it, and no one wants to claim responsibility.

Like a secret agent, or an ugly baby.

It would be nice if sp_executesql had an additional parameter to assign an object id to the code block so that when you’re looking at the plan cache or Query Store, you know immediately where the query came from.

Here’s an example.

A Contempt

Let’s use this as an example:

CREATE OR ALTER PROCEDURE dbo.dynamo

SET NOCOUNT, XACT_ABORT ON;

BEGIN

DECLARE

@sql nvarchar(MAX) = N'';

SELECT TOP (1)

b.*

FROM dbo.Badges AS b

WHERE b.UserId = 22656

ORDER BY b.Date DESC

SELECT

@sql = N'

/*dbo.dynamo*/

SELECT

c = COUNT_BIG(*)

FROM dbo.Users AS u

JOIN dbo.Posts AS p

ON p.OwnerUserId = u.Id;

EXEC sys.sp_executesql

@sql;

END;

CREATE OR ALTER PROCEDURE dbo.dynamo AS SET NOCOUNT, XACT_ABORT ON; BEGIN DECLARE @sql nvarchar(MAX) = N''; SELECT TOP (1) b.* FROM dbo.Badges AS b WHERE b.UserId = 22656 ORDER BY b.Date DESC SELECT @sql = N' /*dbo.dynamo*/ SELECT c = COUNT_BIG(*) FROM dbo.Users AS u JOIN dbo.Posts AS p ON p.OwnerUserId = u.Id; '; EXEC sys.sp_executesql @sql; END; GO

CREATE OR ALTER PROCEDURE dbo.dynamo
AS
SET NOCOUNT, XACT_ABORT ON;
BEGIN

    DECLARE
        @sql nvarchar(MAX) = N'';
    
    SELECT TOP (1)
        b.*
    FROM dbo.Badges AS b
    WHERE b.UserId = 22656
    ORDER BY b.Date DESC

    SELECT
        @sql = N'    
    /*dbo.dynamo*/
    SELECT
        c = COUNT_BIG(*)
    FROM dbo.Users AS u
    JOIN dbo.Posts AS p
        ON p.OwnerUserId = u.Id;    
        ';
    
    EXEC sys.sp_executesql
        @sql;
    
END;
GO

This is, by all accounts, Properly Written Dynamic SQL™

I know, this doesn’t need to be dynamic SQL, but I don’t need a great example of that to show what I mean. The first query is there to get the procedure to show up anywhere, and the dynamic SQL is there to show you that… dynamic SQL doesn’t show up as associated with the procedure that called it.

If we execute the procedure, and then look for the details of it in Query Store, all we get back it the first query.

EXEC dbo.dynamo;

EXEC sp_QuickieStore

@database_name = 'StackOverflow2013',

@procedure_schema = 'dbo',

@procedure_name = 'dynamo';

EXEC dbo.dynamo; GO EXEC sp_QuickieStore @database_name = 'StackOverflow2013', @procedure_schema = 'dbo', @procedure_name = 'dynamo'; GO

EXEC dbo.dynamo;
GO 

EXEC sp_QuickieStore
    @database_name = 'StackOverflow2013',
    @procedure_schema = 'dbo',
    @procedure_name = 'dynamo';
GO

twenty minutes

It sure would be nice to know that this procedure executed a whole other query.

A Temp

There’s no great workaround for this, but you can at least get a hint that something else happened if you dump the dynamic SQL results into a temp table.

CREATE OR ALTER PROCEDURE dbo.dynamo_insert

SET NOCOUNT, XACT_ABORT ON;

BEGIN

DECLARE

@sql nvarchar(MAX) = N'';

CREATE TABLE

#results

(

c bigint

);

SELECT TOP (1)

b.*

FROM dbo.Badges AS b

WHERE b.UserId = 22656

ORDER BY b.Date DESC

SELECT

@sql = N'

/*dbo.dynamo*/

SELECT

c = COUNT_BIG(*)

FROM dbo.Users AS u

JOIN dbo.Posts AS p

ON p.OwnerUserId = u.Id;

INSERT

#results WITH(TABLOCK)

(

)

EXEC sys.sp_executesql

@sql;

SELECT

r.*

FROM #results AS r

END;

CREATE OR ALTER PROCEDURE dbo.dynamo_insert AS SET NOCOUNT, XACT_ABORT ON; BEGIN DECLARE @sql nvarchar(MAX) = N''; CREATE TABLE #results ( c bigint ); SELECT TOP (1) b.* FROM dbo.Badges AS b WHERE b.UserId = 22656 ORDER BY b.Date DESC SELECT @sql = N' /*dbo.dynamo*/ SELECT c = COUNT_BIG(*) FROM dbo.Users AS u JOIN dbo.Posts AS p ON p.OwnerUserId = u.Id; '; INSERT #results WITH(TABLOCK) ( c ) EXEC sys.sp_executesql @sql; SELECT r.* FROM #results AS r END; GO

CREATE OR ALTER PROCEDURE dbo.dynamo_insert
AS
SET NOCOUNT, XACT_ABORT ON;
BEGIN

    DECLARE
        @sql nvarchar(MAX) = N'';

    CREATE TABLE
        #results
    (
        c bigint
    );
    
    SELECT TOP (1)
        b.*
    FROM dbo.Badges AS b
    WHERE b.UserId = 22656
    ORDER BY b.Date DESC

    SELECT
        @sql = N'    
    /*dbo.dynamo*/
    SELECT
        c = COUNT_BIG(*)
    FROM dbo.Users AS u
    JOIN dbo.Posts AS p
        ON p.OwnerUserId = u.Id;    
        ';
    
    INSERT
        #results WITH(TABLOCK)
    (
        c
    )    
    EXEC sys.sp_executesql
        @sql;

    SELECT
        r.*
    FROM #results AS r
    
END;
GO

This still sucks though, because we don’t know what the dynamic portion of the query did.

The query plan looks like this, with no real details or metrics:

A Fix

It would be super if sp_executesql took an additional parameter in the context of a stored procedure that could be assigned to a @@PROCID.

EXEC sys.sp_executesql

@sql,

@object_id = @@PROCID;

EXEC sys.sp_executesql @sql, @object_id = @@PROCID;

EXEC sys.sp_executesql
    @sql,
    @object_id = @@PROCID;

This would avoid all the headless dynamic SQL horsemen running around, and make it easier to locate procedure statements by searching for the procedure that executes them, rather than having to search a bunch of SQL text for a commented procedure name.

Sure, it’s fine if you stumble across dynamic SQL with a comment pointing to the procedure that runs it, but I hardly see anyone doing that.

Thanks for reading!

Going Further

Wait Stats During Exchange Spills In SQL Server Parallel Query Plans

Posted on September 10, 2021May 16, 2022 by Erik Darling

Inorganic

Lots of people say these are rare — and perhaps they are becoming more so — but I still see them with some frequency. That’s not a terribly strong indictment of SQL Server, but that’s not the point of this blog or of my work.

Exchange spills are particularly interesting, and I’ve written about them a couple times:

For each query we run, we’re gonna execute the query, and then this query to look at session level wait stats.

SELECT

desws.*

FROM sys.dm_exec_session_wait_stats AS desws

WHERE desws.session_id = @@SPID

ORDER BY desws.wait_time_ms DESC;

SELECT desws.* FROM sys.dm_exec_session_wait_stats AS desws WHERE desws.session_id = @@SPID ORDER BY desws.wait_time_ms DESC;

SELECT
    desws.*
FROM sys.dm_exec_session_wait_stats AS desws
WHERE desws.session_id = @@SPID
ORDER BY desws.wait_time_ms DESC;

Turns Out

The first thing to understand here is that the memory grant itself doesn’t matter. That’s why there’s no hint for it on this query, as there have been in the others.

WITH

precheck AS

(

SELECT

u.Id,

p.Id AS PostId,

p.AcceptedAnswerId,

n =

ROW_NUMBER() OVER

(

ORDER BY

u.Id

)

FROM dbo.Users AS u

JOIN dbo.Posts AS p

ON p.OwnerUserId = u.Id

WHERE u.CreationDate > '20110825'

AND u.Reputation > 100

AND p.PostTypeId = 1

)

SELECT

u.*,

p.*

FROM precheck p

JOIN dbo.Users AS u

ON p.Id = u.Id

WHERE ISNULL(p.n, u.Id) = 0

ORDER BY u.Id, u.Reputation

OPTION(USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'), MERGE JOIN);

WITH precheck AS ( SELECT u.Id, p.Id AS PostId, p.AcceptedAnswerId, n = ROW_NUMBER() OVER ( ORDER BY u.Id ) FROM dbo.Users AS u JOIN dbo.Posts AS p ON p.OwnerUserId = u.Id WHERE u.CreationDate > '20110825' AND u.Reputation > 100 AND p.PostTypeId = 1 ) SELECT u.*, p.* FROM precheck p JOIN dbo.Users AS u ON p.Id = u.Id WHERE ISNULL(p.n, u.Id) = 0 ORDER BY u.Id, u.Reputation OPTION(USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'), MERGE JOIN);

WITH 
    precheck AS 
(
    SELECT
        u.Id, 
        p.Id AS PostId,
        p.AcceptedAnswerId,
        n = 
            ROW_NUMBER() OVER
            (
                ORDER BY
            	    u.Id
            )
    FROM dbo.Users AS u
    JOIN dbo.Posts AS p
        ON p.OwnerUserId = u.Id
    WHERE  u.CreationDate > '20110825' 
    AND    u.Reputation > 100 
    AND    p.PostTypeId = 1
)
SELECT 
    u.*, 
    p.*
FROM precheck p
JOIN dbo.Users AS u
    ON p.Id = u.Id
WHERE ISNULL(p.n, u.Id) = 0
ORDER BY u.Id, u.Reputation
OPTION(USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'), MERGE JOIN);

We do need to ask for a parallel merge join, because parallel merge joins were a mistake and should not have optimizer support.

Thanks for watching.

Back Brain

The waits here are interesting. At the top, we have perhaps high expected waits on parallelism, because the spills are encountered at parallel exchanges.

There are quite high waits on PAGEIOLATCH_EX, SLEEP_TASK, and SLEEP_BPOOL_STEAL rounding out the top five. This is quite interesting, because I’ve never explicitly thought of PAGEIOLATCH_EX waits in the context of exchange spills. Normally, I think of them when queries read pages from disk into memory for modification.

Going down the line, SLEEP_TASK is familiar from our time spent with hash spills, but SLEEP_BPOOL_STEAL is so far undocumented anywhere.

One can make some inferences based on the name about what it might be doing, but… It could also be a bit of a catch-all wait, like other waits that can be generated by spills.

From Me

Going further, the other interesting waits encountered here look like so:

If I had a Magic Eight Ball out right now, it’d probably just go into convulsions. Perhaps SLEEP_TASK (shown one above) is another clear sign of a different kind of spill, but there’s so much else going on here it’s hard to tell for certain which waits this query generated were from the spill, and which were from other portions of the query executing regularly.

Points for hitting PREEMPTIVE waits, though.

To You

You weren’t mis-reading when you saw the query plan. This really did execute for over two hours. I tried to get another exchange spill into the mix by changing the query slightly, but all I got was a very cranky Repartition Streams.

WITH

precheck AS

(

SELECT

u.Id,

p.Id AS PostId,

p.AcceptedAnswerId,

p.Body, --New

n =

ROW_NUMBER() OVER

(

ORDER BY

u.Id

)

FROM dbo.Users AS u

JOIN dbo.Posts AS p WITH(INDEX = p)

ON p.OwnerUserId = u.Id

WHERE u.CreationDate > '20110825'

AND u.Reputation > 100

AND p.PostTypeId = 1

)

SELECT

u.*,

p.*

FROM precheck p

JOIN dbo.Users AS u

ON p.Id = u.Id

WHERE ISNULL(p.n, u.Id) = 0

ORDER BY u.Id, u.Reputation

OPTION(USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'), MERGE JOIN);

WITH precheck AS ( SELECT u.Id, p.Id AS PostId, p.AcceptedAnswerId, p.Body, --New n = ROW_NUMBER() OVER ( ORDER BY u.Id ) FROM dbo.Users AS u JOIN dbo.Posts AS p WITH(INDEX = p) ON p.OwnerUserId = u.Id WHERE u.CreationDate > '20110825' AND u.Reputation > 100 AND p.PostTypeId = 1 ) SELECT u.*, p.* FROM precheck p JOIN dbo.Users AS u ON p.Id = u.Id WHERE ISNULL(p.n, u.Id) = 0 ORDER BY u.Id, u.Reputation OPTION(USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'), MERGE JOIN);

WITH 
    precheck AS 
(
    SELECT
        u.Id, 
        p.Id AS PostId,
        p.AcceptedAnswerId,
        p.Body, --New
        n = 
            ROW_NUMBER() OVER
            (
                ORDER BY
            	    u.Id
            )
    FROM dbo.Users AS u
    JOIN dbo.Posts AS p WITH(INDEX = p)
        ON p.OwnerUserId = u.Id
    WHERE  u.CreationDate > '20110825' 
    AND    u.Reputation > 100 
    AND    p.PostTypeId = 1
)
SELECT 
    u.*, 
    p.*
FROM precheck p
JOIN dbo.Users AS u
    ON p.Id = u.Id
WHERE ISNULL(p.n, u.Id) = 0
ORDER BY u.Id, u.Reputation
OPTION(USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'), MERGE JOIN);

And it ran for an extra 45 or so minutes, too.

Ugly Baby

The results for this second query wait stats wise are just an amplified version of the original, of course.

Obviously dragging the Body column around had a rather profound influence on things.

Whole And Wet

These waits are perhaps less obvious than other spills, though one could make reasonable assumptions about SLEEP_TASK waits here as well.

I’d be concerned about any query emitting high amounts of it, or any server with high amounts of it compared to uptime.

It seems unfortunate that there’s no wait directly tied to hash spills, but perhaps when we’re all living in some quantum batch mode reality in the cloud it won’t matter at all.

Thanks for reading!

Going Further

Wait Stats During Hash Spills In SQL Server Query Plans

Posted on September 9, 2021May 16, 2022 by Erik Darling

Jam Job

Hash spills are, as we’ll see, sometimes identified by a different wait than sort spills. In small quantities, spills are often not worth bothering with. But when they pile up, they can really cause some severe performance issues.

In this post, I want to show that both Hash Aggregates and Joins can cause the same wait type, along with some evidence that strings make things worse.

In all the queries, I’m going to be using the MAX_GRANT_PERCENT hint to set the memory grant ridiculously low, to make the waits I care about stick out.

For each query we run, we’re gonna execute the query, and then this query to look at session level wait stats.

SELECT

desws.*

FROM sys.dm_exec_session_wait_stats AS desws

WHERE desws.session_id = @@SPID

ORDER BY desws.wait_time_ms DESC;

SELECT desws.* FROM sys.dm_exec_session_wait_stats AS desws WHERE desws.session_id = @@SPID ORDER BY desws.wait_time_ms DESC;

SELECT
    desws.*
FROM sys.dm_exec_session_wait_stats AS desws
WHERE desws.session_id = @@SPID
ORDER BY desws.wait_time_ms DESC;

Merch Pants

First up, a highly doctored hash aggregate:

SELECT

c.CreationDate,

c.PostId,

c.Score,

c.UserId

FROM dbo.Comments AS c

GROUP BY

c.CreationDate,

c.PostId,

c.Score,

c.UserId

HAVING COUNT_BIG(*) > 2147483647

OPTION(HASH GROUP, QUERYRULEOFF GenLGAgg, MAX_GRANT_PERCENT = 0.0);

SELECT c.CreationDate, c.PostId, c.Score, c.UserId FROM dbo.Comments AS c GROUP BY c.CreationDate, c.PostId, c.Score, c.UserId HAVING COUNT_BIG(*) > 2147483647 OPTION(HASH GROUP, QUERYRULEOFF GenLGAgg, MAX_GRANT_PERCENT = 0.0);

SELECT   
    c.CreationDate, 
    c.PostId, 
    c.Score, 
    c.UserId
FROM dbo.Comments AS c
GROUP BY 
    c.CreationDate, 
    c.PostId, 
    c.Score, 
    c.UserId
HAVING COUNT_BIG(*) > 2147483647 
OPTION(HASH GROUP, QUERYRULEOFF GenLGAgg, MAX_GRANT_PERCENT = 0.0);

But the important thing here is that there are no strings involved.

The spill goes on for about two minutes and twenty seconds, in row mode, at DOP 8.

That sure is bad, but in the words of Sticky Fingaz: Bu-bu-bu-but wait it gets worse.

Foolproof Plan

Let’s pull out another highly doctored hash aggregate, this time with our friend the Text column.

SELECT

c.CreationDate,

c.PostId,

c.Score,

c.Text,

c.UserId

FROM dbo.Comments AS c

GROUP BY

c.CreationDate,

c.PostId,

c.Score,

c.Text,

c.UserId

HAVING COUNT_BIG(*) > 2147483647

OPTION(HASH GROUP, QUERYRULEOFF GenLGAgg, MAX_GRANT_PERCENT = 0.0);

SELECT c.CreationDate, c.PostId, c.Score, c.Text, c.UserId FROM dbo.Comments AS c GROUP BY c.CreationDate, c.PostId, c.Score, c.Text, c.UserId HAVING COUNT_BIG(*) > 2147483647 OPTION(HASH GROUP, QUERYRULEOFF GenLGAgg, MAX_GRANT_PERCENT = 0.0);

SELECT   
    c.CreationDate, 
    c.PostId, 
    c.Score, 
    c.Text, 
    c.UserId
FROM dbo.Comments AS c
GROUP BY 
    c.CreationDate, 
    c.PostId, 
    c.Score, 
    c.Text, 
    c.UserId
HAVING COUNT_BIG(*) > 2147483647 
OPTION(HASH GROUP, QUERYRULEOFF GenLGAgg, MAX_GRANT_PERCENT = 0.0);

We see more of our friend SLEEP_TASK. Again, many other things may add to this wait, but holy hoowee, this is hard to ignore.

That’s a solid — heck, let’s just call it 18 minutes — of spill time. That’s just plain upsetting.

Filthy.

And Join

Causing the same problem where a Hash Join is in play will exhibit the same wait.

SELECT

c.*

FROM dbo.Votes AS v

LEFT JOIN dbo.Comments AS c

ON v.PostId = c.PostId

WHERE ISNULL(v.UserId, c.UserId) > 2147483647

OPTION(MAX_GRANT_PERCENT = 0.0);

SELECT c.* FROM dbo.Votes AS v LEFT JOIN dbo.Comments AS c ON v.PostId = c.PostId WHERE ISNULL(v.UserId, c.UserId) > 2147483647 OPTION(MAX_GRANT_PERCENT = 0.0);

SELECT 
    c.*
FROM dbo.Votes AS v
LEFT JOIN dbo.Comments AS c
    ON v.PostId = c.PostId
WHERE ISNULL(v.UserId, c.UserId) > 2147483647
OPTION(MAX_GRANT_PERCENT = 0.0);

Now we get stuck spilling for about 21 minutes, which is also awkward and uncomfortable.

Funkel

We’ve looked at sort spills being the cause of IO_COMPLETION waits, and hash spills being the cause of SLEEP_TASK waits.

Again, if you see a lot of these waits on your servers, you may want to check out the query here to find plans in the cache that are selects that cause writes, for reasons explained in the linked post.

Tomorrow we’ll wake up bright and early to look at which waits crop up during exchange spills.

Thanks for reading!

Going Further

Wait Stats During Sort Spills In SQL Server Query Plans

Posted on September 8, 2021May 16, 2022 by Erik Darling

Over Under

Over the next few days, I’m going to look at the wait stats that show up during different types of spills of different sizes. For no reason whatsoever, I’m going to start with sort spills. I’ll also cover hash and exchange spills, in case you were wondering just how long you’d have to sit here reading about them.

The point is not that spills are the sole things that cause these waits, it’s just to give you some things to potentially watch out for if you see these waits piling up and can’t pin down where they’re coming from.

In all the queries, I’m going to be using the MAX_GRANT_PERCENT hint to set the memory grant ridiculously low to make the waits I care about stick out.

For each query we run, we’re gonna execute the query, and then this query to look at session level wait stats.

SELECT

desws.*

FROM sys.dm_exec_session_wait_stats AS desws

WHERE desws.session_id = @@SPID

ORDER BY desws.wait_time_ms DESC;

SELECT desws.* FROM sys.dm_exec_session_wait_stats AS desws WHERE desws.session_id = @@SPID ORDER BY desws.wait_time_ms DESC;

SELECT
    desws.*
FROM sys.dm_exec_session_wait_stats AS desws
WHERE desws.session_id = @@SPID
ORDER BY desws.wait_time_ms DESC;

No Strings Attached

Strings are horrible. The bigger they are, they worse they are. Nearly every time I see oversized string columns, I see them causing problems.

File that under obvious foreshadowing while we look at a sort that doesn’t involve any strings.

SELECT

c.Id,

c.CreationDate,

c.PostId,

c.Score,

c.UserId

FROM

(

SELECT

c.Id,

c.CreationDate,

c.PostId,

c.Score,

c.UserId,

n =

ROW_NUMBER() OVER

(

ORDER BY

c.PostId DESC

)

FROM dbo.Comments AS c

) AS c

WHERE c.n = 0

OPTION(MAX_GRANT_PERCENT = 0.0);

SELECT c.Id, c.CreationDate, c.PostId, c.Score, c.UserId FROM ( SELECT c.Id, c.CreationDate, c.PostId, c.Score, c.UserId, n = ROW_NUMBER() OVER ( ORDER BY c.PostId DESC ) FROM dbo.Comments AS c ) AS c WHERE c.n = 0 OPTION(MAX_GRANT_PERCENT = 0.0);

SELECT 
    c.Id, 
    c.CreationDate, 
    c.PostId, 
    c.Score, 
    c.UserId
FROM 
(
    SELECT 
        c.Id, 
        c.CreationDate, 
        c.PostId, 
        c.Score, 
        c.UserId,
        n = 
            ROW_NUMBER() OVER 
            ( 
                ORDER BY 
                    c.PostId DESC 
            )
    FROM dbo.Comments AS c
) AS c
WHERE c.n = 0
OPTION(MAX_GRANT_PERCENT = 0.0);

This query executes at DOP 8 in row mode, and the primary wait that racks up is IO_COMPLETION.

Strings Attached

Remember when I told you to file that thing up there under another thing? Here’s why.

In this query, we’re also going to select the Text column from the Comments table , which is an nvarchar(700).

SELECT

c.Id,

c.CreationDate,

c.PostId,

c.Score,

c.Text, --New

c.UserId

FROM

(

SELECT

c.Id,

c.CreationDate,

c.PostId,

c.Score,

c.Text, --New

c.UserId,

n =

ROW_NUMBER() OVER

(

ORDER BY

c.PostId DESC

)

FROM dbo.Comments AS c

) AS c

WHERE c.n = 0

OPTION(MAX_GRANT_PERCENT = 0.0);

SELECT c.Id, c.CreationDate, c.PostId, c.Score, c.Text, --New c.UserId FROM ( SELECT c.Id, c.CreationDate, c.PostId, c.Score, c.Text, --New c.UserId, n = ROW_NUMBER() OVER ( ORDER BY c.PostId DESC ) FROM dbo.Comments AS c ) AS c WHERE c.n = 0 OPTION(MAX_GRANT_PERCENT = 0.0);

SELECT 
    c.Id, 
    c.CreationDate, 
    c.PostId, 
    c.Score, 
    c.Text, --New
    c.UserId
FROM 
(
    SELECT 
        c.Id, 
        c.CreationDate, 
        c.PostId, 
        c.Score, 
        c.Text, --New
        c.UserId,
        n = 
            ROW_NUMBER() OVER 
            ( 
                ORDER BY 
                    c.PostId DESC 
            )
    FROM dbo.Comments AS c
) AS c
WHERE c.n = 0
OPTION(MAX_GRANT_PERCENT = 0.0);

In this case, things get way worse. If you’re surprised, you should try reading blog posts; particularly this one.

Rather than ~16 seconds of spilling, we end up with a bit over four minutes of spilling. This is as the Mayans foretold, but the 2012 calendar was drunk and got the last two digits backwards.

Incomplete

Again, there are many potential causes of this wait, but on servers where I see them really piling up and ending up in top waits, there tends to be a dearth of memory and a lot of queries spilling to disk during sorts.

If you see this wait piling up, you may want to check out the query here to look for select queries that cause writes, which can indicate spills (among other things).

Thanks for reading!