Spinning Disk Mentality In SQL Server: Why You Don’t Need To Defragment Indexes Or Change Fill Factor

Pervasive


I often have to talk clients down from odd ledges. At this point, there’s about 20 years of advice about SQL Server out there, and for various reasons received wisdom often gets migrated en masse, whether it’s still applicable or not.

Outdated trace flags and settings are pretty common, and don’t get me started on coding practices.

But I get it — that stuff came along with advice that still makes sense — like parallelism and memory settings, separating off tempdb, etc.

Unfortunately, it often leads people to the “it used to be fast so it should always be fast” trap based on what used to work, rather than what makes sense currently. Or even what the current problem is.

And to avoid making any meaningful changes, they often double down on bad ideas and flail around with nonsensical ones.

Surprise and Dismay


Some advice made a lot of sense when you had servers on old spinning disks, and 32bit software with 2-3GB of RAM available for user space tasks.

You just couldn’t cache much data, and every time those disk heads had to pick up and spin about, things got creaky. Modern storage tends to avoid such dilemmas, but people still treat it like a record player that might skip if they rub a little too much funk on their Roger Rabbit.

Things like changing fill factor and constant index maintenance just aren’t the problem solvers they used to be, back when I/O — especially the random variety — was quite a nuisance to accomplish. SSD, Flash, and RAM just don’t have those moving pieces for you to concern yourself with.

I’m not saying there’s not a time a place to make those changes, but I am saying that the ROI on them is much lower than it used to be.

Pick On


Not surprisingly, I see people doing quite irresponsible things without measuring any metric particular to the what setting(s) they’re changing. The only expected outcome seems to be nods of approval if it “seems faster” or “got a little better”.

This process also seems to avoid determining what actual problems are, and focusing on a bit of advice from one of three blog posts by an author from 2009 where none of the pictures load and the code formatting is just italicized text.

And hey, look, if that’s your fetish, cool. There’s certainly some invaluable gems out there that Microsoft has managed to not delete yet, or migrate for the eleventeenth time and break every link in existence by tacking a GUID to the end of it.

The conversation usually goes something like:

0d5
lung

“Why is fill factor 60 on every index?”

“To cut down on page splits.”

“Did you have a lot of those?”

“I don’t know, xxVanWilderFan420xx said they were bad and we should avoid them.”

Well, okay then.

I suppose cargo culting around things that don’t work at least makes you look busy.

After all, you can just copy and paste that italicized code and F5 your way to nowhere.

And Chew


I’m probably guilty of this too, with things I don’t quite understand or I’m not knowledgeable about.

A while back I had an issue with Windows BSODs constantly, and all the advice I could find came from posts on NVIDIA forums (dated 2012, of course) that suggested rebuilding the ESET database and doing a clean install of the drivers.

Did I try it? You bet I did.

Did I try it more than once? You’re on a roll if you said yes.

But all it did was prolong fixing the real problem, which turned out to be some RAM that had gone bad.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Some Stack Exchange Questions I’ve Answered Recently

Good Citizen


When I have spare time, I answer questions over at the Database Administrators Stack Exchange site.

Here are some recent ones that I’ve found interesting:

If you like this style of Q&A, I’d highly suggest signing up and contributing. It’s a great way to share and learn.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SARGability Week: Wrap Up

One Metric Week


This went on a bit longer than I thought it would, but to someone who tries to blog five days a week, that’s a good thing.

While this isn’t the most glamorous subject in the world, it seems it’s still a necessary one to write about. I don’t quite know how to feel about that, as I’ve read and written about it many times over the past 10 or so years.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SARGability Week: Why Implicit Conversions Aren’t SARGable

President Precedence


Data types are one of those core things you need to get right. Whether it’s matching types between join columns, or between parameters and predicates, not doing so can really squash application performance in quite similar ways to writing non-SARGable predicates.

That’s because — wait for it — a lot of the stuff we’ve talked about over the last week that can happen with poorly written predicates can happen with poorly matched data types, too.

The main thing to remember, is that aside from max datatypes, what matters most in a situation with implicit conversion is that it doesn’t take place on a column. If it happens on a parameter or variable it’s far less of an issue, but it can still cause oddities in query plans, and with cardinality estimation.

Speak And Spell


Most of the problems I see with implicit conversion is with other datatypes being compared to nchar/nvarchar types, but not always.

For example, this query results in an index seek despite a variable declared as nvarchar(11).

DECLARE 
    @i nvarchar(11) = N'22656';

SELECT
   u.*
FROM dbo.Users AS u
WHERE u.AccountId = @i;
GO 

Note that the convert_implicit function is applied to the variable, and not the AccountId column.

SQL Server Query Plan
cameo

Cahn


To show an example of when implicit conversions act the same way was non-SARGable predicates, let’s make a crappy copy of some columns from the Users table.

DROP TABLE IF EXISTS
    dbo.UsersBad;

SELECT u.Id,
       ISNULL
       (
           CONVERT
           (
               varchar(40), 
               u.DisplayName
           ), 
           ''
       ) AS DisplayName,
       ISNULL
       (
           CONVERT
           (
               nvarchar(11), 
               u.Reputation
           ), 
           ''
       ) AS Reputation
INTO dbo.UsersBad
FROM dbo.Users AS u;

ALTER TABLE dbo.UsersBad 
    ADD CONSTRAINT PK_UsersBad_Id 
        PRIMARY KEY CLUSTERED (Id);

CREATE INDEX ur ON dbo.UsersBad(Reputation);
CREATE INDEX ud ON dbo.UsersBad(DisplayName);

Here, we’re converting DisplayName from nvarchar, and Reputation from an int. We’re also creating some indexes that will be rendered nearly useless by implicit conversions.

CREATE OR ALTER PROCEDURE 
    dbo.BadUsersQuery
(
    @DisplayName nvarchar(40),
    @Reputation  int
)
AS
BEGIN

    SELECT 
        u.DisplayName
    FROM dbo.UsersBad AS u
    WHERE u.DisplayName = @DisplayName;

    SELECT
        Reputation = 
            MAX(u.Reputation)
    FROM dbo.UsersBad AS u
    WHERE u.Reputation = @Reputation;

END;
GO 

EXEC dbo.BadUsersQuery 
    @DisplayName = N'Eggs McLaren',
    @Reputation = 1787;

Imaging


For both of these, the convert_implicit winds up on the column rather than on the parameter.

SQL Server Query Plan
whew

And that’s what makes for the type of implicit conversion that causes most performance problems I see.

This is a relatively small table, so the hit isn’t too bad, but comparatively it’s much worse, like in all the other examples of SARGability we’ve seen lately.

Aware


I think most people who do performance tuning regularly are quite aware of this problem. There’s even a query plan warning about it, for those who don’t.

It’s definitely a good problem to solve, but it often leads to finding other problems. For example, we get the seek predicate warning regardless of if there’s an index we can seek to. If I drop all the indexes off of the UsersBad table and run a query like this, we still get a warning:

DECLARE
    @DisplayName nvarchar(40);

SELECT 
    u.DisplayName
FROM dbo.UsersBad AS u
WHERE u.DisplayName = @DisplayName;
SQL Server Query Plan
well no

It’s also worth noting that getting rid of the implicit conversion — much like fixing other non-SARGable predicates — may reveal missing index requests that weren’t there before.

DECLARE
    @DisplayName nvarchar(40) = N'Eggs McLaren';

SELECT 
    u.*
FROM dbo.UsersBad AS u
WHERE u.DisplayName = @DisplayName;
GO 

DECLARE
    @DisplayName varchar(40) = 'Eggs McLaren';

SELECT 
    u.*
FROM dbo.UsersBad AS u
WHERE u.DisplayName = @DisplayName;
GO
SQL Server Query Plan
relief

Haunch


Solving implicit conversion issues is just as important (and often easier) than solving other issues with SARGable predicates, and just as important.

Even though it’s one of the first performance problems people learn about, I still see it out there enough to write about it. I think a lot of the reason that it still crops up is because ORMs leave developers detached from the queries, and they don’t see how parameter types end up getting inferred when not strongly typed.

Anyway, that’s it for this series. Tomorrow’s post will be a wrap up with links.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SARGability Week: Using Indexed Views To Make Predicates SARGable

Boniface


There are some things that, in the course of normal query writing, just can’t be SARGablized. For example, generating and filtering on a windowing function, a having clause, or any other runtime expression listed here.

There are some interesting ways to use indexed views to our advantage for some of those things. While windowing functions and having clauses can’t be directly in an indexed view, we can give an indexed view a good definition to support them.

I Don’t Care For It


Starting with these indexes to help things along, they don’t really do as much as we’d hope.

CREATE INDEX c
ON dbo.Comments (PostId);

CREATE INDEX v
ON dbo.Votes (PostId);

CREATE INDEX p
ON dbo.Posts (Id, OwnerUserId, Score);

This query has to process a ton of rows, and no matter what we set the having expression to, the entire result set has to be generated before it can be applied. We could set it to > 0 or > infinity and it would take the same amount of time to have a working set to apply it to.

SELECT 
    p.OwnerUserId,
    TotalScore = 
        SUM(ISNULL(p.Score * 1., 0.)),
    records = 
        COUNT_BIG(*)
FROM dbo.Posts AS p
JOIN  dbo.Comments AS c
    ON c.PostId = p.Id
JOIN dbo.Votes AS v
    ON v.PostId = p.Id
GROUP BY 
    p.OwnerUserId
HAVING 
    SUM(ISNULL(p.Score * 1., 0.)) > 5000000.
ORDER BY
    TotalScore DESC;

Limited Liability


I know that having clause looks funny there, but it’s not my fault. The sum of Score ends up being a really big integer, and overflows the regular sized integers unless you explicitly convert it to a bigint or implicitly convert it to something floaty. The isnull is there because the column is NULLable, which is unacceptable to an indexed view.

So, here we are, forced to write something weird to conform.

Sizzling. Sparkling.

SQL Server Query Plan
grumble

Ignoring the woefully misaligned and misleading operator times, we can see in the query plan that again(!) a late Filter operator is applied that aligns with the predicate in our having clause.

Sarging Ahead


Let’s work some magic, here.

CREATE VIEW 
    dbo.BunchaCrap
WITH 
    SCHEMABINDING
AS
SELECT 
    p.OwnerUserId,
    TotalScore = 
        SUM(ISNULL(p.Score * 1., 0.)),
    records = 
        COUNT_BIG(*)
FROM dbo.Posts AS p
JOIN  dbo.Comments AS c
    ON c.PostId = p.Id
JOIN dbo.Votes AS v
    ON v.PostId = p.Id
GROUP BY 
    p.OwnerUserId;
GO 

CREATE UNIQUE CLUSTERED INDEX bc ON dbo.BunchaCrap(OwnerUserId);

This gives us an indexed view with the TotalScore expression materialized, which means we can search on it directly now without all the 50 some-odd seconds of nonsense leading up to it.

At The Mall


Our options now are either to query the indexed view directly with a noexpand hint, or to run the query as initially designed and rely on expression matching to pick up on things.

SELECT 
    bc.*
FROM dbo.BunchaCrap AS bc WITH (NOEXPAND)
WHERE bc.TotalScore > 5000000.
ORDER BY bc.TotalScore DESC;

In either case, we’ll get this query plan now:

SQL Server Query Plan
mutt and jeff

Which looks a whole heck of a lot nicer.

Tomorrow, we’ll look at how implicit conversion can look a lot like non-SARGable predicates.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SARGability Week: Rewriting Scalar User Defined Functions To Make Them SARGable

Cheap Replica


The bottom line on scalar UDFs is that they’re poison pills for performance.

They’re bad enough in the select list, but they get even worse if they appear in join or where clause portions of the query.

The example query we’re going to use doesn’t even go out and touch other tables, which can certainly make things worse. It does all its processing “in memory”.

CREATE FUNCTION dbo.nonsargable(@d datetime)
RETURNS bit
WITH SCHEMABINDING,
     RETURNS NULL ON NULL INPUT
AS 
BEGIN

RETURN
(
    SELECT 
        CASE 
            WHEN DATEDIFF
                 (
                     DAY, 
                     @d, 
                     GETDATE()
                 ) > 30
            THEN 1
            ELSE 0
        END
);

END;
GO

Much Merch


When we run this query, the plan is a messmare.

SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
WHERE u.Reputation > 100
AND   dbo.nonsargable(u.LastAccessDate) = 1;
SQL Server Query Plan
that again

The Filter operator is a familiar face at this point.

SQL Server Query Plan
mask up

The function has to run once per row that leaves the Users table, and the predicate has to be calculated and evaluated later at the filter operation.

Twisty


If we look in dm_exec_function_stats, we can see just how many times, and how much CPU the function used when we ran just this one query:

2021 08 09 22 06 52
look out below

If you pay extra special attention, you’ll noticed that the execution_count here exactly matches the number of rows that pass through the Filter operator above.

The way to fix this is to  rewrite the function as an inline table valued function.

CREATE FUNCTION dbo.nonsargable_inline(@d datetime)
RETURNS TABLE
WITH SCHEMABINDING
AS 
RETURN

    SELECT 
        b = 
        CASE 
            WHEN DATEDIFF
                 (
                     DAY, 
                     @d, 
                     GETDATE()
                 ) > 30
            THEN 1
            ELSE 0
        END;

GO

Now we don’t have all those scalar problems.

Save The Wheels


We have to call our function a little bit differently, but that’s far less of a big deal.

SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
WHERE u.Reputation > 100
AND   (
          SELECT
              * 
          FROM dbo.nonsargable_inline(u.LastAccessDate)
       ) = 1;

And our query can go parallel, and take way less than 10 seconds.

SQL Server Query Plan
drones

While the predicate isn’t exactly ideal, in this case I’ll deal with it, and maybe even add some better indexes if I need to. At 183 milliseconds, I don’t need to do that right now.

Save The Feels


Scalar UDFs are still the downfall of many queries, and because this one has getdate in it, it can’t be inlined automatically in SQL Server 2019.

With simple enough functions, rewrites to inline table valued functions are easy enough. The more nonsense you put into the body of a function, the harder quick fixes get. I’ve spent days rewriting some that ran into the thousands of lines, and it’s not a pretty process.

Tomorrow we’re going to look at how indexed views can help you solve SARGability issues.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SARGability Week: Max Data Type Columns And Predicates Aren’t SARGable

Stucco


When you sit down to choose data types, max really should be last on your list. There are very few legitimate uses for them.

Start by asking yourself if someone may ever throw in a string over 4000 or 8000 characters, you’re storing XML or JSON, or some other foolish endeavor.

If the answer is “no”, or “just to be safe“, stop right there. Put that max datatype down. You’re really gonna hate when they end up doing to your memory grants.

But you’ll also hate what they do to queries that try to filter on them.

Columns


No matter how much you scream, holler, and curse, when you try to filter data in a column with a max type, that predicate can’t be pushed to when you touch the index.

Leaving aside that max data types can’t be in the key of an index, because that would be insane, even if you stick them in the includes you’re looking at a lot of potential bulk, and not the good kind that makes you regular.

Let’s look at these two queries. The first one hits the Body column, and the second one hits the Title column.

SELECT
    c = COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE p.Body = N'A';

SELECT
    c = COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE p.Title = N'A';
SQL Server Query Plan
addams

Both of these queries suck because we have to scan the entire Posts table, but at least the second one would be indexable if we cared enough to add one.

Arguments


Starting off clear: This will happen regardless of if your search argument is a variable or a parameter, regardless of recompile hints.

DECLARE @A nvarchar(MAX) = N'A';
SELECT
    c = COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE p.Title = @A;
GO 

DECLARE @A nvarchar(MAX) = N'A';
SELECT
    c = COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE p.Title = @A
OPTION(RECOMPILE);
GO
SQL Server Query Plan
one time

I see this quite often in ORMs where people don’t explicitly define datatypes, and stored procedures where people are being daft.

Surgeons


This is the sort of stuff you have to deal with when you use max data types.

They really are a pain, and the larger your tables are, the harder it can be to make changes later. Add in any sort of data synchronization and it all gets much worse.

In tomorrow’s post, we’ll look at how user defined functions can make things horrible.

As usual.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SARGability Week: Using Dynamic SQL To Fix Non-SARGable Queries

Specific


The non-SARGABLE pattern that dynamic SQL helps you deal with is the the catch all query pattern, which can look like:

  • col = @parameter or @parameter is null
  • col = isnull(@parameter, col)
  • col = coalesce(@parameter, col)

Or any similar variation of null/not null checking of a parameter (or variable) in the where/join clause at runtime.

Dynamic SQL allows you to build up the specific where clause that you need for the non-NULL set of search filters.

But First


Let’s look at one of my favorite demos, because it very simply shows the goofy kind of things that can go wrong when you don’t practice basic query hygiene.

I’m going to create these two indexes:

CREATE INDEX osc ON dbo.Posts
    (OwnerUserId, Score, CreationDate);

CREATE INDEX po ON dbo.Posts
    (ParentId, OwnerUserId);

They are fundamentally and obviously different indexes.

The query has a where clause on OwnerUserId and CreationDate, and an order by on Score.

The select list is, of course, everything.

DECLARE 
    @OwnerUserId int = 22656,
    @CreationDate datetime = '20190101',
    @SQLString nvarchar(MAX) = N'
SELECT 
    p.*
FROM dbo.Posts AS p
WHERE (p.OwnerUserId   = @OwnerUserId  OR @OwnerUserId IS NULL)
AND   (p.CreationDate >= @CreationDate OR @CreationDate IS NULL)
ORDER BY p.Score DESC;
';

EXEC sys.sp_executesql 
    @SQLString,
  N'@OwnerUserId INT,
    @CreationDate DATETIME',
    @OwnerUserId,
    @CreationDate;

But when we execute it, it uses the index on ParentId and OwnerUserId.

This is completely bizarre given the requirements of the query.

SQL Server Query Plan
picturesque

Now Second


Yes yes, I know, Captain Recompile. A hint will fix this problem. But then you might have another problem. Or a whole bunch of other problems.

Here’s an example of nice, safe dynamic SQL that gets the correct index used and a much more efficient query overall.

DECLARE 
    @OwnerUserId int = 22656,
    @CreationDate datetime = '20190101',
    @SQLString nvarchar(MAX) = N'
SELECT 
    p.*
FROM dbo.Posts AS p
WHERE 1 = 1' + NCHAR(10)

IF @OwnerUserId IS NOT NULL
BEGIN
    SET @SQLString += N'AND p.OwnerUserId   = @OwnerUserId' + NCHAR(10)
END

IF @CreationDate IS NOT NULL
BEGIN
    SET @SQLString += N'AND p.CreationDate >= @CreationDate' + NCHAR(10)
END

SET @SQLString += N'ORDER BY p.Score DESC;'

PRINT @SQLString;

EXEC sys.sp_executesql 
    @SQLString,
  N'@OwnerUserId INT,
    @CreationDate DATETIME',
    @OwnerUserId,
    @CreationDate;

 

SQL Server Query Plan
new age

And On The Seventh Day


The title of this series is SARGability week, and at first I had five posts set to go on this. As I was writing, I realized there were a few other things that I wanted to cover.

Next week I’m going to talk about max data types, user defined functions, and implicit conversions, then wrap things up.

Unless I think of something else. After all, I’m writing this on the 9th. Time travel, baby!

Thanks for reading

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SARGability Week: Using Index Column Order To Fix Non-SARGable Queries In SQL Server

Query Bum


Quite a bit, I find myself working with people who are able to change indexes, but unable to change queries.

Even making small, sane changes would nix their support, should they ask for it. I do sometimes have to laugh at the situation: if support were that great, they wouldn’t need me, and if we made the change, they probably wouldn’t need support.

Oh well, though. Keep me employed, ISVs.

When we are allowed to change indexes, sometimes we can fix problems enough to avoid needing to change the code.

Big Time


Let’s start by creating two different indexes. They have the same key columns, just in different order.

CREATE INDEX v ON dbo.Votes
    (UserId, CreationDate);

CREATE INDEX vv ON dbo.Votes
    (CreationDate, UserId);

The query that we care about it this one:

SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE ISNULL(v.UserId, 0) > 0
AND   v.CreationDate >= '20180601';

I know, you’re smart, you’re savvy, you’d never write a query like this. But I see it constantly.

Daring


If we compare query performance using the two different nonclustered indexes, the one that leads with CreationDate is the clear winner.

SQL Server Query Plan
swished up

Bookends


The main advantage of a Seek here is we’re able to seek to a much smaller range of rows first, and then apply the non-SARGable predicate to UserId.

SQL Server Query Plan
no touching

Obviously, scanning 182,692,000 rows is a bit slower than seeking to 32,411,155 rows and applying the residual predicate.

Tomorrows


If you can rewrite queries like this, you absolutely should. That people still write queries like this is a sad testament to… Well, I’m not sure what.

In tomorrow’s post, we’ll look at how dynamic SQL can help ward off non-SARGable predicates.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SARGability Week: Using Temp Tables To Fix Non-SARGable Query Performance Problems In SQL Server

International


This post is especially interesting because it highlights the oddball performance situations you can run into when you write the type of All-In-One™ queries that the optimizer is typically not very good at handling, and also rely on expressions calculated at runtime as predicates.

I mean, it’s especially interesting if you’re into this sort of thing. If you’re not into this sort of thing, you’ll probably find it as interesting as I find posts about financial responsibility or home cooking.

I’ve seen query patterns like this while working with clients, and they’ve always ended poorly.

Anyway, on with the post!

Skeletons


To make sure we have a good starting point, and you can’t tell me that “scanning the clustered index is bad”, let’s create an index:

CREATE INDEX p
ON dbo.Posts 
    (OwnerUserId, Score DESC)
INCLUDE 
    (PostTypeId)
WHERE PostTypeId IN (1, 2);

Now let’s take a look at this query, and what gets weird with it.

WITH top_questions AS
(
    SELECT
        p.OwnerUserId,
        QuestionScore = 
            p.Score,
        tq = 
            ROW_NUMBER() OVER
            (
                PARTITION BY
            	    p.OwnerUserId
            	ORDER BY p.Score DESC
            )
    FROM dbo.Posts AS p
    WHERE p.PostTypeId = 1
    AND   p.Score > 1
),
     top_answers AS
(
    SELECT
        p.OwnerUserId,
        AnswerScore = 
            p.Score,
        ta = 
            ROW_NUMBER() OVER
            (
                PARTITION BY
            	    p.OwnerUserId
            	ORDER BY p.Score DESC
            )
    FROM dbo.Posts AS p
    WHERE p.PostTypeId = 2
    AND   p.Score > 1
)
SELECT TOP (1000)
    tq.OwnerUserId,
    tq.QuestionScore,
    ta.AnswerScore
FROM top_questions AS tq
JOIN top_answers AS ta
    ON  tq.OwnerUserId = ta.OwnerUserId
    AND tq.tq = ta.ta
ORDER BY
    tq.QuestionScore DESC,
    ta.AnswerScore DESC;

The non-SARGable portion is, of course, generating and joining on the row_number function. Since it’s an expression that gets calculated at runtime, we have to do quite a bit of work to execute this query.

Community Board


The query plan for this is all over the place, and also bad. Parallel merge joins were a mistake.

SQL Server Query Plan
planetary

The portions of the query plan that are particularly interesting — again, if you’re into this sort of thing — is that there are four Repartition Streams operators, and all of them spill. Like I said above, this is the sort of thing you open yourself up to when you write queries like this.

In all, the query runs for about 50 seconds. This can be avoided by hinting a hash join, of course, for reasons explained here.

But good luck figuring out why this thing runs for 50 seconds looking at a cached, or estimated execution plan, which doesn’t show you spills or operator times.

Breakup


One way to avoid the situation is to materialize the results of each CTE in a #temp table, and join those together.

WITH top_questions AS
(
    SELECT
        p.OwnerUserId,
        QuestionScore = 
            p.Score,
        tq = 
            ROW_NUMBER() OVER
            (
                PARTITION BY
            	    p.OwnerUserId
            	ORDER BY p.Score DESC
            )
    FROM dbo.Posts AS p
    WHERE p.PostTypeId = 1
    AND   p.Score > 1
)
SELECT
    *
INTO #top_questions
FROM top_questions;

WITH top_answers AS
(
    SELECT
        p.OwnerUserId,
        AnswerScore = 
            p.Score,
        ta = 
            ROW_NUMBER() OVER
            (
                PARTITION BY
            	    p.OwnerUserId
            	ORDER BY p.Score DESC
            )
    FROM dbo.Posts AS p
    WHERE p.PostTypeId = 2
    AND   p.Score > 1
)
SELECT
    *
INTO #top_answers
FROM top_answers;

SELECT TOP (1000)
    tq.OwnerUserId,
    tq.QuestionScore,
    ta.AnswerScore
FROM #top_questions AS tq
JOIN #top_answers AS ta
    ON tq.OwnerUserId = ta.OwnerUserId
    AND tq.tq = ta.ta
ORDER BY
    tq.QuestionScore DESC,
    ta.AnswerScore DESC;

Breakdown


The end result takes about 10 seconds, and has no exchange spills.

SQL Server Query Plan
fully

Infinito


For completeness, hinting the query with a hash join results in just about the same execution time as the temp table rewrite at 10 seconds. There are also very strong benefits to using Batch Mode. The query as originally written, and with no hints, finishes in about two seconds with no exchange spills, and I absolutely love that.

In tomorrow’s post, we’ll look at how we can sometimes adjust index key column order to solve SARGability issues.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.