Indexing SQL Server Queries For Performance: Fixing A Sort

Orderly


Ordered data is good for all sorts kinds of things in databases. The first thing that may come to mind is searching for data, because it’s a whole lot easier to get what you need when you know where it is.

Think of a playlist. Sometimes you want to find a song or artist by name, and that’s the easiest way to find what you want.

Without things sorted the way you’re looking for them, it’s a lot like hitting shuffle until you get to the song you want. Who knows when you’ll find it, or how many clicks it will take to get there.

The longer your playlist is, well, you get the idea. And people get all excited about Skip Scans. Sheesh.

Anyway, let’s look at poor optimizer choices, and save the poor playlist choices for another day.

A Normal Query


This is a query that I know and love.

SELECT   
    p.*
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
  ON p.Id = v.PostId
WHERE p.PostTypeId = 2
AND   p.CreationDate >= '20131225'
AND   v.VoteTypeId = 2
ORDER BY 
    p.Id;

I love it because it gets a terribly offensive query plan.

sql server query plan
ban me

Look at this monstrosity. A parallel merge join that requires a sort to enable its presence. Who would contrive such a thing?

A Sidebar


This is, of course, a matter of costing. For some reason the optimizer considered many other alternatives, and thought this one was the cheapest possible way to retrieve data.

For reference, the above query plan has an estimated cost of 2020.95 query bucks. Let’s add a couple hints to this thing.

SELECT   
    p.*
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
  ON p.Id = v.PostId
WHERE p.PostTypeId = 2
AND   p.CreationDate >= '20131225'
AND   v.VoteTypeId = 2
ORDER BY 
    p.Id
OPTION
(
    HASH JOIN, 
    USE HINT('DISALLOW_BATCH_MODE')
);

Using this query, I’m telling SQL Server to use a hash join instead of a merge join. I’m also restricting batch mode to keep things a bit more fair, since the initial query doesn’t use it.

Here’s the execution plan:

sql server query plan
hard to explain

SQL Server’s cost-based optimizer looks at this plan, and thinks it will cost 13844 query bucks to execute, or nearly 6x the cost of the merge join plan.

Of course, it finishes about 5 seconds faster.

Like I end up having to tell people quite a bit: query cost has nothing to do with query speed. You can have high cost queries that are very fast, and low cost queries that are very slow.

What’s particularly interesting is that on the second run, memory grant feedback kicks in to reduce the memory grant to ~225MB, down from the initial granted memory of nearly 10GB.

The first query retains a 2.5GB memory grant across many executions, because sorting the entire Votes table requires a bit of memory for the effort.

But This Is About Indexes, Not Hints


With that out of the way, let’s think about an index that would help the Votes table not need sorting.

You might be saying to yourself:

SELECT   
    p.*
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
  ON p.Id = v.PostId /*We have to sort by this column for the merge join, let's put it first in the index*/
WHERE p.PostTypeId = 2
AND   p.CreationDate >= '20131225'
AND   v.VoteTypeId = 2 /*We can put this second in the index so we don't need to do any lookups for it*/
ORDER BY 
    p.Id; /*It's the clustered primary key, so we can just let the nonclustered index inherit it*/

Which would result in this index:

CREATE INDEX
    v   
ON dbo.Votes
    (PostId, VoteTypeId)
WITH
    (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

And you’d be right this time, but you wouldn’t be right every time. With that index, this is the plan we get:

sql server query plan
job well done

The optimizer chooses apply nested loops, and seeks both to the PostIds and VoteTypeIds that we care about.

That Won’t Always Happen


Sometimes, you’ll need to reverse the columns, and use an index like this:

CREATE INDEX
    v2   
ON dbo.Votes
    (VoteTypeId, PostId)
WITH
    (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

This can be useful when the where clause predicate is really selective, and the join predicate is less so. We can still get a plan without a sort, and I’ll talk about why in a minute.

For now, let’s marvel at the god awful query plan SQL Server’s optimizer chooses for this index:

sql server query plan
daffy duck

I think if I ever got my hands on the SQL Server source code, I’d cost merge joins out of existence.

But anyway, note that there’s no sort operator needed here.

Before I explain, let’s look at what the query plan would look like if SQL Server’s optimizer didn’t drink the hooch and screw the pooch so badly.

sql server query plan
how nice of you to join us

It’s equally as efficient, and also requires no additional sorting.

Okay, time to go to index school.

Index 101


Let’s say we have an index that looks like this:

CREATE INDEX
    whatever_multi_pass
ON dbo.Users
(
    Reputation,
    UpVotes,
    DownVotes,
    CreationDate DESC
)
INCLUDE
(
    DisplayName
);

In row store indexes, the key columns are in stored in sorted order to make it easy to navigate the tree to efficiently locate rows, but they are not stored or sorted “individually”, like in column store indexes.

Let’s think about playlists again. Let’s say you have one sorted by artist, release year, album title, and track number. Who knows, maybe someone (like DMX) released two great albums in a single year.

You would have:

  • The artist name, which would have a bunch of duplicates for each year (if it’s DMX), duplicates for album title, and then unique track ids
  • The release year, which may have duplicates (if it’s DMX) for each album, and then unique track ids
  • The album title which would have duplicates for unique track id

But for each of those sets of duplicates, things would be stored in order.

So, going back to our index, conceptually the data would be stored looking like this, if we ran this query:

SELECT TOP (1000)
    u.Reputation,
    u.UpVotes,
    u.DownVotes,
    u.CreationDate
FROM dbo.Users AS u
WHERE u.Reputation IN (124, 125)
AND   u.UpVotes < 11
AND   u.DownVotes > 0
ORDER BY
    u.Reputation,
    u.UpVotes,
    u.DownVotes,
    u.CreationDate DESC;

I’ve cut out some rows to make the image a bit more manageable, but here you go:

sql server query results
storage!

For every row where reputation is 124, upvotes are sorted in ascending order, and then for any duplicates in upvotes, downvotes are stored in ascending order, and for any duplicate downvotes, creation dates are stored in descending order.

Then we hit 125, and each of those “reset”. Upvotes starts over again at 1, which means we have new duplicate rows to sort downvotes for, and then new duplicate rows in downvotes to sort creation dates in.

Going back to our query, the reason why we didn’t need to sort data even when PostId was the second column is because we used an equality predicate to find VoteTypeIds with a value of 2. Within that entire range, PostId were stored in ascending order.

Understanding concepts like this is really important when you’re designing indexes, because you probably have a lot of complicated queries, with a lot of complicated needs:

  • Multiple where clause predicates
  • Multiple join columns to different tables
  • Maybe with grouping and ordering
  • Maybe with a windowing function

Getting indexes right for a single query can be a monumental feat. Getting indexes right for an entire workload can seem quite impossible.

The good news, though, is that not every query can or should have perfect indexes. It’s okay for some queries to be slow; not every one is mission critical.

Making that separation is crucial to your mental health, and the indexing health of your databases.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

A Little About Nested Loops, Parallelism, and the Perils of Recursive Common Table Expressions

A Little About Nested Loops, Parallelism, and the Perils of Recursive Common Table Expressions



How can I optimize a recursive CTE inside a IVTF?​

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

When SQL Server Isn’t Smart About Aggregates Part 2

Keep It A Buck


Here are the missing indexes that SQL Server wants for our aggregation queries from yesterday:

CREATE INDEX
    p2   
ON dbo.Posts
    (OwnerUserId, Score)
WITH
    (MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);


CREATE INDEX
    c2
ON dbo.Comments 
    (UserId, Score)
WITH
    (MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

I’ve taken a small bit of artistic license with them.

The crappy thing is… They really do not help and in some cases things get substantially worse.

Original Query


The original query plan is still awful. It is using both of our new indexes.

sql server query plan
oh okay

No early aggregation whatsoever. Though yesterday’s takes 23 seconds, and today’s takes 22 seconds, I’d hardly call ourselves indexing victors for the improvement.

Rewrite #1: Manually Aggregate Posts


This one eats it the hardest, again, using both of our new indexes.

sql server query plan
we gotta talk.

If one were to appreciate any aspect of this query plan, it’s that the optimizer didn’t choose a parallel merge join plan. Parallel merge joins were a mistake, and have driven me closer to alcohol induced comas than the Red Sox in the 90s.

The total runtime for this query shoots up to about 8 seconds. The biggest change, aside from a serial execution plan being chosen, is that only the Hash Match operator at the very end runs in Batch Mode. All other operators execute in Row Mode.

Rewrite #2: Manually Aggregate Comments


We go back to a parallel plan, but again, in Row Mode. This query now takes 2x as long as yesterday’s Batch Mode plan.

sql server query plan
try, try again

Again, both new indexes are in use here. This one is the most disappointing.

Rewrite #3: Manually Aggregate Both


The fun thing about all of these plans is that, aside from the things I’ve been talking about, they all have the same problem as yesterday’s plans: Unless we tell SQL Server to aggregate things, it’s not trying to do that before the joins happen.

sql server query plan
wrongo boyo

Again, the entire plan runs in Row Mode, using both new indexes. Though most of the operators are ineligible for Batch Mode, the hash operations are, but… Just don’t use it

It’s not the end of the world for this query. It runs within a few milliseconds of yesterday’s with the old indexes. It’s just disappointing generally.

Rewrite #4: Manually Aggregate Both, Force Join Order


I’m going through the motions a touch with this one, because unlike yesterday’s plan, this one uses the forced join order naturally. It ends up in a similar situation as the above query plan though.

sql server query plan
e-mo-shuns

Again, both indexes are in use, but just not helping.

It Seems Strange


Why would SQL Server’s query optimizer decide that, with opportune indexes, Batch Mode just wouldn’t be useful?

Regardless of key column order, the same number of rows are still in play in all of my examples, with or without aggregations. In many cases, the new indexes are also scanned to acquire all of the rows, but also even the seek operators need to acquire all the rows!

There’s no where clause to help things, and only a single one of the Row Mode queries uses a Bit Map operator that can be used to filter some rows out of the joined table early.

Quite a strange brew of things to consider here. But the bottom line is, additional indexes are not always helpful for aggregation queries like this, and may result in really weird plan choices.

If you’re dealing with queries that aggregate a lot of data, and SQL Server isn’t choosing early partial or full aggregations before joining tables together, you’re probably going to have to roll up your sleeves and do it yourself.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

T-SQL Tuesday: The Last Ticket I Closed

Foreign Keys Suck


For this month’s T-SQL Tuesday, Brent Ozar (blog) wants to see your receipts.

I didn’t have a chance to write this post yesterday, because I was in the midst of closing the ticket.

Here’s a short synopsis from the client emergency:

  • Third party vendor
  • Merge statement compiled in a DLL file
  • Called frequently
  • Uses a Table-Valued Parameter
  • Merges into one small table…
  • Which has an indexed foreign key to a huge table
  • Constantly deadlocking with other calls to the same merge query

To evaluate the foreign key, SQL Server was choosing this execution plan:

sql server query plan
suck

I hate the optimizer, some days. I really do.

Whatever year it’s stuck in is an unfortunate one.

 

Eager Index Spools Suck


Why in the name of Babe the blue-balled Ox would SQL Server’s advanced, intelligent, hyper-awesome, $7k a core query optimizer choose to build an index spool here, on 7 million rows?

Here are some things we tried to get rid of it:

  • Add a clustered index to the target table
  • Add a single-column index on the already-indexed clustered index key
  • Add primary keys to the Table Types

If I had access to the merge statement, I would have torn it to shreds separate insert, update, and delete statements.

But would that have helped with SQL Server’s dumb execution plan choice in evaluating the foreign key? Would a FORCESEEK hint even be followed into this portion of the execution plan?

RCSI wouldn’t help here, because foreign key evaluation is done under Read Committed Locking isolation.

I don’t know. We can’t just recompile DLLs. All I know is that building the eager index spool is slowing this query down just enough to cause it to deadlock.

So, I took a page out of the Ugly Pragmatism handbook. I disabled the foreign key, and set up a job to look for rogue rows periodically.

Under non-Merge circumstances, I may have written a trigger to replace the foreign key. In that very moment, I had some doubts about writing a trigger quickly that would have worked correctly with:

  • All of Merge’s weirdness
  • Under concurrency

In reality, the foreign key wasn’t contributing much. The application only ever allows users to put rows in the parent table, and additional information only gets added to the child table by a system process after the original “document” is processed.

So, goodbye foreign key, goodbye eager index spool, goodbye deadlocks.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

When SQL Server Isn’t Smart About Aggregates Part 1

With Sympathy


Of all the cardinality estimation processes that SQL Server has to do, the two that I sympathize the most with are joins and aggregations.

It would be nice if the presence of a foreign key did absolutely anything at all whatsoever to improve join estimation, but that’s just another example of a partial birth feature in SQL Server.

While SQL Server 2022 does have Cardinality Estimation Feedback available, you need to:

  • Use compatibility level 160
  • Enable the database scoped configuration
  • Have Query Store enabled to persist the hint

The compatibility level thing can be a real deal-breaker for a lot of people, though I think a safer path forward is to use the legacy cardinality estimator in conjunction with higher compatibility levels, like so:

ALTER DATABASE SCOPED CONFIGURATION 
SET LEGACY_CARDINALITY_ESTIMATION = ON;

ALTER DATABASE StackOverflow2013 
SET COMPATIBILITY_LEVEL = 160;

At any rate, Cardinality Estimation feedback, at least in its initial implementation, does not work do anything for aggregations.

Team Spirit


One thing I’m always grateful for is the circle of smart folks I can share my demos, problems, ideas, and material with for sanity checking.

While I was talking about the HT waits, this demo query came up, where SQL Server just makes the dumbest possible choice, via Paul White (b|t):

SELECT
    PScore = MAX(p.Score),
    CScore = MAX(c.Score)
FROM dbo.Posts AS p
JOIN dbo.Comments AS c
  ON c.UserId = p.OwnerUserId;
GO

Here’s the query plan, which yes, you’re reading correctly, runs for ~23 seconds, fully joining both tables prior to doing the final aggregation.

sql server query plan
poor choice, that

I’m showing you a little extra here, because there are missing index requests that the optimizer asks for, but we’ll talk about those in tomorrow’s post.

The wait stats for this query, since it’s running in Batch Mode, are predictably HT-related.

sql server wait stats
ouch dude

It’s not that the optimizer isn’t capable of doing early aggregations — in many cases it will do so quite eagerly — it just… Doesn’t here.

Rewrite #1: Manually Aggregate Posts


Part of what I get paid to do is spot this stuff, and figure out how to make queries better.

If I saw this one, I’d probably start by trying something like this:

WITH
    p AS 
(
    SELECT
        UserId = p.OwnerUserId,
        Score = MAX(p.Score)
    FROM dbo.Posts AS p
    GROUP BY
        p.OwnerUserId
)
SELECT
    PScore = MAX(p.Score),
    CScore = MAX(c.Score)
FROM p
JOIN dbo.Comments AS c
  ON c.UserId = p.UserId;
GO

Which would be a pretty good improvement straight out of the gate.

sql server query plan
worth the money

From 23 seconds to 1.2 seconds right off the bat. Pretty good. Note the join placement though, with Posts on the outer, and Comments on the inner side of the join.

Rewrite #2: Manually Aggregate Comments


What if we thought this was maybe a bad situation, and we wanted to try get a different query plan? What if we really didn’t enjoy the 986% overestimate?

WITH
    c AS
(
    SELECT
        UserId = c.UserId,
        Score = MAX(c.Score)
    FROM dbo.Comments AS c
    GROUP BY
        c.UserId
)
SELECT
    PScore = MAX(p.Score),
    CScore = MAX(c.Score)
FROM dbo.Posts AS p
JOIN c
  ON c.UserId = p.OwnerUserId;
GO

We could write the query like above, and see if SQL Server does any better. Right? Right.

sql server query plan
christmas bonus

Now we’re down under a second. Comments is on the outer, and Posts is on the inner side of the join, and estimates across the board are just about spot-on.

I know what you’re thinking: We should aggregate BOTH first. When we leave it up to SQL Server’s optimizer, it’s still not getting the early aggregation message.

Rewrite #3: Manually Aggregate Both


You might be thinking “I bet if we aggregate both, it’ll take 500 milliseconds”. You’d be thinking wrong. Sorry.

WITH
    p AS 
(
    SELECT
        UserId = p.OwnerUserId,
        Score = MAX(p.Score)
    FROM dbo.Posts AS p
    GROUP BY
        p.OwnerUserId
),
    c AS
(
    SELECT
        UserId = c.UserId,
        Score = MAX(c.Score)
    FROM dbo.Comments AS c
    GROUP BY
        c.UserId
)
SELECT
    PScore = MAX(p.Score),
    CScore = MAX(c.Score)
FROM p
JOIN c
  ON c.UserId = p.UserId;

This is great. Aren’t common table expressions just great? Yay. yay. y?

sql server query plan
oh you big dummy

We made things nearly 500 milliseconds worse. I want to take this opportunity to share something annoying with you: it doesn’t matter which order we write our common table expressions in, or which order we join them in when we select data out of it, SQL Server’s optimizer chooses the exact same plan as this one. There’s no point in showing you the other query plans, because they look identical to this one: Posts is on the outer, and Comments is on the inner, side of the join.

Cardinality estimates improve somewhat but not in a meaningful way. We just know we’re gonna have to aggregate both sets before doing the join, so we get table cardinality right, but cardinality estimation for the aggregates are both pretty bad, and the join is screwed.

Rewrite #4: Manually Aggregate Both, Force Join Order


This must be the magic that finally improves things substantially. Right? Wrong.

WITH
    p AS 
(
    SELECT
        UserId = p.OwnerUserId,
        Score = MAX(p.Score)
    FROM dbo.Posts AS p
    GROUP BY
        p.OwnerUserId
),
    c AS
(
    SELECT
        UserId = c.UserId,
        Score = MAX(c.Score)
    FROM dbo.Comments AS c
    GROUP BY
        c.UserId
)
SELECT
    PScore = MAX(p.Score),
    CScore = MAX(c.Score)
FROM c
JOIN p
  ON c.UserId = p.UserId
OPTION(FORCE ORDER);
GO

Doing this will suplex the optimizer into putting Comments on the outer, and Posts on the inner side of the join.

sql server query plan
anybody else

This, unfortunately, only gets us in about as good a situation as when we only did a manual aggregate of the Comments table. Given the current set of indexes, the only thing I could find that gave meaningful improvement was to run at DOP 16 rather than DOP 8.

The current set of indexes look like this:

CREATE INDEX 
    c
ON dbo.Comments 
    (Score)
INCLUDE
    (UserId)
WITH
    (MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

CREATE INDEX
    p   
ON dbo.Posts
    (Score, OwnerUserId)
INCLUDE
    (PostTypeId)
WITH
    (MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

And remember I said at the beginning: SQL Server’s optimizer is insistent that better indexes would make things better.

In tomorrow’s post, we’ll look at how that goes.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

Join me at DataTune in Nashville, March 8-9 2024

Spring Training


This March, I’ll be presenting my full day training session The Foundations Of SQL Server Performance Tuning.

All attendees will get free access for life to my SQL Server performance tuning training. That’s about 25 hours of great content.

Get your tickets here for this event, taking place Friday, March 8th-9th 2024 at Belmont University – Massey Center 1900 Belmont Blvd, Nashville, TN 37212

Here’s what I’ll be presenting:

The Foundations Of SQL Server Performance Tuning

Session Abstract:

Whether you want to be the next great query tuning wizard, or you just need to learn how to start solving tough business problems at work, you need a solid understanding of not only what makes things fast, but also what makes them slow.

I work with consulting clients worldwide fixing complex SQL Server performance problems. I want to teach you how to do the same thing using the same troubleshooting tools and techniques I do.

I’m going to crack open my bag of tricks and show you exactly how I find which queries to tune, indexes to add, and changes to make. In this day long session, you’re going to learn about hardware, query rewrites that work, effective index design patterns, and more.

Before you get to the cutting edge, you need to have a good foundation. I’m going to teach you how to find and fix performance problems with confidence.

Event Details:

Get your tickets here for this event!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

Indexing SQL Server Queries For Performance: Fixing Aggregates With An Indexed View

Lead Up


I have a couple posts scheduled next week about aggregates, so now is a good time to continue my series on how you, my dear, sweet, developers, can use indexes in various ways to fix performance issues.

In this one, we’re going to get into a weird cross-section, because I run into a lot of developers who think that modularizing abstracting queries away in various things is somehow better for performance.

It’ll happen with functions a lot where I’ll hear that, but by far, it is more common to hear this about views.

Everything from results getting cached or materialized to metadata lookups being faster has been thrown at me in their defense.

By far the strangest was someone telling me that SQL Server creates views automatically for frequently used queries.

Hearts were blessed that day.

A Query!


So, here’s where indexed views are tough: THERE ARE SO MANY RESTRICTIONS ON THEM IT’S ABSURD.

They are borderline unusable. Seriously, read through the list of “things you can’t do” I linked to up there, and note the utter scarcity of “possible workarounds”.

How a $200 billion dollar a year company has an indexed view feature that doesn’t support MIN, MAX, AVG, subqueries (including EXISTS and NOT EXISTS), windowing functions, UNION, UNION ALL, EXCEPT, INTERSECT, or HAVING in the year 2024 is beyond me, and things like this are why many newer databases will continue to eat SQL Server’s lunch, and many existing databases (Oracle, Postgres) which have much richer features available for indexed (materialized) views point and laugh at us.

Anyway, here’s the query.

SELECT
    u.Id,
    u.DisplayName,
    QuestionUpScore =
        SUM(CONVERT(bigint, CASE WHEN p.PostTypeId = 1 AND v.VoteTypeId = 2 THEN  1 ELSE 0 END)),
    QuestionDownScore =
        SUM(CONVERT(bigint, CASE WHEN p.PostTypeId = 1 AND v.VoteTypeId = 3 THEN -1 ELSE 0 END)),
    AnswerAcceptedScore =
        SUM(CONVERT(bigint, CASE WHEN p.PostTypeId = 2 AND v.VoteTypeId = 1 THEN  1 ELSE 0 END)),
    AnswerUpScore =
        SUM(CONVERT(bigint, CASE WHEN p.PostTypeId = 2 AND v.VoteTypeId = 2 THEN  1 ELSE 0 END)),
    AnswerDownScore = 
        SUM(CONVERT(bigint, CASE WHEN p.PostTypeId = 2 AND v.VoteTypeId = 3 THEN -1 ELSE 0 END)),
    CommentScore = 
        SUM(CONVERT(bigint, c.Score))
FROM dbo.Users AS u
JOIN dbo.Posts AS p
  ON p.OwnerUserId = u.Id
JOIN dbo.Comments AS c
  ON c.UserId = u.Id
JOIN dbo.Votes AS v
  ON v.PostId = p.Id
WHERE u.Reputation > 500000
AND   p.Score > 0
AND   c.Score > 0
AND   p.PostTypeId IN (1, 2)
AND   v.VoteTypeId IN (1, 2, 3)
GROUP BY
    u.Id,
    u.DisplayName;

It just barely qualifies for indexed view-ness, but only if I add a COUNT_BIG(*) to the select list.

Now, here’s the thing, currently. We gotta look at a query plan first. Because it’s going to drive a lot of what I tell you later in the post.

A Query Plan!


Right now when I execute this query, I get a nice, happy, Batch Mode on Row Store query plan.

It’s not the fastest, which is why I still want to create an indexed view. But stick with me. It runs for about a minute:

sql server query plan
ONE MORE MINUTE

What’s particularly interesting is a Batch Mode Compute Scalar operator running for 22 seconds on its own. Fascinating.

Okay, but the important thing here: It takes one minute to run this query.

In Batch Mode.

An Indexed View Definition!


Let’s take a stab at creating an indexed view out of this thing, named after the Republica song that was playing at the time.

CREATE OR ALTER VIEW
    dbo.ReadyToGo
WITH SCHEMABINDING
AS 
SELECT
    u.Id,
    u.DisplayName,
    QuestionUpScore =
        SUM(CONVERT(bigint, CASE WHEN p.PostTypeId = 1 AND v.VoteTypeId = 2 THEN  1 ELSE 0 END)),
    QuestionDownScore =
        SUM(CONVERT(bigint, CASE WHEN p.PostTypeId = 1 AND v.VoteTypeId = 3 THEN -1 ELSE 0 END)),
    AnswerAcceptedScore =
        SUM(CONVERT(bigint, CASE WHEN p.PostTypeId = 2 AND v.VoteTypeId = 1 THEN  1 ELSE 0 END)),
    AnswerUpScore =
        SUM(CONVERT(bigint, CASE WHEN p.PostTypeId = 2 AND v.VoteTypeId = 2 THEN  1 ELSE 0 END)),
    AnswerDownScore = 
        SUM(CONVERT(bigint, CASE WHEN p.PostTypeId = 2 AND v.VoteTypeId = 3 THEN -1 ELSE 0 END)),
    CommentScore = 
        SUM(CONVERT(bigint, c.Score)),
    WellOkayThen =
        COUNT_BIG(*)
FROM dbo.Users AS u
JOIN dbo.Posts AS p
  ON p.OwnerUserId = u.Id
JOIN dbo.Comments AS c
  ON c.UserId = u.Id
JOIN dbo.Votes AS v
  ON v.PostId = p.Id
WHERE u.Reputation > 500000
AND   p.Score > 0
AND   c.Score > 0
AND   p.PostTypeId IN (1, 2)
AND   v.VoteTypeId IN (1, 2, 3)
GROUP BY
    u.Id,
    u.DisplayName;
GO

CREATE UNIQUE CLUSTERED INDEX
    RTG
ON dbo.ReadyToGo
    (Id)
WITH
    (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
GO

This will create successfully but…

An Indexed View Creation Query Plan!


Indexed views can’t be created using Batch Mode. They must be created in Row Mode.

This one takes two hours, almost.

sql server query plan
EXCUSE ME WHAT

Yes, please do spend an hour and twenty minutes in a a Nested Loops Join. That’s just what I wanted.

Didn’t have anything else to do with my day.

Of course, with that in place, the query finishes instantly, so that’s nice.

Perils!


So yeah, probably not great that creating the indexed view takes that long. Imagine what that will do to any queries that modify the base data in these tables.

Hellish. And all that to produce a 30 row indexed view. Boy to the Howdy.

This is a bit of a cautionary tale about creating indexed views that span multiple tables. It is probably not the greatest idea, because maintaining them becomes difficult as data is inserted, updated, or deleted. I’m leaving the M(erge) word out of this, because screw that thing anyway.

If we, and by we I mean me, wanted to be smarter about this, we would have taken a better look at the query and taken stock of a couple things and considered some different options.

  • Maybe the Comments aggregation should be in its own indexed view
  • Maybe the Posts aggregation should be in its own indexed view (optionally joined to Votes)
  • Maybe the Comments and Posts aggregations being done in a single indexed view would have been good enough

Of course, doing any of those things differently would change our query a bit. Right now, we’re using PostTypeId to identify questions and answers, but it’s not in the select list otherwise. We’d need to add that, and group by it, too, and we still need to join to Votes to get the VoteTypeId, so we know if something was an upvote, downvote, or answer acceptance.

We could also just live with a query taking a minute to run. If you’re going to sally forth with indexed views, consider what you’re asking them to do, and what you’re asking SQL Server to maintain when you add them across more than one table.

They can be quite powerful tools, but they’re incredibly limited, and creating them is not always fast or straightforward.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

What Happens To Queries With Recompile Hints In Query Store

Worst Behavior


For many SQL Server developers, using statement-level OPTION(RECOMPILE) hints is the path of least resistance for tuning parameter sensitive queries.

And I’m okay with that, for the most part. Figuring out what parameters a stored procedure compiled with, and was later executed with is a hard task for someone busy trying to bring new features to a product.

But let’s say one day you have performance issues regardless of all the recompiling in the world. No matter what set of parameters get passed in, your procedure is just sleepy-dog slow.

Things get so bad that you hire a young, handsome consultant with reasonable rates like myself to help you figure out why.

The plan cache will be pretty useless for troubleshooting the recompile-hinted queries, but we can still use Query Store.

Example


Here’s a simple test procedure to show what I mean, named after a random set of cursor options.

CREATE OR ALTER PROCEDURE
    dbo.LocalDynamic
(
    @OwnerUserId integer
)
AS 
BEGIN
    SET NOCOUNT, XACT_ABORT ON;
        
    SELECT
        c = COUNT_BIG(*)
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = @OwnerUserId;
    

    SELECT
        c = COUNT_BIG(*)
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = @OwnerUserId
    OPTION(RECOMPILE);
END;
GO

EXEC dbo.LocalDynamic
    @OwnerUserId = 22656;

EXEC sp_QuickieStore
    @procedure_name = 'LocalDynamic';

The end result in Query Store (as observed with the lovely and talented sp_QuickieStore) will show two execution plans.

  1. The query without the recompile hint will show a compiled parameter value of 22656
  2. The query with the recompile hint will show the literal values used by parameters as predicates

Here’s what I mean. This query has a Parameter List attribute.

sql server query plan
cool, cool

This query won’t have the Parameter List attribute, but that’s okay. We can see what got used as a literal value.

sql server query plan
don’t despair

Plumbing


This all comes down to the way statement-level recompile hints work. They tell the optimizer to compile a plan based on the literal values that get passed in, that doesn’t have to consider safety issues for other parameter values.

Consider the case of a filtered index to capture only “active” or “not deleted” rows in a table.

Using a parameter or variable to search for those won’t use your filtered index (without a recompile hint), because a plan would have to be cached that safe for searching for 1, 0, or NULL.

If you’re troubleshooting performance problems using Query Store, and you’re dealing with queries with statement-level recompile hints, you just need to look somewhere else for the parameter values.

What this can make tough, though, if you want to re-execute the stored procedure, is if you have multiple queries that use incomplete sets of required parameters. You’ll have to track down other query plans.

But quite often, if there’s one problem query in your procedure, the parameter values it requires will be enough to go on.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

A Follow Up On HT Waits, Row Mode, Batch Mode, and SQL Server Error 666

A Follow Up On HT Waits, Row Mode, Batch Mode, and SQL Server Error 666


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

Performance Tuning Batch Mode HTDELETE and HTBUILD Waits In SQL Server

Performance Tuning Batch Mode HTDELETE and HTBUILD Waits In SQL Server



Thanks for watching!

Here’s the demo query, in case you’re interested in following along.

SELECT
    u.Id,
    u.DisplayName,
    u.Reputation,
    TopQuestionScore = 
        MAX(CASE WHEN p.PostTypeId = 1 THEN p.Score ELSE 0 END),
    TopAnswerScore = 
        MAX(CASE WHEN p.PostTypeId = 2 THEN p.Score ELSE 0 END),
    TopCommentScore = 
        MAX(c.Score)
FROM dbo.Users AS u
JOIN dbo.Posts AS p
  ON u.Id = p.OwnerUserId
JOIN dbo.Comments AS c
  ON u.Id = c.UserId
WHERE EXISTS
(
    SELECT
        1/0
    FROM dbo.Votes AS v
    WHERE v.PostId = p.Id
    AND   v.VoteTypeId IN (1, 2, 3)
)
AND  u.Reputation > 10000
AND  p.Score > 0
AND  c.Score > 0
GROUP BY 
    u.Id,
    u.DisplayName,
    u.Reputation
ORDER BY
    TopAnswerScore DESC
OPTION
(
    RECOMPILE, 
    USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION'), 
    USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140')
);

 

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.