If you’re the kind of person who needs quick and easy ways to troubleshoot SQL Server performance problems, and you haven’t tried my free scripts yet, you’re probably going to keep having SQL Server performance problems.
I don’t get a lot of visitor contributions to my code (and here I thought it was just because it’s perfect), but I had a couple cool recent additions to sp_QuickieStore, my free tool for searching and analyzing Query Store data.
First, Ben Thul did a great job of simplifying the process of searching only for queries that run during configurable business hours. I had gone through a whole process of creating a lookup table with times and a bunch of other nonsense. Ben, being smart, converted that over to just using parameters with a time type, so it doesn’t matter if you use 12- or 24-hour time. Thank you, Ben.
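If you want to try it, the call looks something like this. I’m sketching the business-hours parameter names from memory here (@work_start and @work_end are my assumption), so check the procedure’s built-in help output for the real list:

```sql
/*
   Search Query Store for queries that ran during business hours.
   The @work_start and @work_end parameter names are an assumption;
   run the procedure with its @help parameter to see the actual list.
*/
EXEC dbo.sp_QuickieStore
    @database_name = N'StackOverflow2013',
    @work_start = '09:00',
    @work_end = '17:30';
```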
Second, Bill Finch dropped a really interesting pull request on me that allows for searching for query text that includes square brackets. I had no idea that didn’t work, but apparently I don’t go searching for Entity Framework created query text all that often. Very cool stuff, and a thank you to Bill as well!
Third, since I keep running into databases where Query Store is in a weird state, I added an initial check to see if it’s read only, if the desired and current state disagree with each other, or if auto-cleanup is disabled. Of course, I haven’t run into that since. Lucky me.
Fourth, Cláudio Silva added a new parameter to search Query Store for only plans that have hints (2022+, probably whatever Azure nonsense). An idea so cool, I expanded on it to also allow searching for queries with feedback and variants (also 2022+, probably whatever Azure nonsense).
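If you want to poke at those, the calls look roughly like this. I’m going from memory on the exact parameter names, so treat these as a sketch and check the procedure itself for the real ones:

```sql
/* Parameter names below are a best-guess sketch, not gospel. */

/* Only plans that have Query Store hints applied (SQL Server 2022+) */
EXEC dbo.sp_QuickieStore
    @database_name = N'StackOverflow2013',
    @only_queries_with_hints = 1;

/* Only queries with feedback, or with plan variants (also 2022+) */
EXEC dbo.sp_QuickieStore
    @database_name = N'StackOverflow2013',
    @only_queries_with_feedback = 1;

EXEC dbo.sp_QuickieStore
    @database_name = N'StackOverflow2013',
    @only_queries_with_variants = 1;
```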
Numbers are now nicely formatted with commas, so it’s easy to identify the precise scale of misery you’re experiencing.
A Friend At Microsoft told me that wait durations should already be in milliseconds in the system health extended event, and that I didn’t need to divide those numbers by 1000 to convert them from microseconds. This change is somewhat experimental, because some awfully big numbers show up. If you happen to know better, or feel like testing to verify the change, give the latest version a run.
If you’re searching for warnings only, I added a parameter (@pending_task_threshold) to reduce the number of warning lines in the CPU task details results. You’d otherwise get a warning here even if there’s only one pending task, which isn’t very useful. You usually want to find when LOTS of pending tasks were happening. The default is 10.
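In practice, that looks something like this. The parameter name comes straight from the change above; the procedure name is my assumption, based on the CPU task details mention:

```sql
/*
   Only raise a pending-task warning once at least 20 tasks are pending.
   sp_PressureDetector is my assumption about which procedure this is;
   @pending_task_threshold defaults to 10 if you leave it out.
*/
EXEC dbo.sp_PressureDetector
    @pending_task_threshold = 20;
```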
Finally, I added a contributing guide. It’s not very extensive (which prevents it from being exhausting); the main point I’m trying to get across is that forks and pull requests must be made from and to the dev branch only. Committing directly to main is verboten. Totes verbotes, as they say in Germany and surrounding German-speaking countries, I’ve been informed by Reliable Sources.
If you have questions, run into bugs, or want to add some code to any of my procedures, open up an issue. I don’t do support via email or blog comments.
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.
A lot of the time, the answer to performance issues with ranking windowing functions is simply to get Batch Mode involved. Where that’s not possible, you may have to resort to adding indexes.
Sometimes, even with Batch Mode, there is additional work to be done, but Batch Mode really does get a lot of the job done.
In this post I’m going to cover some of the complexities of indexing for ranking windowing functions when there are additional indexing considerations, like join and where clause predicates.
I also want to show you the limitations of indexing for solving performance problems for ranking windowing functions in Row Mode. This will be especially painful for developers forced to use Standard Edition, where Batch Mode is hopelessly hobbled into oblivion.
At some point, the amount of data that you’re dealing with becomes a bad fit for ranking windowing functions, and other approaches make more sense.
Of course, there are plenty of things that other varieties of windowing functions do that simple query rewrites don’t cover.
Here are some examples:
playing favorites
I realize that aggregate and analytic functions have many more options available, but there are only four ranking functions, and here at Darling Data, we strive for symmetry and equality.
It would be difficult to mimic the results of some of those, particularly the analytic functions, without performance suffering quite a bit, resorting to complicated self-joins, etc.
But, again, Batch Mode.
Hey Dude
Let’s start with a scenario I run into far too often: tables with crappy supporting indexes.
These aren’t too-too crappy, because I only have so much patience (especially when I know a blog post is going to be on the long side).
The index on Posts gets me to the data I care about fast enough, and the index on Votes allows for easy Apply Nested Loops seeking to support the Cross Apply.
There are some unnecessary includes in the index on Votes, because the demo query itself changed a bit as I was tweaking things.
But you know, if there’s one thing I’ve learned about SQL Server, there are lots of unnecessary includes in nonclustered indexes because of queries changing over the years.
CREATE INDEX
p
ON dbo.Posts
(PostTypeId)
INCLUDE
(Score)
WITH
(SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
CREATE INDEX
v
ON dbo.Votes
(PostId)
INCLUDE
(UserId, BountyAmount, VoteTypeId, CreationDate)
WITH
(SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
Now, the query I’m using is quite intentionally a bit of a stress test. I’m using two of the larger tables in the database, Posts and Votes.
But it’s a good example, because part of what I want to show you is how larger row counts can really mess with things.
I’m also using my usual trick of filtering to where the generated row number is equal to zero outside the apply.
That forces the query to do all of the window function work, without having to wait for 50 billion rows to render out in SSMS.
SELECT
p.Id,
p.Score,
v.VoteTypeId,
v.LastVoteByType
FROM dbo.Posts AS p
CROSS APPLY
(
SELECT
v.*,
LastVoteByType =
ROW_NUMBER() OVER
(
PARTITION BY
v.VoteTypeId
ORDER BY
v.CreationDate DESC
)
FROM dbo.Votes AS v
WHERE v.PostId = p.Id
AND v.VoteTypeId IN (1, 2, 3)
AND v.CreationDate >= '20080101'
) AS v
WHERE p.PostTypeId = 2
AND v.LastVoteByType = 0;
Assume that the initial goal is taming the ~4.2GB memory grant that this query acquires to Sort data for the windowing function, and creating an index that solves for that.
Dark Therapy
The query plan isn’t too bad, but like we looked at in the post in this series about fixing sorts, there is a bit of a sore spot.
get in line
Now, it has been blogged about many times, so I’m not going to belabor the point too much: the columns that need sorting are the ones in the partition by and order by of the windowing function.
But the index needs to match the sort directions of those columns exactly. For example, if I were to create this index, where the sort direction of the CreationDate column is stored ascending, but the windowing function asks for descending, it won’t work out.
CREATE INDEX
v
ON dbo.Votes
(PostId, VoteTypeId, CreationDate)
INCLUDE
(UserId, BountyAmount)
WITH
(SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE, DROP_EXISTING = ON);
In fact, it’s gonna slow things down a bit. Score another one for crappy indexes, I suppose.
30 love
The reason this one is so much slower is the Seek. I know, I know, how could a Seek be bad?! Well, it’s not one seek, it’s three seeks in one.
Time spent in each of the Row Mode operators in both of the plans you’ve seen so far is nearly identical, aside from the Seek into the Votes index. If we compare each tool tip…
one seek vs three seeks
The plan properties for the Seek are only interesting for the second query. It’s not very easy to see from the tool tips above, because Microsoft is notoriously bad at user experience in its products.
threefer
It is somewhat easier to see, quite verbosely, that for each PostId, rather than a single seek and residual predicate evaluation, three seeks are done.
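To picture what the optimizer is doing there: with the IN list folded into the index key, each outer row from Posts drives something logically like three separate seeks, rather than one seek plus a residual predicate. A rough sketch of the equivalent logic, with @PostId standing in for the correlated value:

```sql
/* Logical sketch of the three-seeks-per-outer-row behavior.
   @PostId stands in for the correlated value from dbo.Posts. */
DECLARE @PostId integer = 4;

SELECT v.* FROM dbo.Votes AS v
WHERE v.PostId = @PostId AND v.VoteTypeId = 1 AND v.CreationDate >= '20080101'
UNION ALL
SELECT v.* FROM dbo.Votes AS v
WHERE v.PostId = @PostId AND v.VoteTypeId = 2 AND v.CreationDate >= '20080101'
UNION ALL
SELECT v.* FROM dbo.Votes AS v
WHERE v.PostId = @PostId AND v.VoteTypeId = 3 AND v.CreationDate >= '20080101';
```

Each branch is a range seek on (PostId, VoteTypeId, CreationDate), which is why you see three seek predicates per execution instead of one.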
But, anyway, the problem we’re aiming to solve persists — the Sort is still there — and we spend about 4.5 seconds in it.
Your Best Won’t Do
With a similar index, the best we can do is get back to the timing of the original query, minus the sort.
The index we created above was useless for that, because we were careless in our specification. We created it with CreationDate sorted in ascending order, and our query uses it in descending order.
CREATE INDEX
v
ON dbo.Votes
(PostId, VoteTypeId, CreationDate DESC)
INCLUDE
(UserId, BountyAmount)
WITH
(SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE, DROP_EXISTING = ON);
Now, we’ve gotten rid of the sort, so our query is no longer asking for 4.2GB of RAM, but the runtime is only roughly equivalent to the original query.
i see london, i see france
A bit amusing that we were better off with a query plan where the sort spilled to disk, but what can you do? Just marvel at your luck, sometimes.
Improving Runtime
The sort of sad thing is that the cross apply method is purely Row Mode mentality. A bit like when I poke fun at folks who spend a lot of energy on index fragmentation, page splits, and fill factor as having 32-bit mentality, modern performance problems often require Batch Mode mentality.
Query tuning is often about trade-offs, and this is no exception. We can reduce runtime dramatically, but we’re going to need memory to do it. We can take this thing from a best of around 15 seconds, to 2-3 seconds, but that Sort is coming back.
Using the normal arsenal of tricks, getting Batch Mode on the inner side of a cross apply doesn’t seem to happen easily. A rewrite to get Batch Mode for a cross apply query is not exactly straightforward.
SELECT
p.Id,
p.Score,
v.VoteTypeId,
v.LastVoteByType
FROM dbo.Posts AS p
CROSS APPLY
(
SELECT
v.*
FROM
(
SELECT
v.*,
LastVoteByType =
ROW_NUMBER() OVER
(
PARTITION BY
v.VoteTypeId
ORDER BY
v.CreationDate DESC
)
FROM dbo.Votes AS v
) AS v
WHERE v.PostId = p.Id
AND v.VoteTypeId IN (1, 2, 3)
AND v.CreationDate >= '20080101'
) AS v
WHERE p.PostTypeId = 2
AND v.LastVoteByType >= '99991231'
OPTION(RECOMPILE);
Let’s change our query to use the method that I normally advise against when working in Row Mode.
SELECT
p.Id,
p.Score,
v.VoteTypeId,
v.LastVoteByType
FROM dbo.Posts AS p
JOIN
(
SELECT
v.*,
LastVoteByType =
ROW_NUMBER() OVER
(
PARTITION BY
v.VoteTypeId
ORDER BY
v.CreationDate DESC
)
FROM dbo.Votes AS v
WHERE v.VoteTypeId IN (1, 2, 3)
AND v.CreationDate >= '20080101'
) AS v
ON v.PostId = p.Id
WHERE p.PostTypeId = 2
AND v.LastVoteByType = 0;
In Row Mode, this sucks because the entire query in the derived join needs to be executed, producing a full result set of qualifying rows in the Votes table with their associated row number. Watch the video I linked above for additional details on that.
However, if we have our brains in Batch Mode, this approach can be much more useful, but not with the current index we’re using that leads with PostId.
When we used cross apply, having PostId as the leading column allowed for the join condition to be correlated inside the apply. We can’t do that with the derived join, we can only reference it in the outer part of the query.
Tweaking Indexes
An index that looks like this, which allows for easily finding the rows we care about in the derived join, makes far more sense.
CREATE INDEX
v2
ON dbo.Votes
(VoteTypeId, CreationDate DESC, PostId)
INCLUDE
(UserId, BountyAmount)
WITH
(SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
With all that done, here’s our new query plan. Something to point out here is that this is the same query plan as the more complicated rewrite that I showed you in the last section, with the same memory grant. Some of these memory grant numbers are with memory grant feedback involved, largely shifting numbers downwards, which is what you would expect to see if you were doing this in real life.
daydreamer
It could be far less of a concern for concurrency to grant out ~2GB of memory for 2 seconds, than for 15-20 seconds.
Even in a situation where you’re hitting RESOURCE_SEMAPHORE waits, it’s far less harmful to hit them for 3 seconds on average than 15-20 seconds on average. It’s also hard to imagine that you’re on a server where you truly care about high-end performance if 2GB memory grants lead you to RESOURCE_SEMAPHORE waits. If you have 128GB of RAM, and max server memory set to 116-120GB, you would be able to run ~80 of these queries concurrently before having a chance of a problem hitting RESOURCE_SEMAPHORE waits, assuming that you don’t get Resource Governor involved.
Tweaking The Query
Like I said early on, there’s only so good you can get with queries that use windowing functions where there are no alternatives.
Sticking with our Batch Mode mindset, let’s use this rewrite. It’s not that you can’t cross apply this, it’s just that it doesn’t improve things the way we want. It takes about 5 seconds to run, and uses 1.3GB of RAM for a query memory grant.
SELECT
p.Id,
p.Score,
v.VoteTypeId,
v.LastVoteByType
FROM dbo.Posts AS p
JOIN
(
SELECT
v.PostId,
v.VoteTypeId,
LastVoteByType =
MAX(v.CreationDate)
FROM dbo.Votes AS v
WHERE v.VoteTypeId IN (1, 2, 3)
AND v.CreationDate >= '20080101'
GROUP BY
v.PostId,
v.VoteTypeId
) AS v
ON v.PostId = p.Id
LEFT JOIN dbo.columnstore_helper AS ch
ON 1 = 0 /*This is important*/
WHERE p.PostTypeId = 2
AND v.LastVoteByType >= '99991231';
Note that I don’t naturally get batch mode via Batch Mode On Row Store. I’m using a table with this definition to force SQL Server’s hand a bit, here:
CREATE TABLE
dbo.columnstore_helper
(
cs_id bigint NOT NULL,
INDEX cs_id CLUSTERED COLUMNSTORE
);
But the result is pretty well worth it. It’s around 1 second faster than our best effort, with a 1.6GB memory grant.
waterfall
There may be even weirder rewrites out there in the world that would be better in some way, but I haven’t come across them yet.
Coverage
We covered a number of topics in this post, involving indexing, query rewrites, and the limitations of Row Mode performance in many situations.
The issues you’ll see in queries like this are quite common in data science, or data analysis type workloads, including those run by common reporting tools like Power BI. Everyone seems to want a row number.
I departed a bit from what I imagined the post would look like as I went along, as additional interesting details came up. I hope it was an enjoyable, if somewhat meandering, exploration for you, dear reader.
There’s one more post planned for this series so far, and I should probably provide some companion material for why the multi-seek query plan is 2x slower than the seek + residual query plan.
Anyway, I’m tired.
Thanks for reading!
Going Further
Whether you want to be the next great query tuning wizard, or you just need to learn how to start solving tough business problems at work, you need a solid understanding of not only what makes things fast, but also what makes them slow.
I work with consulting clients worldwide fixing complex SQL Server performance problems. I want to teach you how to do the same thing using the same troubleshooting tools and techniques I do.
I’m going to crack open my bag of tricks and show you exactly how I find which queries to tune, indexes to add, and changes to make. In this day long session, you’re going to learn about hardware, query rewrites that work, effective index design patterns, and more.
Before you get to the cutting edge, you need to have a good foundation. I’m going to teach you how to find and fix performance problems with confidence.
Ordered data is good for all sorts of things in databases. The first thing that may come to mind is searching for data, because it’s a whole lot easier to get what you need when you know where it is.
Think of a playlist. Sometimes you want to find a song or artist by name, and that’s the easiest way to find what you want.
Without things sorted the way you’re looking for them, it’s a lot like hitting shuffle until you get to the song you want. Who knows when you’ll find it, or how many clicks it will take to get there.
The longer your playlist is, well, you get the idea. And people get all excited about Skip Scans. Sheesh.
Anyway, let’s look at poor optimizer choices, and save the poor playlist choices for another day.
A Normal Query
This is a query that I know and love.
SELECT
p.*
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
ON p.Id = v.PostId
WHERE p.PostTypeId = 2
AND p.CreationDate >= '20131225'
AND v.VoteTypeId = 2
ORDER BY
p.Id;
I love it because it gets a terribly offensive query plan.
ban me
Look at this monstrosity. A parallel merge join that requires a sort to enable its presence. Who would contrive such a thing?
A Sidebar
This is, of course, a matter of costing. For some reason the optimizer considered many other alternatives, and thought this one was the cheapest possible way to retrieve data.
For reference, the above query plan has an estimated cost of 2020.95 query bucks. Let’s add a couple hints to this thing.
SELECT
p.*
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
ON p.Id = v.PostId
WHERE p.PostTypeId = 2
AND p.CreationDate >= '20131225'
AND v.VoteTypeId = 2
ORDER BY
p.Id
OPTION
(
HASH JOIN,
USE HINT('DISALLOW_BATCH_MODE')
);
Using this query, I’m telling SQL Server to use a hash join instead of a merge join. I’m also restricting batch mode to keep things a bit more fair, since the initial query doesn’t use it.
Here’s the execution plan:
hard to explain
SQL Server’s cost-based optimizer looks at this plan, and thinks it will cost 13844 query bucks to execute, or nearly 6x the cost of the merge join plan.
Of course, it finishes about 5 seconds faster.
Like I end up having to tell people quite a bit: query cost has nothing to do with query speed. You can have high cost queries that are very fast, and low cost queries that are very slow.
What’s particularly interesting is that on the second run, memory grant feedback kicks in to reduce the memory grant to ~225MB, down from the initial granted memory of nearly 10GB.
The first query retains a 2.5GB memory grant across many executions, because sorting the entire Votes table requires a bit of memory for the effort.
But This Is About Indexes, Not Hints
With that out of the way, let’s think about an index that would help the Votes table not need sorting.
You might be saying to yourself:
SELECT
p.*
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
ON p.Id = v.PostId /*We have to sort by this column for the merge join, let's put it first in the index*/
WHERE p.PostTypeId = 2
AND p.CreationDate >= '20131225'
AND v.VoteTypeId = 2 /*We can put this second in the index so we don't need to do any lookups for it*/
ORDER BY
p.Id; /*It's the clustered primary key, so we can just let the nonclustered index inherit it*/
Which would result in this index:
CREATE INDEX
v
ON dbo.Votes
(PostId, VoteTypeId)
WITH
(SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
And you’d be right this time, but you wouldn’t be right every time. With that index, this is the plan we get:
job well done
The optimizer chooses apply nested loops, and seeks both to the PostIds and VoteTypeIds that we care about.
That Won’t Always Happen
Sometimes, you’ll need to reverse the columns, and use an index like this:
CREATE INDEX
v2
ON dbo.Votes
(VoteTypeId, PostId)
WITH
(SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
This can be useful when the where clause predicate is really selective, and the join predicate is less so. We can still get a plan without a sort, and I’ll talk about why in a minute.
For now, let’s marvel at the god awful query plan SQL Server’s optimizer chooses for this index:
daffy duck
I think if I ever got my hands on the SQL Server source code, I’d cost merge joins out of existence.
But anyway, note that there’s no sort operator needed here.
Before I explain, let’s look at what the query plan would look like if SQL Server’s optimizer didn’t drink the hooch and screw the pooch so badly.
how nice of you to join us
It’s equally as efficient, and also requires no additional sorting.
Okay, time to go to index school.
Index 101
Let’s say we have an index that looks like this:
CREATE INDEX
whatever_multi_pass
ON dbo.Users
(
Reputation,
UpVotes,
DownVotes,
CreationDate DESC
)
INCLUDE
(
DisplayName
);
In row store indexes, the key columns are stored in sorted order to make it easy to navigate the tree and efficiently locate rows, but they are not stored or sorted “individually”, like in column store indexes.
Let’s think about playlists again. Let’s say you have one sorted by artist, release year, album title, and track number. Who knows, maybe someone (like DMX) released two great albums in a single year.
You would have:
The artist name, which would have a bunch of duplicates for each year (if it’s DMX), duplicates for each album title, and then unique track ids
The release year, which may have duplicates (if it’s DMX) for each album, and then unique track ids
The album title, which would have duplicates for each unique track id
But for each of those sets of duplicates, things would be stored in order.
So, going back to our index, conceptually the data would be stored looking like this, if we ran this query:
SELECT TOP (1000)
u.Reputation,
u.UpVotes,
u.DownVotes,
u.CreationDate
FROM dbo.Users AS u
WHERE u.Reputation IN (124, 125)
AND u.UpVotes < 11
AND u.DownVotes > 0
ORDER BY
u.Reputation,
u.UpVotes,
u.DownVotes,
u.CreationDate DESC;
I’ve cut out some rows to make the image a bit more manageable, but here you go:
storage!
For every row where reputation is 124, upvotes are sorted in ascending order, and then for any duplicates in upvotes, downvotes are stored in ascending order, and for any duplicate downvotes, creation dates are stored in descending order.
Then we hit 125, and each of those “reset”. Upvotes starts over again at 1, which means we have new duplicate rows to sort downvotes for, and then new duplicate rows in downvotes to sort creation dates in.
Going back to our query, the reason we didn’t need to sort data even when PostId was the second column is because we used an equality predicate to find VoteTypeIds with a value of 2. Within that entire range, PostIds were stored in ascending order.
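To see that in action with the v2 index from earlier, keyed on (VoteTypeId, PostId): an equality predicate on the leading key pins the read to a single range, and within that range the second key comes back pre-sorted, so a query like this shouldn’t need a Sort operator at all:

```sql
/* With an equality predicate on the leading key (VoteTypeId),
   PostId values inside that range are already stored ascending,
   so the ORDER BY is satisfied straight from the index. */
SELECT
    v.PostId
FROM dbo.Votes AS v
WHERE v.VoteTypeId = 2
ORDER BY
    v.PostId;
```

Swap the equality for a range predicate on VoteTypeId, though, and PostId is only ordered within each VoteTypeId value, so the Sort comes back.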
Understanding concepts like this is really important when you’re designing indexes, because you probably have a lot of complicated queries, with a lot of complicated needs:
Multiple where clause predicates
Multiple join columns to different tables
Maybe with grouping and ordering
Maybe with a windowing function
Getting indexes right for a single query can be a monumental feat. Getting indexes right for an entire workload can seem quite impossible.
The good news, though, is that not every query can or should have perfect indexes. It’s okay for some queries to be slow; not every one is mission critical.
Making that separation is crucial to your mental health, and the indexing health of your databases.
Thanks for reading!
In this video, I dive into some interesting aspects of SQL Server query execution plans, focusing on nested loops joins and the peculiarities of recursive common table expressions (CTEs). Despite my current predicament of needing a haircut, which I can’t get for two more weeks, I share insights that might save you from similar delays in understanding complex query behaviors. I explore how parallelism works with nested loops joins, particularly highlighting the importance of one-row guarantees and the concept of “parallel skew.” By demonstrating practical examples, I illustrate why even a small tweak like adding a `TOP` clause can significantly impact performance and plan execution.
Full Transcript
Erik Darling here with Darling Data, for as long as you’ll have me. I want to talk today a little bit, despite the fact that I’m desperately in need of a haircut, and I can’t get one for like two weeks, and I don’t want you to have to wait two weeks to learn about SQL Server stuff, I want to talk today a little bit about nested loops, specifically parallel nested loops, and the perils of recursive CTEs. And, you know, you don’t need specifically a recursive CTE to see the kind of performance stuff I’m going to talk about, but they do make a rather clever foil in that regard. And, well, I answered a question recently about them, so here’s a typical recursive CTE plan. So usually when you use a recursive common table expression, the recursive part of the query, in this case that’s most of the query, most of the query plan rather, is not eligible for parallelism. SQL Server just doesn’t go for it. It will cause a serial zone in your query plan. Also, sort of as an aside, I believe batch mode is also ineligible for the inner side of a recursive CTE plan, which kind of makes sense because there are really not a lot of operators in here that are eligible for batch mode anyway. There are nested loops joins, spools, and I suppose compute scalars, and I think concatenation is too, but like what’s the point of doing that, right? Not a lot of it.
Now, you can use cross apply sometimes, in some ways, to change that, right? So this query here, we’re cross applying, but SQL Server still chooses a single threaded execution plan. For this one down here, we do things a little bit differently, rather than looking for just a single post Id. Now, remember, the Id column in the Posts table is the clustered primary key on there.
So looking for one value means one row is going to come out of it. This is a particularly interesting value because it actually has a lot of answers and comments associated with it. So if I tried to run this for real, it would run for a long time, and it would fail without the MAXRECURSION 0 hint. So a lot of fun background there, right? Life-changing stuff. In this version of the query, though, we are looking for, and I don’t know why IntelliSense is crapping all over this thing and telling me that the Posts table doesn’t exist, so let’s get rid of that. I don’t need your IntelliSense today anyway. These objects clearly exist. It’s not like the video where I was telling you to maybe pretend to select from an object that doesn’t exist so that you don’t cache query plans, but whatever. This query changes a little bit, and we’re going to talk about why in a minute, but we’re also going to talk a little bit about parallel nested loops, because there’s a lot of stuff about them that a lot of people kind of don’t understand.
This whole query now is eligible for parallelism, and there are some rules and some reasons and some things to watch out for. So let’s hide those. I don’t want to ruin the big surprise. I’m going to scroll up here a little bit where I’ve written some things about parallel nested loops that you should know. And there we go. That’s a bit better on the Zoom. I would just have to move this so my big head isn’t covering up anything important. And let’s talk a little bit about parallel nested loops.
It’s only chosen for the inner side when the outer side has a one row guarantee. So like in the function that I was running, where even when I was searching for owner user Id 22656, which has about 27, 28,000 rows associated with it, what we’re passing into the inline table valued function that locates a post is the Id of the table, right? Again, the clustered primary key on the table. So even though 27,000 or 28,000 rows will eventually be taken and looped, what SQL Server cares about is that what we’re finding each time we do that loop has a one row guarantee. And that correlation on the clustered primary key of the Posts table is what gives it that. The costing stuff doesn’t consider the inner side of the nested loops join when SQL Server is figuring out if it’s going to give you a parallel plan or not. It only cares about making the outer part of the nested loops cheaper. And coming back over here, what I mean by that is that only this, this is all SQL Server cares about. If SQL Server thought that it wouldn’t be any more costly using a single threaded execution plan, then it would have chosen a serial plan again, like we saw in the other ones. The parallel plan for this has an estimated subtree cost of 135 query bucks and change, and SQL Server thought this was a cheaper option than scanning the Posts table using a single thread. So none of the stuff on this side of the nested loops join is considered when costing a query for parallelism because, get ready for this because this is a wild one, what’s running in here in parallel is not exactly parallel. It’s running copies of the same query across threads, and each one is sort of like a serial plan inside, running individually, which conceptually is kind of like what happens with other parallel join types, like hash joins and merge joins and stuff like that.
But, you should know, parallel merge joins were a mistake, and we’re not going to talk about those in this fine video. So with all that out of the way, I’m going to talk a little bit about the setup for the query plans that we’re about to look at and the function that they use.
And then I’m going to show you why parallel nested loops joins can be just terrifically sensitive to parallel skew. And by parallel skew, I mean how many rows end up on each thread, and a neat way that you can fix those issues if you are having those issues. So we have a couple indexes here to help our demo along. We have one on the Posts table and one on the Comments table. The columns in use here make total sense for what the function is doing. The function here, here’s the anchor part of the recursive CTE where we look up a single post Id. And I have littered this function with FORCESEEK hints because SQL Server, despite my fantastic indexes, was choosing to take those indexes and build nearly identical indexes from them using an eager index spool, which was making everything terrifically slow. So lesson one, if you have a good index on your table and SQL Server is making an eager index spool from that index, use FORCESEEK to make your life easy.
We’ve had a lot of instances lately, helping clients out, where SQL Server was choosing daft execution plans, and FORCESEEK hints were quite useful in addressing those issues. Now, down here, you have the anchor member, and then the hierarchy-building part of your common table expression, the recursive member; there’s the word that was slipping my mind. So here’s where we go find things in the Posts table. What we found up top was an answer, right? The clustered primary key finds an answer that we care about. Then in here is where we find questions, or rather, answers to the question. And down here is where we find comments on the question or answer. So we find all the stuff we can in here, and then we select it out of the recursive CTE. This is all still inside the function. And then here are a couple of queries that demonstrate really well what I meant by parallel nested loops joins being very sensitive to skew. I’m going to make sure that I hit R and not E, so I don’t try to run this whole demo file over again, and we’re going to look at a couple of portions of these query plans.
Now notice that the only difference between these two queries: here we do a straight select from cross apply, where p.CreationDate is greater than or equal to 2013-03-06. And here we have a select count from, and then a top with a big number in it, so we don’t have to worry about not getting enough rows back (we definitely have fewer than 2.1 billion rows in the table), and then we cross apply to the hierarchy function outside of that. So this top query takes just about a full minute to run. If we look at the final operator in the query plan, that’s 58.443 seconds. And there’s not a lot of weird mumbo jumbo in here where SQL Server messes up query operator times, because it does that quite a bit too. What’s really important here is that if we right-click on that little line and look at the actual number of rows for all executions, we’re going to see quite a bit of skew. Let me clear that out and highlight the whole thing. Some threads, like here and here and here, end up with a lot more rows on them than others. The way SQL Server figures out which rows go on which thread is with some modulus-ish math: this thing called the parallel page supplier starts dividing rows up depending on how many threads are involved, how many rows there are, and other stuff. You hope the rows end up split evenly, but that doesn’t always happen; it depends on how the values coming out of the table hash out. If your values are, say, all even numbers, then whatever modulus buckets the odd values would have landed in are going to be screwed, right?
Because those buckets aren’t going to find anything. You could end up with a million rows, zero rows, a million rows, zero rows, and so on across all the threads in your query, which would be a bad situation. This is nearly as bad a situation, just not a strict million/zero split: it’s pretty close, with 1.4 million, 1.4 million, and 1.6 million rows on a few threads, and a bunch of others with 10 to 12,000 rows each. So that’s a bad time, and this query ends up running for just about a full minute. We can call it a full minute; it certainly felt like a full minute of my life. And then we have this query down here, which finishes twice as fast. The final timing on this is just about 30 seconds, close enough to 30 seconds for me anyway, and if it’s close enough for me, then it’s close enough. The reason why (and I wish this thing would stop jumping around and reframing, it’s pretty annoying) is that when we have a top in a query, that top, under most circumstances, unless it’s on the inner side of a nested loops join, will force a serial zone in your query plan. It does not force the whole query to be serial, which is the way the crappy Microsoft documentation about parallelism makes it sound.
I’ve had three or four people be confused about that in recent memory; it’s just badly documented. So it forces a serial zone in the plan up here. But the overall plan is parallel, right? This is all parallel, and this is all parallel, and all this is obviously parallel. We know that because we have our fun little racing stripes here.
SQL Server needs to split those rows back out after the serial zone. All right, so if we, oh boy, SSMS is really, really fighting me today. What SQL Server does to accomplish that is distribute streams. So we have a parallel scan, we have a gather streams, and then we have a distribute streams, because this top has no racing stripes. Single-threaded top. I’m not saying it’s slow; it’s 750 milliseconds, who cares? But what this distribute streams operator forces us to do is redistribute those rows, and what it gives us the opportunity to do is distribute those rows evenly. So if we look at this, and we go to the properties and look at the actual number of rows, the same thing happened here, right? These numbers are all over the place. I mean, they’re not exactly the same, but they’re definitely all over the place. But if we look at the row distribution after that distribute streams, going into the nested loops join, these are all very, very even: we have 2, 4, 3, then 2, 4, 3, and then 2, 4, 2. So we have a much better balance of rows on our parallel threads because of that. And because I’ve talked long enough, and because I have a dinner reservation soon and I want to go eat. I’m hungry; I haven’t eaten all day for various reasons. I’ve been very busy. I know I don’t look like the type of person who skips meals, but dinner is the most important drink of the day.
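The shape being described, as narrated, looks roughly like the pair of queries below. This is a sketch: dbo.PostHierarchy stands in for the hierarchy function from the video, whose real name and definition weren’t shown in full.

```sql
/* Skewed version: a parallel scan feeds the nested loops join
   directly, so per-thread row counts depend on how the data
   hashes out. */
SELECT
    c = COUNT_BIG(*)
FROM dbo.Posts AS p
CROSS APPLY dbo.PostHierarchy(p.Id) AS h /* hypothetical iTVF */
WHERE p.CreationDate >= '20130306';

/* Rebalanced version: the big TOP forces a serial zone, and the
   Distribute Streams operator that follows it deals rows back out
   to the parallel threads evenly before the nested loops join. */
SELECT
    c = COUNT_BIG(*)
FROM
(
    SELECT TOP (2147483647)
        p.Id
    FROM dbo.Posts AS p
    WHERE p.CreationDate >= '20130306'
) AS p
CROSS APPLY dbo.PostHierarchy(p.Id) AS h; /* hypothetical iTVF */
```

The TOP value just needs to be larger than any row count you could get back; 2,147,483,647 is the int maximum, which is why it shows up in the demo.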
I thought breakfast was the most important meal of the day. I don’t know, one of those things. Maybe I’ll write an article about what the most important drink of the day is for BeerGut magazine, and see what happens there. So anyway: recursive CTEs are kind of a pain in the butt. They can be very slow, because oftentimes queries aren’t written in a way where they can engage a parallel plan. And even when they do engage a parallel plan, you have to be very, very careful about how that parallel plan shapes up, because you could end up in a situation with incredibly skewed rows across parallel threads, making your query very slow. In this case, it’s a 30-second versus full-minute query, and I will take 30 seconds over 60 seconds when query tuning any day. That being said, thank you for watching. I hope you learned something, and I hope you enjoyed yourselves. If you like free SQL Server performance tuning content like this from young, handsome men who need haircuts, feel free to subscribe to my channel. If you like this video, give it the old thumbs up, because that’s pretty much the only thing that brings me joy these days, especially before I’ve had the most important drink of the day. So I’m gonna get going now.
Thank you for watching, and I hope you enjoy your most important drink of the day too, if you’re the type of person who partakes in important drinks. So, thank you.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.
CREATE INDEX
p2
ON dbo.Posts
(OwnerUserId, Score)
WITH
(MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
CREATE INDEX
c2
ON dbo.Comments
(UserId, Score)
WITH
(MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
I’ve taken a small bit of artistic license with them.
The crappy thing is… they really do not help, and in some cases, things get substantially worse.
Original Query
The original query plan is still awful. It is using both of our new indexes.
oh okay
No early aggregation whatsoever. And though yesterday’s plan takes 23 seconds and today’s takes 22, I’d hardly call us indexing victors for the improvement.
Rewrite #1: Manually Aggregate Posts
This one eats it the hardest, again, using both of our new indexes.
we gotta talk.
If one were to appreciate any aspect of this query plan, it’s that the optimizer didn’t choose a parallel merge join plan. Parallel merge joins were a mistake, and have driven me closer to alcohol induced comas than the Red Sox in the 90s.
The total runtime for this query shoots up to about 8 seconds. The biggest change, aside from a serial execution plan being chosen, is that only the Hash Match operator at the very end runs in Batch Mode. All other operators execute in Row Mode.
Rewrite #2: Manually Aggregate Comments
We go back to a parallel plan, but again, in Row Mode. This query now takes 2x as long as yesterday’s Batch Mode plan.
try, try again
Again, both new indexes are in use here. This one is the most disappointing.
Rewrite #3: Manually Aggregate Both
The fun thing about all of these plans is that, aside from the things I’ve been talking about, they all have the same problem as yesterday’s plans: Unless we tell SQL Server to aggregate things, it’s not trying to do that before the joins happen.
wrongo boyo
Again, the entire plan runs in Row Mode, using both new indexes. Though most of the operators are ineligible for Batch Mode, the hash operations are eligible; SQL Server just doesn’t use it.
It’s not the end of the world for this query. It runs within a few milliseconds of yesterday’s with the old indexes. It’s just disappointing generally.
Rewrite #4: Manually Aggregate Both, Force Join Order
I’m going through the motions a touch with this one, because unlike yesterday’s plan, this one uses the forced join order naturally. It ends up in a similar situation as the above query plan though.
e-mo-shuns
Again, both indexes are in use, but just not helping.
It Seems Strange
Why would SQL Server’s query optimizer decide that, with opportune indexes, Batch Mode just wouldn’t be useful?
Regardless of key column order, the same number of rows are still in play in all of my examples, with or without aggregations. In many cases, the new indexes are also scanned to acquire all of the rows, but also even the seek operators need to acquire all the rows!
There’s no where clause to help things, and only a single one of the Row Mode queries uses a Bitmap operator, which can be used to filter some rows out of the joined table early.
Quite a strange brew of things to consider here. But the bottom line is, additional indexes are not always helpful for aggregation queries like this, and may result in really weird plan choices.
If you’re dealing with queries that aggregate a lot of data, and SQL Server isn’t choosing early partial or full aggregations before joining tables together, you’re probably going to have to roll up your sleeves and do it yourself.
Thanks for reading!
I didn’t have a chance to write this post yesterday, because I was in the midst of closing the ticket.
Here’s a short synopsis from the client emergency:
Third party vendor
Merge statement compiled in a DLL file
Called frequently
Uses a Table-Valued Parameter
Merges into one small table…
Which has an indexed foreign key to a huge table
Constantly deadlocking with other calls to the same merge query
To evaluate the foreign key, SQL Server was choosing this execution plan:
suck
I hate the optimizer, some days. I really do.
Whatever year it’s stuck in is an unfortunate one.
Eager Index Spools Suck
Why in the name of Babe the blue-balled Ox would SQL Server’s advanced, intelligent, hyper-awesome, $7k a core query optimizer choose to build an index spool here, on 7 million rows?
Here are some things we tried to get rid of it:
Add a clustered index to the target table
Add a single-column index on the already-indexed clustered index key
Add primary keys to the Table Types
If I had access to the merge statement, I would have torn it to shreds and rewritten it as separate insert, update, and delete statements.
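For illustration, that kind of teardown usually looks something like this. Everything here is a hypothetical stand-in, since the vendor’s actual merge statement and schema weren’t available:

```sql
/* Hypothetical stand-ins: dbo.Target plays the small table the vendor
   merges into; @tvp plays the table-valued parameter. */
DECLARE
    @tvp table (Id integer PRIMARY KEY, SomeColumn integer);

BEGIN TRANSACTION;

    /* Update rows that already exist in the target. */
    UPDATE t
        SET t.SomeColumn = s.SomeColumn
    FROM dbo.Target AS t
    JOIN @tvp AS s
      ON s.Id = t.Id;

    /* Insert rows that don't exist yet. */
    INSERT dbo.Target
        (Id, SomeColumn)
    SELECT
        s.Id,
        s.SomeColumn
    FROM @tvp AS s
    WHERE NOT EXISTS
    (
        SELECT 1/0
        FROM dbo.Target AS t
        WHERE t.Id = s.Id
    );

    /* Only needed if the original merge had a
       WHEN NOT MATCHED BY SOURCE THEN DELETE clause. */
    DELETE t
    FROM dbo.Target AS t
    WHERE NOT EXISTS
    (
        SELECT 1/0
        FROM @tvp AS s
        WHERE s.Id = t.Id
    );

COMMIT TRANSACTION;
```

Wrapping the three statements in one transaction keeps the all-or-nothing behavior the merge had, without its weirdness.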
But would that have helped with SQL Server’s dumb execution plan choice in evaluating the foreign key? Would a FORCESEEK hint even be followed into this portion of the execution plan?
RCSI wouldn’t help here, because foreign key evaluation is done under Read Committed Locking isolation.
I don’t know. We can’t just recompile DLLs. All I know is that building the eager index spool is slowing this query down just enough to cause it to deadlock.
So, I took a page out of the Ugly Pragmatism handbook. I disabled the foreign key, and set up a job to look for rogue rows periodically.
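The moving parts of that pragmatic fix look something like this. Names here are hypothetical throughout, since the vendor’s schema wasn’t shown:

```sql
/* Disable the foreign key. NOCHECK leaves the constraint in place,
   but SQL Server stops evaluating it on modifications. */
ALTER TABLE dbo.Child
    NOCHECK CONSTRAINT FK_Child_Parent;

/* Agent job step: periodically look for child rows
   that no longer have a matching parent. */
SELECT
    c.Id
FROM dbo.Child AS c
WHERE NOT EXISTS
(
    SELECT 1/0
    FROM dbo.Parent AS p
    WHERE p.Id = c.ParentId
);
```

If you ever re-enable the constraint, doing it WITH CHECK makes SQL Server validate existing rows and mark the constraint trusted again.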
Under non-Merge circumstances, I may have written a trigger to replace the foreign key. In that very moment, I had some doubts about writing a trigger quickly that would have worked correctly with:
All of Merge’s weirdness
Under concurrency
In reality, the foreign key wasn’t contributing much. The application only ever allows users to put rows in the parent table, and additional information only gets added to the child table by a system process after the original “document” is processed.
So, goodbye foreign key, goodbye eager index spool, goodbye deadlocks.
Thanks for reading!
Of all the cardinality estimation processes that SQL Server has to do, the two that I sympathize the most with are joins and aggregations.
It would be nice if the presence of a foreign key did absolutely anything at all whatsoever to improve join estimation, but that’s just another example of a partial birth feature in SQL Server.
The compatibility level thing can be a real deal-breaker for a lot of people, though I think a safer path forward is to use the legacy cardinality estimator in conjunction with higher compatibility levels, like so:
ALTER DATABASE SCOPED CONFIGURATION
SET LEGACY_CARDINALITY_ESTIMATION = ON;
ALTER DATABASE StackOverflow2013
SET COMPATIBILITY_LEVEL = 160;
At any rate, Cardinality Estimation feedback, at least in its initial implementation, does not do anything for aggregations.
Team Spirit
One thing I’m always grateful for is the circle of smart folks I can share my demos, problems, ideas, and material with for sanity checking.
While I was talking about the HT waits, this demo query came up, where SQL Server just makes the dumbest possible choice, via Paul White (b|t):
SELECT
PScore = MAX(p.Score),
CScore = MAX(c.Score)
FROM dbo.Posts AS p
JOIN dbo.Comments AS c
ON c.UserId = p.OwnerUserId;
GO
Here’s the query plan, which yes, you’re reading correctly, runs for ~23 seconds, fully joining both tables prior to doing the final aggregation.
poor choice, that
I’m showing you a little extra here, because there are missing index requests that the optimizer asks for, but we’ll talk about those in tomorrow’s post.
The wait stats for this query, since it’s running in Batch Mode, are predictably HT-related.
ouch dude
It’s not that the optimizer isn’t capable of doing early aggregations — in many cases it will do so quite eagerly — it just… Doesn’t here.
Rewrite #1: Manually Aggregate Posts
Part of what I get paid to do is spot this stuff, and figure out how to make queries better.
If I saw this one, I’d probably start by trying something like this:
WITH
p AS
(
SELECT
UserId = p.OwnerUserId,
Score = MAX(p.Score)
FROM dbo.Posts AS p
GROUP BY
p.OwnerUserId
)
SELECT
PScore = MAX(p.Score),
CScore = MAX(c.Score)
FROM p
JOIN dbo.Comments AS c
ON c.UserId = p.UserId;
GO
Which would be a pretty good improvement straight out of the gate.
worth the money
From 23 seconds to 1.2 seconds right off the bat. Pretty good. Note the join placement though, with Posts on the outer, and Comments on the inner side of the join.
Rewrite #2: Manually Aggregate Comments
What if we thought this was maybe a bad situation, and we wanted to try to get a different query plan? What if we really didn’t enjoy the 986% overestimate?
WITH
c AS
(
SELECT
UserId = c.UserId,
Score = MAX(c.Score)
FROM dbo.Comments AS c
GROUP BY
c.UserId
)
SELECT
PScore = MAX(p.Score),
CScore = MAX(c.Score)
FROM dbo.Posts AS p
JOIN c
ON c.UserId = p.OwnerUserId;
GO
We could write the query like above, and see if SQL Server does any better. Right? Right.
christmas bonus
Now we’re down under a second. Comments is on the outer, and Posts is on the inner side of the join, and estimates across the board are just about spot-on.
I know what you’re thinking: We should aggregate BOTH first. When we leave it up to SQL Server’s optimizer, it’s still not getting the early aggregation message.
Rewrite #3: Manually Aggregate Both
You might be thinking “I bet if we aggregate both, it’ll take 500 milliseconds”. You’d be thinking wrong. Sorry.
WITH
p AS
(
SELECT
UserId = p.OwnerUserId,
Score = MAX(p.Score)
FROM dbo.Posts AS p
GROUP BY
p.OwnerUserId
),
c AS
(
SELECT
UserId = c.UserId,
Score = MAX(c.Score)
FROM dbo.Comments AS c
GROUP BY
c.UserId
)
SELECT
PScore = MAX(p.Score),
CScore = MAX(c.Score)
FROM p
JOIN c
ON c.UserId = p.UserId;
This is great. Aren’t common table expressions just great? Yay. yay. y?
oh you big dummy
We made things nearly 500 milliseconds worse. I want to take this opportunity to share something annoying with you: it doesn’t matter which order we write our common table expressions in, or which order we join them in when we select data out of them; SQL Server’s optimizer chooses the exact same plan as this one. There’s no point in showing you the other query plans, because they look identical to this one: Posts is on the outer, and Comments is on the inner, side of the join.
Cardinality estimates improve somewhat, but not in a meaningful way. Because we aggregate both sets before doing the join, table cardinality comes out right, but cardinality estimation for both aggregates is pretty bad, and the join is screwed.
Rewrite #4: Manually Aggregate Both, Force Join Order
This must be the magic that finally improves things substantially. Right? Wrong.
WITH
p AS
(
SELECT
UserId = p.OwnerUserId,
Score = MAX(p.Score)
FROM dbo.Posts AS p
GROUP BY
p.OwnerUserId
),
c AS
(
SELECT
UserId = c.UserId,
Score = MAX(c.Score)
FROM dbo.Comments AS c
GROUP BY
c.UserId
)
SELECT
PScore = MAX(p.Score),
CScore = MAX(c.Score)
FROM c
JOIN p
ON c.UserId = p.UserId
OPTION(FORCE ORDER);
GO
Doing this will suplex the optimizer into putting Comments on the outer, and Posts on the inner side of the join.
anybody else
This, unfortunately, only gets us in about as good a situation as when we only did a manual aggregate of the Comments table. Given the current set of indexes, the only thing I could find that gave meaningful improvement was to run at DOP 16 rather than DOP 8.
The current set of indexes look like this:
CREATE INDEX
c
ON dbo.Comments
(Score)
INCLUDE
(UserId)
WITH
(MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
CREATE INDEX
p
ON dbo.Posts
(Score, OwnerUserId)
INCLUDE
(PostTypeId)
WITH
(MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
And remember I said at the beginning: SQL Server’s optimizer is insistent that better indexes would make things better.
In tomorrow’s post, we’ll look at how that goes.
Thanks for reading!
I have a couple posts scheduled next week about aggregates, so now is a good time to continue my series on how you, my dear, sweet, developers, can use indexes in various ways to fix performance issues.
In this one, we’re going to get into a weird cross-section, because I run into a lot of developers who think that modularizing (abstracting queries away into various objects) is somehow better for performance.
It’ll happen with functions a lot where I’ll hear that, but by far, it is more common to hear this about views.
Everything from results getting cached or materialized to metadata lookups being faster has been thrown at me in their defense.
By far the strangest was someone telling me that SQL Server creates views automatically for frequently used queries.
Indexed views, though, are borderline unusable. Seriously, read through Microsoft’s list of “things you can’t do” with them, and note the utter scarcity of “possible workarounds.”
How a $200-billion-a-year company has an indexed view feature that doesn’t support MIN, MAX, AVG, subqueries (including EXISTS and NOT EXISTS), windowing functions, UNION, UNION ALL, EXCEPT, INTERSECT, or HAVING in the year 2024 is beyond me. Things like this are why many newer databases will continue to eat SQL Server’s lunch, and why existing databases with much richer features for indexed (materialized) views, like Oracle and Postgres, point and laugh at us.
Anyway, here’s the query.
SELECT
u.Id,
u.DisplayName,
QuestionUpScore =
SUM(CONVERT(bigint, CASE WHEN p.PostTypeId = 1 AND v.VoteTypeId = 2 THEN 1 ELSE 0 END)),
QuestionDownScore =
SUM(CONVERT(bigint, CASE WHEN p.PostTypeId = 1 AND v.VoteTypeId = 3 THEN -1 ELSE 0 END)),
AnswerAcceptedScore =
SUM(CONVERT(bigint, CASE WHEN p.PostTypeId = 2 AND v.VoteTypeId = 1 THEN 1 ELSE 0 END)),
AnswerUpScore =
SUM(CONVERT(bigint, CASE WHEN p.PostTypeId = 2 AND v.VoteTypeId = 2 THEN 1 ELSE 0 END)),
AnswerDownScore =
SUM(CONVERT(bigint, CASE WHEN p.PostTypeId = 2 AND v.VoteTypeId = 3 THEN -1 ELSE 0 END)),
CommentScore =
SUM(CONVERT(bigint, c.Score))
FROM dbo.Users AS u
JOIN dbo.Posts AS p
ON p.OwnerUserId = u.Id
JOIN dbo.Comments AS c
ON c.UserId = u.Id
JOIN dbo.Votes AS v
ON v.PostId = p.Id
WHERE u.Reputation > 500000
AND p.Score > 0
AND c.Score > 0
AND p.PostTypeId IN (1, 2)
AND v.VoteTypeId IN (1, 2, 3)
GROUP BY
u.Id,
u.DisplayName;
It just barely qualifies for indexed view-ness, but only if I add a COUNT_BIG(*) to the select list.
Now, here’s the thing: we gotta look at a query plan first, because it’s going to drive a lot of what I tell you later in the post.
A Query Plan!
Right now when I execute this query, I get a nice, happy, Batch Mode on Row Store query plan.
It’s not the fastest, which is why I still want to create an indexed view. But stick with me. It runs for about a minute:
ONE MORE MINUTE
What’s particularly interesting is a Batch Mode Compute Scalar operator running for 22 seconds on its own. Fascinating.
Okay, but the important thing here: It takes one minute to run this query.
In Batch Mode.
An Indexed View Definition!
Let’s take a stab at creating an indexed view out of this thing, named after the Republica song that was playing at the time.
CREATE OR ALTER VIEW
dbo.ReadyToGo
WITH SCHEMABINDING
AS
SELECT
u.Id,
u.DisplayName,
QuestionUpScore =
SUM(CONVERT(bigint, CASE WHEN p.PostTypeId = 1 AND v.VoteTypeId = 2 THEN 1 ELSE 0 END)),
QuestionDownScore =
SUM(CONVERT(bigint, CASE WHEN p.PostTypeId = 1 AND v.VoteTypeId = 3 THEN -1 ELSE 0 END)),
AnswerAcceptedScore =
SUM(CONVERT(bigint, CASE WHEN p.PostTypeId = 2 AND v.VoteTypeId = 1 THEN 1 ELSE 0 END)),
AnswerUpScore =
SUM(CONVERT(bigint, CASE WHEN p.PostTypeId = 2 AND v.VoteTypeId = 2 THEN 1 ELSE 0 END)),
AnswerDownScore =
SUM(CONVERT(bigint, CASE WHEN p.PostTypeId = 2 AND v.VoteTypeId = 3 THEN -1 ELSE 0 END)),
CommentScore =
SUM(CONVERT(bigint, c.Score)),
WellOkayThen =
COUNT_BIG(*)
FROM dbo.Users AS u
JOIN dbo.Posts AS p
ON p.OwnerUserId = u.Id
JOIN dbo.Comments AS c
ON c.UserId = u.Id
JOIN dbo.Votes AS v
ON v.PostId = p.Id
WHERE u.Reputation > 500000
AND p.Score > 0
AND c.Score > 0
AND p.PostTypeId IN (1, 2)
AND v.VoteTypeId IN (1, 2, 3)
GROUP BY
u.Id,
u.DisplayName;
GO
CREATE UNIQUE CLUSTERED INDEX
RTG
ON dbo.ReadyToGo
(Id)
WITH
(SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
GO
This will create successfully but…
An Indexed View Creation Query Plan!
Indexed views can’t be created using Batch Mode. They must be created in Row Mode.
This one takes two hours, almost.
EXCUSE ME WHAT
Yes, please do spend an hour and twenty minutes in a Nested Loops Join. That’s just what I wanted.
Didn’t have anything else to do with my day.
Of course, with that in place, the query finishes instantly, so that’s nice.
Perils!
So yeah, probably not great that creating the indexed view takes that long. Imagine what that will do to any queries that modify the base data in these tables.
Hellish. And all that to produce a 30 row indexed view. Boy to the Howdy.
This is a bit of a cautionary tale about creating indexed views that span multiple tables. It is probably not the greatest idea, because maintaining them becomes difficult as data is inserted, updated, or deleted. I’m leaving the M(erge) word out of this, because screw that thing anyway.
If we, and by we I mean me, wanted to be smarter about this, we would have taken a better look at the query and taken stock of a couple things and considered some different options.
Maybe the Comments aggregation should be in its own indexed view
Maybe the Posts aggregation should be in its own indexed view (optionally joined to Votes)
Maybe the Comments and Posts aggregations being done in a single indexed view would have been good enough
Of course, doing any of those things differently would change our query a bit. Right now, we’re using PostTypeId to identify questions and answers, but it’s not in the select list otherwise. We’d need to add that, and group by it, too, and we still need to join to Votes to get the VoteTypeId, so we know if something was an upvote, downvote, or answer acceptance.
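As a sketch of the first option, the Comments aggregation alone in its own indexed view, something like this keeps the maintenance cost confined to a single table. The view and index names here are made up:

```sql
CREATE OR ALTER VIEW
    dbo.CommentScores
WITH SCHEMABINDING
AS
SELECT
    c.UserId,
    CommentScore =
        SUM(CONVERT(bigint, c.Score)),
    WellOkayThen = /* COUNT_BIG(*) is mandatory for an indexed view with GROUP BY */
        COUNT_BIG(*)
FROM dbo.Comments AS c
WHERE c.Score > 0
GROUP BY
    c.UserId;
GO

CREATE UNIQUE CLUSTERED INDEX
    CS
ON dbo.CommentScores
    (UserId)
WITH
    (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
GO
```

Since only Comments feeds this view, only modifications to Comments pay the maintenance tax, instead of every insert, update, and delete against Users, Posts, Votes, and Comments all at once.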
We could also just live with a query taking a minute to run. If you’re going to sally forth with indexed views, consider what you’re asking them to do, and what you’re asking SQL Server to maintain when you add them across more than one table.
They can be quite powerful tools, but they’re incredibly limited, and creating them is not always fast or straightforward.
Thanks for reading!