Starting SQL: Quick Notes About Heaps In SQL Server (Tables Without Clustered Indexes)

Nein Nein Nein


I know, I said we’d talk about indexes, and heaps are basically the opposite of indexes.

Even if you have a heap with a nonclustered primary key, or nonclustered indexes hanging around, the base table is still a heap.

Why? It’s all about that clustered index.

Do you always need a clustered index? Sometimes not. But if you’re running an OLTP workload, you probably do.

It’s not that heaps can’t be useful, it’s just that their best use-case is still loading the big datas.

Downsides Of Heaps


Heaps have a couple issues that are going to sound a lot like index fragmentation. In fact, they’re a lot like index fragmentation.

  • Forwarded fetches from updates
  • Empty pages from deletes

Forwarded Fetches

Forwarded fetches are a lot like logical fragmentation, which is when pages are out of order. Instead of having clustered index keys to follow, we have pointers from where the row used to be to where it got moved to.

This happens because updates change a value that makes a row not fit on a page anymore.

Heaps compound the issue a little bit though, because you can’t seek into a heap to avoid forwarded fetches (though you can seek into any index created on top of a heap). You can seek into a clustered index to avoid logical fragmentation.

Empty Space

Empty space from deletes goes a little bit further, too. You can end up with entire pages being empty, if queries that delete data don’t successfully escalate locks to the table level. Worse, those empty pages get read when the heap is scanned.

Sure, clustered indexes can end up with a bunch of empty space on pages, but when pages are totally emptied out they will get deallocated. Heaps don’t do that without a table level lock during the delete.

The thing is that both of these only happen to the heap itself. If you’ve got nonclustered indexes on the heap and queries mostly use those, you may not notice either of these problems. Perhaps a little bit if you do bookmark lookups.

When They Might Matter


This is largely a size thing. The bigger your heaps are, the more pronounced these problems can be.

It also depends a bit on what heap tables get used for. Or more importantly, who uses them.

If they’re staging tables that end users don’t touch, fixing them probably won’t solve a big problem for you outside of ETL.

But if they’re transactional tables where end users are complaining about performance, you need to do two things: check the query plans for table scans, and then check the heaps for forwarded fetches and deletes.

To fix them, you can rebuild the heap, or create a clustered index. In either case, be very careful if you’ve got nonclustered indexes on the table, because they’ll need to all get rebuilt to either fix the RIDs in a heap, or add the clustered index keys.

Your next question is: Erik, how do I pick a clustered index? We’ll talk about that tomorrow!

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Starting SQL: Why SQL Server’s Optimizer Doesn’t Always Choose The Same Index

What’s With Widths?


We don’t need to add a ton of columns to our query to have index usage change, but we do need to go back in time a little bit.

Here’s our query now, with just a one day difference in the where clause.

SELECT p.CreationDate,
       COUNT_BIG(*) AS records
FROM dbo.Posts AS p
WHERE p.CreationDate >= '20131015'
AND   p.OwnerUserId = 22656
GROUP BY p.CreationDate;


SELECT p.CreationDate,
       COUNT_BIG(*) AS records
FROM dbo.Posts AS p
WHERE p.CreationDate >= '20131016'
AND   p.OwnerUserId = 22656
GROUP BY p.CreationDate;

Not quite Halloween, but I’ll take the spooky factor.

The other difference is that now we’ve got the OwnerUserId column in there, which isn’t in our nonclustered index.

It’s in the where clause, not the select list, but if we added it there it would have a similar effect on the query. Either way, we have to do something with this new column, and we have to get it from somewhere.

CREATE INDEX CreationDate ON dbo.Posts(CreationDate);

Things Are Looking Up


The query plans for these will look a little bit different.

SQL Server Query Plan
what’s it for

If you’re not familiar with what just happened here, a Key Lookup is a trick the optimizer has up its sleeve. It uses two indexes from the same table to satisfy one query.

We find rows in our nonclustered index that qualify for the filter on CreationDate. That’s a smart start!

Then we join the nonclustered index to the clustered index to find any columns we might need. Remember the clustered index has all the table columns in it.

Stuff like this is made possible by nonclustered indexes inheriting clustered index key columns. Crazy, right?

The Point At This Point


There are many internal details to explore around Key Lookups. There are even some interesting things about how clustered index keys get stored in nonclustered indexes.

What you need to know about Lookups right now is what they are (which we talked about), and that they represent a choice the optimizer has when it comes to index usage.

If you create a narrow index, say one that satisfies some part of the query like the join or where clause, but doesn’t fully contain all of the columns referenced in your query, it may not get used reliably. The usage is decided based on cardinality estimates. The more rows SQL Server expects, the less likely it is that your narrow index will get used.

For example, it may only get used when the value for CreationDate is estimated to return a small-ish number of rows. Parameterization and plan re-use can make this even more confusing.

Next, we’ll look at how we can encourage the optimizer to choose narrow indexes, and the problems we might run into.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Starting SQL: Messing Up A SQL Server Query’s Performance

Your Fault


Yesterday we created an index, and things went pretty well. Good for us. By next week we’ll have six pack abs.

Today we’re going to look at things that I see happen pretty often in queries that mess with effciency.

If one were to sit and think carefully about the way B-Tree indexes are implemented, where key columns define order, one may see these problems coming a mile away.

Then again, one might expect a full formed datababby system to be able to figure some of these things out and use indexes appropriately anyway.

General Anti-Patterns


I’ve posted this on here a number of times, but here it goes again.

Things that keep SQL Server from being able to seek:

  • Function(Column) = …
  • Column + Column = …
  • Column + Value = …
  • Value + Column = …
  • Column = @Value or @Value IS NULL
  • Column LIKE ‘%…’
  • Some implicit conversions (data type mismatches)

Seeks aren’t always necessary, or even desirable, and likewise scans aren’t always bad or undesirable. But if we’re going to give our queries the best chance of running well, our job is most often to give the optimizer every opportunity to make the right decisions. Being the optimizer is hard enough without us grabbing it by the nose and poking it in the eyes.

To fix code, or make code that looks like that tolerable to The Cool DBA Kids™, there are some options like:

  • Computed columns
  • Dynamic SQL
  • Rewrites

Different options work well in different scenarios, of course. And since we’re here, I might as well foreshadow a future post: These patterns are most harmful when applied to the leading key column of an index. When they’re residual predicates that follow seek predicates, they generally make less of a difference. But we’re not quite there yet.

The general idea, though, is that as soon as we write queries in a way that obscure column data, or introduce uncertainty about what we’re searching for, the optimizer has a more difficult time of things.

Do It Again


Let’s compare a couple different ways of writing yesterday’s query. One good, one bad (in that order).

SELECT p.CreationDate,
       COUNT_BIG(*) AS records
FROM dbo.Posts AS p
WHERE p.CreationDate >= '20131225'
GROUP BY p.CreationDate;

SELECT p.CreationDate,
       COUNT_BIG(*) AS records
FROM dbo.Posts AS p
WHERE YEAR(p.CreationDate) = 2013
AND   MONTH(p.CreationDate) = 12
AND   DAY(p.CreationDate) >= 25
GROUP BY p.CreationDate;

The mistake people often make here is that they think these presentation layer functions have some relational meaning. They don’t.

They’re presentation layer functions. Let’s see those execution plans. Maybe then you’ll believe me.

SQL Server Query Plan
horse and carriage

Things are not so hot when we pile a Mess Of Functions™ into the where clause, are they?

I mean, our CPUs are hot, but that’s generally not what we’re after.

The Use Of Indexes


We could still use our index. Many people will talk about functions preventing the use of indexes, but more precisely we just can’t seek into them.

But you know what can prevent the use of nonclustered indexes? Long select lists.

Next time, we’ll look at that.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Streaming Week: Query Performance Weirdness In SQL Server 2019 With Batch Mode

awks ox


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Streaming Week: What Else Can SQL Server Indexes Do To Improve Performance?

WELL.


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

IS NULL vs IS NOT NULL In SQL Server Queries

Drop The Verse


Why is IS NULL (not to be confused with ISNULL, the function) considered an equality predicate, and IS NOT NULL considered an inequality (or range) predicate?

It seems like they should be fairly equivalent, though opposite. One tests for a lack of values, and one tests for the presence of values, with no further examination of what those values are.

The trickier thing is that we can seek to either condition, but what happens next WILL SHOCK YOU.

Ze Index


The leading column in this index is NULLable, and has a bunch of NULLs in it.

CREATE INDEX nully_baby 
    ON dbo.Posts(LastEditDate, Score DESC);

Knowing what we know about what indexes do to data, and since the LastEditDate column is sorted ascending, all of the NULL values will be first, and then within the population of NULLs values for Score will be sorted in descending order.

But once we get to non-NULL values, Score is sorted in descending order only within any duplicate date values. For example, there are 4000 some odd posts with a LastEditDate of “2018-07-09 19:34:03.733”.

Why? I don’t know.

But within that and any other duplicate values in LastEditDate, Score will be in descending order.

Proving It


Let’s take two queries!

SELECT TOP (5000) 
    p.LastEditDate,
    p.Score
FROM dbo.Posts AS p
WHERE p.LastEditDate IS NULL
ORDER BY p.Score DESC;

SELECT TOP (5000) 
    p.LastEditDate,
    p.Score
FROM dbo.Posts AS p
WHERE p.LastEditDate IS NOT NULL
ORDER BY p.Score DESC;

Which get very different execution plans.

SQL Server Query Plan
you can’t get it

But Why?


I know, I know. The sort is technically non-deterministic, because Score has duplicates in it. Forget about that for a second.

For the NULL values though, Score is at least persisted in the correct order.

For the NOT NULL values, Score is not guaranteed to be in a consistent order across different date values. The ordering will reset within each group.

We’ll talk about how that works tomorrow.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Should SQL Server’s Optimizer Choose Lookup Query Plans More Often?

Frangible Tuning


Right now, the optimizer’s costing algorithm’s cost lookups as being pretty expensive.

Why? Because it’s stuck in the 90s, and it thinks that random I/O means mechanical doo-dads hopping about on a spinning platter to fetch data.

And look, I get why changes like this would be really hard. Not only would it represent a change to how costs are estimated, which could throw off a whole lot of things, but you also open potentially more queries up to parameter sniffing issues.

Neither of those prospects are great, but I hear from reliable sources that Microsoft “hope[s] to make parameter sniffing less of a problem for customers” in the future.

In the meantime, what do I mean?

Kiss of Death


Scanning clustered indexes can be painful. Not always, of course, but often enough that it’s certainly something to ask questions about in OLTP-ish queries.

Let’s use the example query from yesterday’s blog post again, with a couple minor changes, and an index.

CREATE INDEX unusable 
    ON dbo.Posts(OwnerUserId, Score DESC, CreationDate, LastActivityDate)
    INCLUDE(PostTypeId);

Let’s run this hyper-realistic query, with slightly different dates in the where clause.

SELECT TOP (5000) 
    p.OwnerUserId,
    p.Score,
    ISNULL(p.Tags, N'N/A: Question') AS Tags,
    ISNULL(p.Title, N'N/A: Question') AS Title, 
    p.CreationDate, 
    p.LastActivityDate, 
    p.Body
FROM dbo.Posts AS p
WHERE p.OwnerUserId IS NOT NULL
AND   p.CreationDate >= '20130927'
AND   p.LastActivityDate < '20140101'
ORDER BY p.Score DESC;


SELECT TOP (5000) 
    p.OwnerUserId,
    p.Score,
    ISNULL(p.Tags, N'N/A: Question') AS Tags,
    ISNULL(p.Title, N'N/A: Question') AS Title, 
    p.CreationDate, 
    p.LastActivityDate, 
    p.Body
FROM dbo.Posts AS p
WHERE p.OwnerUserId IS NOT NULL
AND p.CreationDate >= '20130928'
AND p.LastActivityDate < '20140101'
ORDER BY p.Score DESC;

The query plan for the first query looks like this:

SQL Server Query Plan
optimal, sub

We scan the clustered index, and the query as a whole takes around 9 seconds.

Well, okay.

What about the other query plan?

SQL Server Query Plan
mwah

That runs about 7 seconds faster. But why?

Come Clean


There’s one of those ✌tipping points✌ you may have heard about. One day. What a difference, huh?

Let’s back up to the first query.

SELECT TOP (5000) 
    p.OwnerUserId,
    p.Score,
    ISNULL(p.Tags, N'N/A: Question') AS Tags,
    ISNULL(p.Title, N'N/A: Question') AS Title, 
    p.CreationDate, 
    p.LastActivityDate, 
    p.Body
FROM dbo.Posts AS p
WHERE p.OwnerUserId IS NOT NULL
AND   p.CreationDate >= '20130927'
AND   p.LastActivityDate < '20140101'
ORDER BY p.Score DESC;

SELECT TOP (5000) 
    p.OwnerUserId,
    p.Score,
    ISNULL(p.Tags, N'N/A: Question') AS Tags,
    ISNULL(p.Title, N'N/A: Question') AS Title, 
    p.CreationDate, 
    p.LastActivityDate, 
    p.Body
FROM dbo.Posts AS p WITH(INDEX = unusable)
WHERE p.OwnerUserId IS NOT NULL
AND   p.CreationDate >= '20130927'
AND   p.LastActivityDate < '20140101'
ORDER BY p.Score DESC;

There’s no way one day should make this thing 7 seconds slower, so we’re going to hint one copy of it to the use nonclustered index.

How do we do there?

SQL Server Query Plan
i’m lyin’

The much slower plan has a lower cost. The optimizer gave the seek + lookup a higher cost than the scan.

If we look at the subtree cost of the first operator, you’ll see what I mean.

SQL Server Query Plan
pina colada

Zone Out


You may hear people talk about costs, either of query plans, or of operators, that indicate what took the most time. This is unfortunately not quite the case.

Note that there are no “actual cost” metrics that get calculated and added to the plan later. The estimates remain with no counterparts.

You can answer some common questions this way:

  • Why didn’t my index get chosen? The optimizer thought it’d be more work
  • How did it make that choice? Estimated costs of different potential plans
  • Why was the optimizer wrong? Because it’s biased against random I/O.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Why You Can’t Always Rely On SQL Server Missing Index Requests

Greentexting


The problem with relying on any data point is that when it’s not there, it can look like there’s nothing to see.

Missing indexes requests are one of those data points. Even though there are many reasons why they might not be there, sometimes it’s not terribly clear why one might not surface.

That can be annoying if you’re trying to do a general round of tuning on a server, because you can miss some easy opportunities to make improvements.

Here’s an example of a query that, with no indexes in place, probably should generate a missing index request.

SELECT TOP (5000) 
    p.OwnerUserId,
    p.Score,
    ISNULL(p.Tags, N'N/A: Question') AS Tags,
    ISNULL(p.Title, N'N/A: Question') AS Title, 
    p.CreationDate, 
    p.LastActivityDate, 
    p.Body
FROM dbo.Posts AS p
WHERE 1 = 1 
AND p.CreationDate >= '20131230'
AND p.CreationDate < '20140101'
ORDER BY p.Score DESC;

Big Ol’ Blank


Here’s the query plan! It’s like uh. Why wouldn’t you want this to take less than 25 seconds?

SQL Server Query Plan
clap your hands

The posts table is a little over 17 million rows. The optimizer expects around 20k rows to qualify, but doesn’t think an easier way to find those rows would be helpful.

At least not the way we’ve written the query.

Let’s make a small change

Five and Dime


If we quote out the Body column, which is an NVARCHAR(MAX), we get our green text.

SELECT TOP (5000) 
    p.OwnerUserId,
    p.Score,
    ISNULL(p.Tags, N'N/A: Question') AS Tags,
    ISNULL(p.Title, N'N/A: Question') AS Title, 
    p.CreationDate, 
    p.LastActivityDate--, 
    --p.Body
FROM dbo.Posts AS p
WHERE 1 = 1 
AND p.CreationDate >= '20131230'
AND p.CreationDate < '20140101'
ORDER BY p.Score DESC;
SQL Server Query Plan
Who’d want that in an index?

Which is interesting, because the optimizer isn’t always that smart. It’s much easier to tempt it into bad ideas with equality predicates.

Good and Hard


Check this out!

SELECT TOP (5000) *
FROM dbo.Posts AS p
WHERE p.ParentId = 184618;


SELECT TOP (5000) *
FROM dbo.Posts AS p
WHERE p.ParentId > 184617
AND   p.ParentId < 184619;
SQL Server Query Plan
hot cars

The missing index for this is a mistake.

CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[Posts] ([ParentId])
INCLUDE ([AcceptedAnswerId],[AnswerCount],[Body],[ClosedDate],[CommentCount],[CommunityOwnedDate],[CreationDate],[FavoriteCount],[LastActivityDate],[LastEditDate],[LastEditorDisplayName],[LastEditorUserId],[OwnerUserId],[PostTypeId],[Score],[Tags],[Title],[ViewCount])

What Did We Learn?


How we write queries (and design tables) can change how the optimizer feels about our queries. If you’re the kind of person who relies on missing index requests to fix things, you could be missing pretty big parts of the picture.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

SQL Server Windowing Functions: Sorts And Query Performance

Puzzle Pieces


I was investigating a query slowdown recently, and came across something kind of odd with windowing functions and order by.

Let’s talk about these three queries:

SELECT u.DisplayName,
       ROW_NUMBER() OVER (ORDER BY u.UpVotes) AS UpVotesWhatever, --UpVotes first
       ROW_NUMBER() OVER (ORDER BY u.DownVotes) AS DownVotesWhatever --DownVotes second
FROM dbo.Users AS u
WHERE u.Reputation > 100000
ORDER BY UpVotes; --Order by UpVotes

SELECT u.DisplayName,
       ROW_NUMBER() OVER (ORDER BY u.UpVotes) AS UpVotesWhatever, --UpVotes first
       ROW_NUMBER() OVER (ORDER BY u.DownVotes) AS DownVotesWhatever --DownVotes second
FROM dbo.Users AS u
WHERE u.Reputation > 100000
ORDER BY u.DownVotes; --Order by DownVotes

SELECT u.DisplayName,
       ROW_NUMBER() OVER (ORDER BY u.DownVotes) AS DownVotesWhatever, --DownVotes first
       ROW_NUMBER() OVER (ORDER BY u.UpVotes) AS UpVotesWhatever --UpVotes second
FROM dbo.Users AS u
WHERE u.Reputation > 100000
ORDER BY UpVotes; --Order by UpVotes

Goings On


If we’re going to generate row numbers on these columns, we need to sort them.

I know and you know, we can add indexes to put column data in the order we want it in, and that’ll cut down on the amount of work Our Server™ has to do to execute this query. But we can’t just index everything, that’d be insane. I know because I’ve seen your servers, and I’ve seen you try to do that.

Plus, they just get fragmented anyway.

Here are the execution plans. This is a big picture, because I want you to spot the difference.

SQL Server Query Plans
get big

Fascination Street


That first plan has an extra Sort operator in it. See it up there? Right next to the Select operator?

SQL Server Query Plans
shame on you

That sort is ordering by UpVotes ascending, which is a shame because we’ve already done that once. That sort doesn’t occur in the second two plans, because the row number function has already sorted data by them. If the optimizer were a little smarter here, it could reorder the sequence it generates row numbers in to avoid that, but it doesn’t.

If we rewrite the query to do that on our own, the data ends up in the right order. In case you’re wondering, we get the same results referencing the row numbers in the order by instead of the underlying column:

SELECT u.DisplayName,
       ROW_NUMBER() OVER (ORDER BY u.UpVotes) AS UpVotesWhatever, --UpVotes first
       ROW_NUMBER() OVER (ORDER BY u.DownVotes) AS DownVotesWhatever --DownVotes second
FROM dbo.Users AS u
WHERE u.Reputation > 100000
ORDER BY UpVotesWhatever; --Order by UpVotesWhatever

SELECT u.DisplayName,
       ROW_NUMBER() OVER (ORDER BY u.UpVotes) AS UpVotesWhatever, --UpVotes first
       ROW_NUMBER() OVER (ORDER BY u.DownVotes) AS DownVotesWhatever --DownVotes second
FROM dbo.Users AS u
WHERE u.Reputation > 100000
ORDER BY DownVotesWhatever; --Order by DownVotesWhatever

SELECT u.DisplayName,
       ROW_NUMBER() OVER (ORDER BY u.DownVotes) AS DownVotesWhatever, --DownVotes first
       ROW_NUMBER() OVER (ORDER BY u.UpVotes) AS UpVotesWhatever --UpVotes second
FROM dbo.Users AS u
WHERE u.Reputation > 100000
ORDER BY UpVotesWhatever; --Order by UpVotesWhatever

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

SQL Server 2019: When Batch Mode On Rowstore Isn’t Better For Performance

Um Hello


WHAT DO YOU MEAN YOU’RE NOT ON SQL SERVER 2019 YET.

Oh. Right.

That.

Regressed


Look, whenever you make changes to the optimizer, you’re gonna hit some regressions.

And it’s not just upgrading versions, either. You can have regressions from rebuilding or restarting or recompiling or a long list of things.

Databases are terribly fragile places. You have to be nuts to work with them.

I’m not mad at 2019 or Batch Mode On Rowstore (BMOR) or anything.

But if I’m gonna get into it, I’m gonna document issues I run into so that hopefully they help you out, too.

One thing I ran into recently was where BMOR kicked in for a query and made it slow down.

Repro


Here’s my index:

CREATE INDEX mailbag ON dbo.Posts(PostTypeId, OwnerUserId) WITH(DATA_COMPRESSION = ROW);

And here’s my query:

SELECT u.Id, u.DisplayName, u.Reputation,
       (SELECT COUNT_BIG(*) FROM dbo.Posts AS pq WHERE pq.OwnerUserId = u.Id AND pq.PostTypeId = 1) AS q_count,
       (SELECT COUNT_BIG(*) FROM dbo.Posts AS pa WHERE pa.OwnerUserId = u.Id AND pa.PostTypeId = 2) AS a_count
FROM dbo.Users AS u
WHERE u.Reputation >= 25000
ORDER BY u.Id;

It’s simplified a bit from what I ran into, but it does the job.

Batchy

This is the batch mode query plan. It runs for about 2.6 seconds.

SQL Server Query Plan
who would complain?

Rowy

And here’s the row mode query plan. It runs for about 1.3 seconds.

SQL Server Query Plan
oh that’s why.

What Happened?


Just when you think the future is always faster, life comes at you like this.

So why is the oldmode query more than 2x faster than the newhotmode query?

There are a reason, and it’s not very sexy.

Batch Like That

First, the hash joins produce Bitmaps.

SQL Server Query Plan Tool Tip
bitted

You don’t see Bitmaps in Batch Mode plans as operators like you’re used to in Row Mode plans. You have to look at the properties (not the tool tip) of the Hash Join operator.

Even though both plans seek into the index on Posts, it’s only for the PostTypeId in the Batch Mode plan.

It would be boring to show you both, so I’m just going to use the details from the branch where we find PostTypeId = 2.

SQL Server Query Plan Tool Tip
buck fifty

Remember this pattern: we seek to all the values where PostTypeId = 2, and then apply the Bitmap as a residual predicate.

You can pretty easily mentally picture that.

Rowbot

In the row mode plan, the Nested Loops Joins are transformed to Apply Nested Loops:

SQL Server Query Plan Tool Tip
applys and oranges

Which means on the inner side of the join, both the PostTypeId and the OwnerUserId qualify as seek predicates:

SQL Server Query Plan Tool Tip
oh yeah that

Reading Rainbow


The better performance comes from doing fewer reads when indexes are accessed.

SQL Server Query Plan Tool Tip
psychic tv

Though both produce the same number of rows, the Hash Join plan in Batch Mode reads 28 million rows, or about 21 million more rows than the Nested Loop Join plan in row mode. In this case, the double seek does far fewer reads, and even Batch Mode can’t cover that up.

Part of the problem is that the optimizer isn’t psychic.

Fixing It


There are two ways I found to get the Nested Loop Join plan back.

The boring one, using a compat level hint:

SELECT u.Id, u.DisplayName, u.Reputation,
       (SELECT COUNT_BIG(*) FROM dbo.Posts AS pq WHERE pq.OwnerUserId = u.Id AND pq.PostTypeId = 1) AS q_count,
       (SELECT COUNT_BIG(*) FROM dbo.Posts AS pa WHERE pa.OwnerUserId = u.Id AND pa.PostTypeId = 2) AS a_count
FROM dbo.Users AS u
WHERE u.Reputation >= 25000
ORDER BY u.Id
OPTION(USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140'));

And the more fun one, rewriting the correlated subqueries as outer apply:

SELECT u.Id, u.DisplayName, u.Reputation, q_count, a_count
FROM dbo.Users AS u
    OUTER APPLY(SELECT COUNT_BIG(*) AS q_count FROM dbo.Posts AS pq WHERE pq.OwnerUserId = u.Id AND pq.PostTypeId = 1) AS q_count
    OUTER APPLY(SELECT COUNT_BIG(*) AS a_count FROM dbo.Posts AS pa WHERE pa.OwnerUserId = u.Id AND pa.PostTypeId = 2) AS a_count
WHERE u.Reputation >= 25000
ORDER BY u.Id;

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.