Understand Your Plan: Processing Data With An Aggregate Operator

Smack Dab


Aggregates can be useful for all sorts of things in a query plan, and can show up in many different forms.

It would be tough to cover all of them in a single post, but what I’d like to do is help all you nice folks out there understand some of their finer points.

Streaming and Hashing


The two main types of aggregates you’ll see in SQL Server query plans are:

  • Stream Aggregate (expect ordered data)
  • Hash Aggregate (don’t care about order)

Of course, they break down into more specific types, too.

  • Partial aggregates (an early attempt to reduce rows)
  • Scalar aggregates (returning a sum or count without a group by, for example)
  • Vector aggregates (with a group by)

And if you want to count more analytical/windowing functions, you might also see:

  • Segment/Sequence project (row mode windowing functions)
  • Window Aggregates (batch mode windowing functions)

To learn more about partial aggregates, check out this post.

Aggregation Ruling The Nation


One of the big benefits of aggregates is, of course, making a set of values distinct. This can be particularly help around joins, especially Merge Joins where the many to many type can cause a whole lot of performance issues.

That being said, not every aggregation is productive. For example, some aggregations might happen because the optimizer misjudges the number of distinct values in a set. When this happens, often other misestimates will follow.

Two of the biggest factors for this going awry are:

  • Memory grants for sorts and aggregations across the execution plan
  • Other operators chosen based on estimated row counts being much lower

Here are some examples:

SQL Server Query Plan
creep

This hash match aggregate is the victim of a fairly large misestimation, and ends up spilling out to disk. In this case, the spill is pretty costly from a performance perspective and ends up adding about 5 seconds to the query plan.

SQL Server Query Plan
distorted

Here, a hash match aggregate misestimates again, but now you can see it impact the next operator over, too. The sort just didn’t have enough of a memory grant to avoid spilling. We add another few seconds onto this one.

SQL Server Query Plan
contagious

In extreme cases, those spills can really mess things up. This one carries on for a couple minutes, doing nothing but hashing, spilling, and having a bad time.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Understand Your Plan: Getting Data With A Scan Operator

They Live


I’m not sure why scans got such a bad rap. Perhaps it’s a leftover from the Bad Old Days© when people worried about logical fragmentation and page splits.

What I mean to say is: scans are often treated with a level of revulsion and focus that distracts people from larger issues, and is often due to some malpractice on their part.

Sure, scans can be bad. Sometimes you do need to fix them. But make that decision when you’re looking at an actual execution plan, and not just making meme-ish guesses based on costs and estimates.

You’re better than that.

It Follows


Let’s start with a simple query:

SELECT TOP (1)
    p.*
FROM dbo.Posts AS p;

There’s no where clause, no join, and we’ll get a scan, but it’s driven by the TOP.

If we get look at the actual execution plan, and hit F4/get the Properties of the clustered index scan, we’ll see it does a minimal number of reads.

SQL Server Query Plan
scooting

The four reads to get a single row are certainly not the number of reads it would take to read this entire Posts table.

The reason it’s not 1 read is because the Posts table contains the nvarchar(max) Body column, which leads to reading additional off row/LOB pages.

C.H.U.D


Let’s look at a situation where you might see a clustered index scan(!), a missing index request(!), and think you’ll have the problem solved easily(!), but… no.

SELECT TOP (1000)
    u.DisplayName,
    p.Score,
    _ = dbo.InnocentScan(p.OwnerUserId)
FROM dbo.Posts AS p
JOIN dbo.Users AS u
    ON p.OwnerUserId = u.Id
WHERE p.Score = 1
ORDER BY p.Id;

Costs are really wacky sometimes. Like here.

SQL Server Query Plan
exhibit a

In this really unhelpful query plan (thanks, SSMS, for not showing what the clustered thing is), we have:

  • A clustered index scan of Posts
  • A clustered index seek of Users

The optimizer tells us that adding an index will reduce the estimated effort needed to execute the query by around 18%. Mostly because the estimated effort needed to scan the clustered index is 19% of the total operator costs.

CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[Posts] ([Score])
INCLUDE ([OwnerUserId])

Got it? Missing index requests don’t care about joins.

Geistbusters


If one were to look at the actual execution plan, one may come to the conclusion that an index here would not solve a gosh. darn. thing.

SQL Server Query Plan
plan, actually

21 seconds is spent in the operator that costs 0%, and less than 40 milliseconds is spent between the two operators that make up 100% of the plan cost.

Very sad for you if you thought that missing index request would solve all your problems.

What About Scans?


I know, the title of this post is about retrieving data via an index scan, but we ended up talking about how scans aren’t always the root of performance issues.

It ended up that way because as I was writing this, I had to help a client with an execution plan where the problem had nothing to do with a clustered index scan that they were really worried about.

To call back to a few other points about things I’ve made so far in this series:

  • The operator times in actual execution plans are the most important thing for you to look at
  • If you’re not getting a seek when you think you should, ask yourself some questions:
  • Do I have an index that I can actually seek to data in?
  • Do I have a query written to take advantage of my indexes?

If you’re not sure, hit the link below. This is the kind of stuff I love helping folks out with.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Understand Your Plan: Getting Data With A Seek Operator

Rumble, Young Man, Rumble


People. People complain. People complain about SQL Server. That’s probably why I get paid to deal with it, but whatever.

One complaint I get to hear week in and week out is that SQL Server isn’t using someone’s index, or that there are too many index scans and they’re slow.

That might actually be a composite of like twelve complaints, but let’s not start counting.

Usually when we start examining the queries, query plans, and indexes for these renegades, the reasons for the lack of a seek become apparent.

  • There’s no good index to seek to
  • The query is written in a way that seeks can’t happen
  • A predicate is on two columns

Desperately Seeking Susan


In a query that doesn’t have any of those problems, you’ll naturally get a seek and feel really good about yourself.

CREATE INDEX v ON dbo.Votes(PostId)
WITH(SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE v.PostId = 194812;

We’re set for success! And look how happy things are. Happy little query plans.

SQL Server Query Plan
rookie card

Sargy Bargy


You’re smart people. You’re on top of things. You know that without an index on PostId, the query up above wouldn’t have anything to seek to.

Useful indexes are half the battle. The other half of the battle is not screwing things up.

I’m going to use dynamic SQL as shorthand for any parameterized query. I should probably add that using a local variable would only make things worse.

Don’t do that. Or this.

DECLARE
    @sql nvarchar(MAX) = N'',
    @PostId int = 194812;

SELECT
    @sql = N'
SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE (v.PostId = @PostId 
OR     @PostId IS NULL);';

EXEC sys.sp_executesql
    @sql,
    N'@PostId int',
    @PostId;

While we’re on the topic, don’t do this either.

DECLARE
    @sql nvarchar(MAX) = N'',
    @PostId int = 194812;

SELECT
    @sql = N'
SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE v.PostId = ISNULL(@PostId, v.PostId);';

EXEC sys.sp_executesql
    @sql,
    N'@PostId int',
    @PostId;

Here’s the query plans for these:

SQL Server Query Plan
practice makes

We end up not only scanning the entire index unnecessarily, but the second one gets a really unfavorable cardinality estimate.

It’s amazing how easy it is to ruin a perfectly good seek with lazy query writing, isn’t it?

Joint Venture


When you compare two columns in tables, you might not always see a seek, even if you have the best index ever.

Let’s use this as an example:

CREATE INDEX p ON dbo.Posts(OwnerUserId, LastEditorUserId)
WITH(SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

SELECT
    c = COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE p.OwnerUserId = p.LastEditorUserId;

We have an index that matches our predicate, and you might think that having all that data very nicely in order would make SQL Server’s job really easy to match those columns up.

But no. No in every language.

SQL Server Query Plan
scrab

Big ol’ index scan. Even if you try to force the matter, SQL Server says  no nein nyet non and all the rest.

SELECT
    c = COUNT_BIG(*)
FROM dbo.Posts AS p WITH(FORCESEEK)
WHERE p.OwnerUserId = p.LastEditorUserId;
Msg 8622, Level 16, State 1, Line 56
Query processor could not produce a query plan because of the hints defined in this query. 
Resubmit the query without specifying any hints and without using SET FORCEPLAN.

Who can deal with all that?

Hail Mary


Let’s really dig our heels in and try to make this work. We’ll create an index on both columns individually and see how things go.

CREATE INDEX p ON dbo.Posts(OwnerUserId)
WITH(SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

CREATE INDEX pp ON dbo.Posts(LastEditorUserId)
WITH(SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
SQL Server Query Plan
woof

Just to be clear, this is horrible. You don’t want this to happen. This sucks, and if you like it you should jump in a deep, deep hole.

Ends Well


Seeks are often the best possible outcome when a small number of rows are sought. OLTP workloads are the prime candidate for seeking seeks. A seek that reads a large portion of the table isn’t necessarily agreeable.

For everyone else, there’s nothing wrong with scans. Especially with column store indexes, you shouldn’t expect seeking to a gosh darn thing.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Understand Your Plan: Reading Operator Times In SQL Server Execution Plans

Wiseacre


When I was sitting down to figure out what to write about this, one of the topics that I thought would be interesting was the width of lines/arrows in query plans.

I sat here for half a day pecking here and there getting the post started, writing some demos, and… realized I couldn’t remember the last time I really looked at them. I could explain it, but… I’d be lying to you if I said I really cared.

Unless you have an actual execution plan, line thickness is based on estimates. They could be totally wrong, and chances are that if you’ve got a query plan in front of you it’s not because it’s running well.

If you’re going to run it and look at the actual execution plan anyway, you’d have to be seven levels obtuse to look at line widths instead of operator times.

They’re your best friends these days, and help you focus on the parts of the plan that need help.

Looney Tunes


At this point in 2022, most people are going to be looking at plans running only in row mode. As folks out there start to embrace:

  • Column store
  • SQL Server 2019 Enterprise Edition
  • Compatibility level 150+

We’ll start to see more batch mode operators, especially from batch mode on row store. I only mention this because row mode and batch mode operators track time differently, and you need to be really careful when analyzing operator times in actual execution plans.

In plans that contain a mix of row mode and batch mode operators, timing might look really funny in some places.

Let’s chat about that!

Row Mode


As of this writing, all row mode query plan operators accumulate time, going from right to left. That means each operator tracks it’s own time, along with all of the child operators under it.

Where this is useful is for following time accumulation in a query plan to where it spikes. I sort of like this because it makes tracking things down a bit easier than the per-operator times in batch mode plans.

Let’s use this plan as an example:

SQL Server Query Plan
oh i forgot it’s summertime

Yes, it’s intentionally bad. I made it that way. You’re welcome. There are two things I’d like to point out here:

  • Just about every line looks equally thick
  • Costing is a bit weird, aside from the Eager Index Spool

But you know what? If you look at operator times, you can get a pretty good idea about where things went wrong.

  • Maybe that Key Lookup wasn’t the greatest use of time
  • Boy howdy, that eager index spool took about 90 seconds

I’m not gonna go into how to fix this, I just want you to understand where time is taken, and how it adds up across operators.

Batch Mode


In batch mode plans, the story is a whole lot different.

SQL Server Query Plan
BMOR than you can be

Times for operators are all over the place. They’re not wrong, they’re just different. Each operator that executes in batch mode tracks it’s own time, without the child operator times.

Nothing adds up until the very end, when you hit the Gather Streams operator. Right now, none of the parallel exchange operators can run in batch mode, so they’ll add up all of the child operator times.

If you’ve got a plan with a lot of operators mixed between row and batch mode, things will look even weirder.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

No, Really: Don’t Optimize For Ad Hoc Workloads As A Best Practice

Better Late


A few weeks back, my friend Randolph (B|T) and I collaborated a bit on a blog post about Optimize For Ad Hoc Workloads.

Now that I’m a Microsoft Approved Blogger™️ and you all have to take me Very Seriously™️, I’d like to make a few of the points here in my own words, though I highly suggest reading Randolph’s post because it’s much better.

Randolph writes for a living. I’m just a consultant with some free time.

The result you want from Optimize For Ad Hoc Workloads is probably closer to what Forced Parameterization does


When people complain about lots of “single use” plans, they probably want there to be fewer of them. Turning on Forced Parameterization will promote plan reuse. Turning on Optimize For Ad Hoc Workloads won’t do that.

  • With Forced Parameterization, literal values are replaced (when possible) with parameters, which promotes plan re-use
  • Turning on Optimize For Ad Hoc Workloads compiles a new plan for queries with literal values, and then just caches a stub

Parameterization, and further Parameter Sniffing, has an ominous meaning for many of you.

I see it all the time, folks jumping through every hoop in the world to not have a parameter get sniffed. Recompiling, Unknown-ing, Local Variable-ing. Sad. Very sad.

Bottom line: If you want fewer single use plans, parameterize your damn queries.

Turning Optimize For Ad Hoc Workloads on can make workload analysis difficult


You know how sometimes you have a performance issue, and then you go look in the plan cache, and queries don’t have execution plans? This setting can make that happen.

Why? Because stubs aren’t query plans. Never have been. Never will be.

If we run these queries, they’ll both get a stub, because each one has a different literal value. These are “single use” plans, even though SQL Server compiled and use the “same” plan for each one.

SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE v.VoteTypeId = 2
AND 1 = (SELECT 1); --Skip a Trivial Plan/Simple Parameterization
GO

SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE v.VoteTypeId = 4
AND 1 = (SELECT 1); --Skip a Trivial Plan/Simple Parameterization
GO

Here are the plans:

SQL Server Query Plan
tell me why

But here’s why they’re “different” and each one gets a stub:

hashtastic

Even though the Query Hash and Query Plan Hash are identical, each one has a different SQL Handle.

I’m using a simple demo to show you the end result, but you can probably see why this would be more difficult to deal with in real life.

The “same” query might run more than once, but if a literal in the where clause is different, you’ll have two stubs


Thinking about the situation above:

  • Did we really want to compile the same execution plan twice?
  • What’s the point of having two stubs for what amounts to the exact same query and plan?

If we had turned on Forced Parameterization, that’s what would have happened: The where clause would have had the literal 4 and 2 values replaced with a parameter, and we would have gotten a cached and re-used plan.

The problem we hit is that we got two different SQL Handles because the 4 and 2 hash out differently.

That’s a problem that gets solved with parameterization, forced or fixed in code.

You’ll still compile full plans for queries to execute with


This is another misconception I run into a lot. For every time a query runs that SQL Server can’t find a matching SQL Handle for, you’ll get a new plan generated, even if only a stub is cached.

If you have a lot of big, complicated queries that take a long time to compile, that can be unpleasant.

And look, I’m not afraid of recompiling plans. In the right place, that can be awesome.

This can also be annoying if you have queries constantly coming in and compiling new plans.

It’s been a while since I’ve seen this scenario cause CPU to hit 100% (or even close to it), but in today’s Unfortunate Cloud Age© any CPU reduction you can make to end up on smaller, less expensive instances can make you look like a Bottom Line Hero™️.

Constant query compilation, and long query compilation, can contribute to that.

The stubs still count towards the total number of plans you can have cached


By default, you have a limit of 160,036 total plans allowed in the cache, and other limits in total size based on available memory.

Plan stubs still contribute towards the total count. Lots of single use stubs can still cause “bloat”.

The questions you have to ask here, are:

  • How much larger are my query plans than the stubs?
  • Is that savings worth it to not have plans in cache available to analyze?
  • Do you capture query plans in a different way? (Query Store, monitoring tool, etc.)

Lots of stubs aren’t necessarily more useful.

When you end up with “the same” unparameterized query running over and over again, you might miss out on aggregate resource usage for it


One way that it can be useful to look at your server’s workload is seeing which queries:

  • Execute the most
  • Use the most total CPU
  • Are attached to missing index requests

A lot of plan cache analysis scripts (even ones I’ve written and worked on) will group by or correlate on SQL Handle for different things.

If you have a different SQL Handle for every execution of “the same” query with different literals, you’ll miss out on them.

You’ll have to look by something else, like Query Hash, Query Plan Hash, or Plan Handle instead.

You can run into this regardless of Optimize For Ad Hoc Workloads being enabled, but a lot of folks out there tend to ignore rows returned by analysis scripts that don’t also have a query plan.

Stubs Ahoy.

The Setting Is Poorly Named


Last but not least: this is a poorly named setting.

I still run into folks who think that enabling this gives the optimizer some special powers for ad hoc queries.

It doesn’t. The only thing it does is cache a plan stub instead of the entire plan on “first execution”.

This isn’t a documentation problem either, this is a not-reading-the-documentation problem.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Getting The Top Value Per Group With Multiple Conditions In SQL Server: Row Number vs. Cross Apply With MAX

Throat Music


In yesterday’s post, we compared a simple situation trying to find the post scoring post for each user.

In today’s post, we’re going to add another condition: we want the highest scoring post for each type of post someone has made.

typos

Now look, most people don’t get involved with any of these things, but whatever. It just poses an interesting and slightly more complicated problem.

Slightly Different Index


Since we’re going to be using PostTypeId in the window function, we need it in the key of our index:

CREATE INDEX p ON dbo.Posts(OwnerUserId, PostTypeId, Score DESC)
WITH(SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

Now our query looks like this:

SELECT
    u.DisplayName,
    u.Reputation,
    p.PostTypeId,
    p.Score
FROM dbo.Users AS u
JOIN
(
    SELECT
        p.*,
        n = 
            ROW_NUMBER() OVER
            (
                PARTITION BY
                    p.OwnerUserId,
                    p.PostTypeId --This is new!
                ORDER BY
                    p.Score DESC
            )
    FROM dbo.Posts AS p
) AS p
    ON  p.OwnerUserId = u.Id
    AND p.n = 1
WHERE u.Reputation > 50000
ORDER BY 
    u.Reputation DESC,
    p.Score DESC;

S’good? S’good. Let’s go.

Row Number Query Plan


Similar to yesterday’s plan, this one is rather slow, rather serial, and generally not how we want to be spending our precious time.

SQL Server Query Plan
scam

Let’s look at the apply method, because we have to change our query a little bit to accomplish the same thing.

Cross Apply With MAX


Rather than go with TOP (1), we’re going to GROUP BY OwnerUserId and PostTypeId, and get the MAX(Score).

SELECT
    u.DisplayName,
    u.Reputation,
    p.PostTypeId,
    p.Score
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT 
        p.OwnerUserId,
        p.PostTypeId,
        Score = MAX(p.Score)
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = u.Id
    GROUP BY 
        p.OwnerUserId, 
        p.PostTypeId
) AS p
WHERE u.Reputation > 50000
ORDER BY 
    u.Reputation DESC,
    p.Score DESC;

This will give us the same results, but a lot faster. Again.

Cross Apply Query Plan


Like I was saying…

SQL Server Query Plan
time is extremely valuable

Down to ~400ms now. Not bad, right?

Lower Selectivity


If we take those same queries and lower the reputation filter from 50,000 to 1, some interesting changes to the query plans happen.

SQL Server Query Plan
years ago

Repartition streams and I have had some problems in the past. It’s not necessarily “to blame”, it just has a tough time with some data distributions, especially, it seems, when it’s order preserving.

The cross apply with aggregation works really well. It’s kinda neat that both queries get slower by the same amount of time, but the ROW_NUMBER query is still much, much slower.

All of this is interesting and all, but you know what? We haven’t look at batch mode. Batch mode fixes everything.

Sort of. Don’t quote me on that. It’s just really helpful in the kind of BIG QUERY tuning that I end up doing.

Batch Mode


This is the only thing that makes the ROW_NUMBER query competitive in this scenario, owing to the fact that batch mode often removes Repartition Streams, and we’re eligible for the Window Aggregate operator.

SQL Server Query Plan
further reductions

I’m dumping these results to #temp tables because I don’t want to wait for SSMS to render the large result set, but you can still see the positive overall effect.

The poorly performing ROW_NUMBER query is now very competitive with the CROSS APPLY query.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Getting The Top Value Per Group In SQL Server: Row Number vs. Cross Apply Performance

Introductions


There are many ways to express queries in SQL. How different rewrites perform will largely be a function of:

  • You not doing anything ridiculous
  • Queries having good indexes in place
  • The optimizer not seeing through your tricks and giving you the same query plan

The first rule of rewrites is that they have to produce the same results, of course. Logical equivalency is tough.

In today and tomorrow’s posts I’m going to compare a couple different scenarios to get the top value.

There are additional ways to rewrite queries like this, of course, but I’m going to show you the most common anti-pattern I see, and the most common solution that tends to work better.

Right And Proper Indexing


For today’s post, we’re going to use this index:

CREATE INDEX p ON dbo.Posts(OwnerUserId, Score DESC) INCLUDE(PostTypeId)
WITH(SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

Which is going to give this query proper support. Sure, we could also add an index to the Users table, but the one scan is trivially fast, and probably not worth it here.

SELECT
    u.DisplayName,
    u.Reputation,
    p.PostTypeId,
    p.Score
FROM dbo.Users AS u
JOIN
(
    SELECT
        p.*,
        n = 
            ROW_NUMBER() OVER
            (
                PARTITION BY
                    p.OwnerUserId
                ORDER BY
                    p.Score DESC
            )
    FROM dbo.Posts AS p
) AS p
    ON  p.OwnerUserId = u.Id
    AND p.n = 1
WHERE u.Reputation > 50000
ORDER BY 
    u.Reputation DESC,
    p.Score DESC;

The general idea is to find all users with a reputation over 50,000, along with their highest scoring post.

I know, you’re looking at this and thinking “jeez Erik, why are you selecting * here? don’t you know how bad and dumb you are for that?”

Well, SQL Server is smart enough to ignore that and only deal with the columns in the outer select.

The Query Plan


If you create the index and run this, the query plan looks something like this:

SQL Server Query Plan
pity

The majority of the time spent in this plan is the ~11 seconds between the scan of the Posts table and the Filter operator.

The filter is there to remove rows where the result of the ROW_NUMBER function are greater than 1.

I guess you could say less than 1, too, but ROW_NUMBER won’t produce rows with 0 or negative numbers naturally. You have to make that happen by subtracting.

A Better Query Pattern?


Since only ~2400 rows are leaving the Users table, and we have a good index on the Posts table, we want to take advantage of it.

Rather than scan the entire Posts table, generate the ROW_NUMBER, apply the filter, then do the join, we can use CROSS APPLY to push things down to where we touch the Posts table.

SELECT
    u.DisplayName,
    u.Reputation,
    p.PostTypeId,
    p.Score
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT TOP (1)
        p.*
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = u.Id
    ORDER BY p.Score DESC
) AS p
WHERE u.Reputation > 50000
ORDER BY 
    u.Reputation DESC,
    p.Score DESC;

This is logically equivalent, but it blows the other query out of the water, performance-wise.

A Better Query Plan?


Here’s the query plan for the cross apply query:

SQL Server Query Plan
nice nice

Why Is This better?


In this case, having a good index to use, and a small outer result from the Users table, the cross apply query is way better.

This is also due to the Id column of Users being the Primary Key of the table. For this sort of one to many join, it works beautifully. If it were a many to many scenario, it could be a toss up, or ROW_NUMBER could blow it out of the water.

The way this type of Nested Loops Join works (Apply Nested Loops), is to take each row from the outer input (Users table) and seek to it in the Posts table.

Without that good index up here, this would likely be a disaster with an Eager Index Spool in the plan. We definitely don’t want that, here.

But you know, there are many different types of posts. We might want to know someone’s high score for questions, answers, and more.

In tomorrow’s post, we’ll look at how to do that, and performance tune the query.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

MAXDOP: The T-Shirt

Buttery Biscuits


In yesterday’s post, I showed you the promo video and abstract for my SQLBits precon.

In today’s post, I’m gonna offer you a little bribery to show up.

First, all attendees will get one of these lovely T-Shirts. I have options for MAXDOP 8 and 0 available. Because those are the only right answers. Ha ha ha.

I’m also throwing in free access to my SQL Server training library, which includes a video recap of the precon material.

See you there!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Going To SQLBits? Come To My Precon: The Professional Performance Tuning Blueprint!

Learn-A-Lot


Check out my Events page for the full details, and of course head over to SQLBits.com to register.

 

Searching the internet for every problem isn’t cutting it. You need to be more proactive and efficient when it comes to finding and solving database performance fires.

I work with consulting customers around the world to put out SQL Server performance fires. In this day of learning, I will teach you how to find and fix your worst SQL Server problems using the same modern tools and techniques which I use every week.

You’ll learn tons of new and effective approaches to common performance problems, how to figure out what’s going on in your query plans, and how indexes really work to make your queries faster. Together, we’ll tackle query rewrites, batch mode, how to design indexes, and how to gather all the information you need to analyze performance.

This day of learning will teach you cutting edge techniques which you can’t find in training by folks who don’t spend time in the real world tuning performance. Performance tuning mysteries can easily leave you stumbling through your work week, unsure if you’re focusing on the right things. You’ll walk out of this class confident in your abilities to fix performance issues once and for all.

If you want to put out SQL Server performance fires, this is the precon you need to attend. Anyone can have a plan, it takes a professional to have a blueprint.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Wrap Up – For Now!

Finger On It


All good things must come to an end. When I started this series, I wasn’t entirely sure where that would be.

Turns out, seven weeks about covers all the major stuff I see regularly when working with clients.

And this is just the code and indexes; it doesn’t even start on the SQL Server settings, maintenance practices, table design, and hardware configurations I see getting in the way.

Perhaps there is a part 2, but I’m going to let this stand for now.

Happy clicking! Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.