The Fastest Way To Get The Highest Value In SQL Server Part 2

Whistle Whistle


In yesterday’s post, we looked at four different ways to get the highest value per use with no helpful indexes.

Today, we’re going to look at how those same four plans change with an index.

This is what we’ll use:

CREATE INDEX ix_whatever
    ON dbo.Posts(OwnerUserId, Score DESC);

Query #1

This is our MAX query! It does really well with the index.

SELECT u.Id,
       u.DisplayName,
	   u.Reputation,
	   ca.Score
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT MAX(Score) AS Score
	FROM dbo.Posts AS p
	WHERE p.OwnerUserId = u.Id

) AS ca
WHERE u.Reputation >= 100000
ORDER BY u.Id;

It’s down to just half a second.

SQL Server Query Plan
Phoney

Query #2

This is our TOP 1 query with an ORDER BY.

SELECT u.Id,
       u.DisplayName,
	   u.Reputation,
	   ca.Score
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT TOP (1) p.Score
	FROM dbo.Posts AS p
	WHERE p.OwnerUserId = u.Id
	ORDER BY p.Score DESC

) AS ca
WHERE u.Reputation >= 100000
ORDER BY u.Id;
SQL Server Query Plan
Aliveness

This finished about 100ms faster than MAX in this run, but it gets the same plan.

Who knows, maybe Windows Update ran during the first query.

Query #3

This is our first attempt at row number, and… it’s not so hot.

SELECT u.Id,
       u.DisplayName,
	   u.Reputation,
	   ca.Score
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT p.Score,
	       ROW_NUMBER() OVER (ORDER BY p.Score DESC) AS n
	FROM dbo.Posts AS p
	WHERE p.OwnerUserId = u.Id
) AS ca
WHERE u.Reputation >= 100000
AND ca.n = 1
ORDER BY u.Id;

While the other plans were able to finish quickly without going parallel, this one does go parallel, and is still about 200ms slower.

SQL Server Query Plan
Bad Bed

Query #4

Is our complicated cross apply. The plan is simple, but drags on for almost 13 seconds now.

SELECT u.Id,
       u.DisplayName,
	   u.Reputation,
	   ca.Score
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT * 
	FROM 
	(
        SELECT p.OwnerUserId,
	           p.Score,
	           ROW_NUMBER() OVER (PARTITION BY p.OwnerUserId 
			                      ORDER BY p.Score DESC) AS n
	    FROM dbo.Posts AS p
	) AS p
	WHERE p.OwnerUserId = u.Id
	AND p.n = 1
) AS ca
WHERE u.Reputation >= 100000
ORDER BY u.Id;
SQL Server Query Plan
Wrong One

Slip On


In this round, row number had a tougher time than other ways to express the logic.

It just goes to show you, not every query is created equal in the eyes of the optimizer.

Now, initially I was going to do a post with the index columns reversed to (Score DESC, OwnerUserId), but it was all bad.

Instead, I’m going to do future me a favor and look at how things change in SQL Server 2019.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

The Fastest Way To Get The Highest Value In SQL Server Part 1

Expresso


Let’s say you wanna get the highest thing. That’s easy enough as a concept.

Now let’s say you need to get the highest thing per user. That’s also easy enough to visualize.

There are a bunch of different ways to choose from to write it.

In this post, we’re going to use four ways I could think of pretty quickly, and look at how they run.

The catch for this post is that we don’t have any very helpful indexes. In other posts, we’ll look at different index strategies.

Query #1

To make things equal, I’m using CROSS APPLY in all of them.

The optimizer is free to choose how to interpret this, so WHATEVER.

SELECT u.Id,
       u.DisplayName,
	   u.Reputation,
	   ca.Score
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT MAX(Score) AS Score
	FROM dbo.Posts AS p
	WHERE p.OwnerUserId = u.Id

) AS ca
WHERE u.Reputation >= 100000
ORDER BY u.Id;

The query plan is simple enough, and it runs for ~17 seconds.

SQL Server Query Plan
Big hitter

Query #2

This uses TOP 1.

SELECT u.Id,
       u.DisplayName,
	   u.Reputation,
	   ca.Score
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT TOP (1) p.Score
	FROM dbo.Posts AS p
	WHERE p.OwnerUserId = u.Id
	ORDER BY p.Score DESC

) AS ca
WHERE u.Reputation >= 100000
ORDER BY u.Id;

The plan for this is also simple, but runs for 1:42, and has one of those index spool things in it.

SQL Server Query Plan
Unlucky

Query #3

This query uses row number rather than top 1, but has almost the same plan and time as above.

SELECT u.Id,
       u.DisplayName,
	   u.Reputation,
	   ca.Score
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT p.Score,
	       ROW_NUMBER() OVER (ORDER BY p.Score DESC) AS n
	FROM dbo.Posts AS p
	WHERE p.OwnerUserId = u.Id
) AS ca
WHERE u.Reputation >= 100000
AND ca.n = 1
ORDER BY u.Id;
SQL Server Query Plan
Why send me silly notes?

Query #4

Also uses row number, but the syntax is a bit more complicated.

The row number happens in a derived table inside the cross apply, with the correlation and filtering done outside.

SELECT u.Id,
       u.DisplayName,
	   u.Reputation,
	   ca.Score
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT * 
	FROM 
	(
        SELECT p.OwnerUserId,
	           p.Score,
	           ROW_NUMBER() OVER (PARTITION BY p.OwnerUserId 
			                      ORDER BY p.Score DESC) AS n
	    FROM dbo.Posts AS p
	) AS p
	WHERE p.OwnerUserId = u.Id
	AND p.n = 1
) AS ca
WHERE u.Reputation >= 100000
ORDER BY u.Id;

This is as close to competitive with Query #1 as we get, at only 36 seconds.

SQL Server Query Plan
That’s a lot of writing.

Wrap Up


If you don’t have helpful indexes, the MAX pattern looks to be the best.

Granted, there may be differences depending on how selective data in the table you’re aggregating is.

But the bottom line is that in that plan, SQL Server doesn’t have to Sort any data, and is able to take advantage of a couple aggregations (partial and full).

It also doesn’t spend any time building an index to help that one.

In the next couple posts, we’ll look at different ways to index for queries like this.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

How Query Complexity Hurts SQL Server Performance

Fingerless


I get why these things happen. You’re the <new person> somewhere, and someone asks for you to add something to a report, or something

You look at the original query, and it’s like 1000 lines long.

There’s dozens of joins, and a half-mile where clause full of ands and ors.

There’s no way you’re messing with that. You just tack your left join on and walk away.

Fine.

Everything’s Eventual


Don’t get me wrong. Though some combination of skill, luck, hardware, or size of data, this might work for a while.

SQL Server might even help you out with a parallel plan. They’re sort of the Great Equalizer™ for performance.

Optimizer thinks this is gonna be a doozy? Have some more CPU!

Be my guest. They’re free, right?

Eventually, though, this will get slower and slower.

This is usually about the time someone gives me a call.

Chewy and Chompy


See, when a query is big and complicated to you, there’s a pretty good chance you’re gonna get a big and complicated query plan, because it’s big and complicated to the optimizer, too.

This isn’t to say the optimizer is dumb or bad or ugly; it’s just that there’s only so long it’s willing to spend coming up with a plan.

Remember, cheap plan fast. Not perfect, not great, maybe good enough.

Cheap and fast.

Even worse, the bigger a query plan is, the less likely it is to be helpful to analyze.

Costs get so spread out, it’s hard to focus on what might make a difference.

Hatchet Act


When I have to tune a query like this, there’s some stuff I’ll try out first to get a feel for what’s going on, but ultimately your best friend is breaking things up.

The optimizer is just like you and me. The more chances and choices we have, the more likely we are to screw one up.

Really big queries usually have some logical stopping points, that you might wanna try materializing by sticking them in a #temp table.

  • CTEs
  • Derived tables
  • Subqueries
  • UNION/UNION ALL
  • Initial Inner Joins

The last point there might be a little unclear. I mean that usually your query starts off with some inner joins, then people start tacking left joins on.

If you grab the most restrictive stuff first, that’s sometimes a good starting place.

But really, all of those things are valid. It’s easier to tune a bunch of small queries than one big query.

The Hounds Of Hinterville


This is also where I’m a big fan of hints — not because I want them to stay, but because I wanna see how the plan changes. 

Join and aggregate hints, recompile, trying to force a parallel plan, FAST 1, etc. are all valid experiments to see if there’s something the optimizer isn’t figuring out on its own.

Figuring out why is harder, but hey, the only way to get good at that is to keep tuning.

Hints are great to learn from, and sometimes the only way to get the plan you want.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Live SQL Server Q&A!

ICYMI


Last week’s thrilling, stunning, flawless episode of whatever-you-wanna-call-it.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

See you there!

How OPTIMIZE FOR SEQUENTIAL KEY Improves Highly Concurrent Insert Performance

Everyone Loves A SQL



Thanks for watching!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

OPTIMIZE FOR SEQUENTIAL KEY In SQL Server 2019

OPTIMIZE FOR YOUR MOM



Thanks for watching!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

How SQL Server’s Missing Index Requests Can Hurt Performance

DON’T THROW EGGS



Thanks for watching!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Live SQL Server Q&A!

ICYMI


Last week’s thrilling, stunning, flawless episode of whatever-you-wanna-call-it.

Thanks for watching!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

SQL Server 2019: If You Want Adaptive Joins, You Need Wider Indexes

What You Make Of It


We’ve got this query. Handsome devil of a query.

You can pretend it’s in a stored procedure, and that the date filter is a parameter if you want.

SELECT u.DisplayName, p.Score
FROM dbo.Users AS u
JOIN dbo.Posts AS p
ON p.OwnerUserId = u.Id
WHERE u.CreationDate >= '20131201';

A long time ago, when we migrated Stack Overflow from Access to SQL Server 2000, we created indexes.

This one has worked alright.

CREATE INDEX ix_whatever 
    ON dbo.Posts(OwnerUserId);

But Now We’re On 2019


And we’ve, like, read a lot about Adaptive Joins, and we think this’ll be cool to see in action.

Unfortunately, our query doesn’t seem to qualify.

SQL Server Query Plan
Shame shame shame

Now, there’s an Extended Event that… Used to work.

These days it just stares blankly at me. But since I’ve worked with this before, I know the problem.

It’s that Key Lookup — I’ll explain more in a minute.

Index Upgrade


First, let’s get rid of the Lookup so we can see the Adaptive Join happen.

CREATE INDEX ix_adaptathy 
    ON dbo.Posts(OwnerUserId, Score);
SQL Server Query Plan
New new new

As We Proceed


Let’s think about what Adaptive Joins need:

  • An index on the column(s) you’re joining

This gives us a realistic choice between using a Nested Loops join to do efficient Seeks, or an easy scan for a Hash Join.

  • That index has to cover the query

Without a covering index, there’s too much for the optimizer to think about.

It’s not just making a choice between a Nested Loops or Hash Join, it’s also factoring in the cost of a Lookup.

This used to trigger the XE on eajsrUnMatchedOuter, meaning the outer table didn’t have an index that matched the query.

Why Revisit This?


When SQL Server 2019 comes out, people are gonna have really high hopes for their workloads automagickally getting faster.

While there are lots of things that it’ll likely help, it’s going to take a lot of work on your part to make sure your queries and indexes allow for the automagick to kick in.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.