CPU & RAM Don’t Lie: Query Metrics I Care About For Tuning



There are metrics I care about, and metrics I don’t care about, when I’m looking for queries to tune.

Metrics I don’t care about:

  • Logical Reads
  • Costs

If a query does “a lot” of reads or has a high “cost”, I generally don’t care as long as it runs quickly. Doing consistent physical reads is a slightly different story, but that would probably fall more under server tuning or fixing memory grants.

Metrics I do care about:

  • CPU (taking parallelism into consideration)
  • Duration (compared to CPU)
  • Memory Grants (particularly when they’re not being fully utilized)
  • Writes (especially if it’s just a select)
  • Executions (mostly to track down scalar UDFs)

CPU and Duration


These two metrics get lumped together because they need to be compared to each other to figure out what’s going on. First, you need to decide on the minimum runtime of queries you want to tune.

In general, as query execution time gets faster, getting it to be much faster gets more difficult.

  • Bringing a query from 1 second to 100 milliseconds might be a small matter
  • Bringing that same query from 100 milliseconds to 1 millisecond might take more time than it’s worth

I say that because unless someone is querying SQL Server directly, smaller durations tend to be less detectable to end users. By the time they hit a button, send the request, receive the data, and have the application render it, etc., they’re probably not aware of a 99 millisecond difference.

Of course, not everything is end-user centric. Other internal operations, especially any loop processing, might benefit greatly from reductions on the smaller side of things.

  • If duration and CPU are acceptable, leave it alone
  • If either is unacceptable, tune the darn thing
  • If CPU is much higher than duration, you have a parallel plan, and tuning is optional
  • If duration is much higher than CPU, you have blocking or another contention issue, and the query you’re looking at probably isn’t the problem
  • If duration and CPU are roughly equivalent, you either have a functional serial plan or a really crappy parallel plan

I give these the highest priority because reducing them is what makes queries faster, and it shrinks the surface area (execution time) of a query where something crappy might happen, like blocking, deadlocks, or other resource contention.
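
If you want to go hunting for these patterns yourself, here’s a minimal sketch against sys.dm_exec_query_stats (times there are reported in microseconds). Query Store will give you the same comparison if you’d rather use that:

SELECT TOP (20)
    avg_cpu_ms = (qs.total_worker_time / qs.execution_count) / 1000.,
    avg_duration_ms = (qs.total_elapsed_time / qs.execution_count) / 1000.,
    qs.execution_count,
    query_text =
        SUBSTRING
        (
            st.text,
            (qs.statement_start_offset / 2) + 1,
            ((CASE qs.statement_end_offset
                   WHEN -1 THEN DATALENGTH(st.text)
                   ELSE qs.statement_end_offset
              END - qs.statement_start_offset) / 2) + 1
        )
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;

Queries where average duration towers over average CPU are your blocking and contention suspects; the reverse usually means a parallel plan.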

Memory Grants


Using these as a tuning metric can have a lot of positive effects, depending on what kind of shape the system is in.

Consider a few scenarios:

  • PAGEIOLATCH_XX waits are high because large memory grants steal significant buffer pool space
  • RESOURCE_SEMAPHORE waits are high because queries suck up available memory space and prevent other queries from using it
  • Queries are getting too low of a memory grant and spilling significantly, which can slow them down and cause tempdb contention under high concurrency
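
A quick way to catch all three in the act is sys.dm_exec_query_memory_grants, which shows grants for queries that are currently running, or currently stuck waiting for memory:

SELECT
    mg.session_id,
    mg.requested_memory_kb,
    mg.granted_memory_kb,  --a big gap between this and max used means an oversized grant
    mg.max_used_memory_kb,
    mg.grant_time,         --NULL here means the query is still waiting on RESOURCE_SEMAPHORE
    mg.wait_time_ms
FROM sys.dm_exec_query_memory_grants AS mg
ORDER BY mg.requested_memory_kb DESC;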

Fixing memory grant issues can take many forms:

  • Getting better cardinality estimates for better overall grant estimates
  • Indexing to influence operator choices away from memory consumers
  • Using more appropriate string lengths to reduce memory grants
  • Fixing parallel skew issues that leave some threads with inadequate memory
  • Rewriting the query to not ask for ordered data
  • Rewriting the query to ask for ordered data in smaller chunks
  • Rewriting the query to convert strings to better fitting byte lengths

That’s just some stuff I end up doing off the top of my head. There are probably more, but blog posts are only useful up to a certain length.

Like all other strings.
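
Speaking of string lengths: sorts and hashes size their memory grants off declared lengths (variable-length columns are assumed to be half full), not off the data that’s actually in the column. Here’s a minimal sketch of the effect, using the Stack Overflow Posts table that shows up in demos later on; compare the grants in the actual execution plans:

--same data, very different grants, because the declared sort key length differs
--(the nvarchar(40) version can truncate titles; this is just to show grant sizing)
SELECT p.Title
FROM dbo.Posts AS p
ORDER BY CONVERT(nvarchar(40), p.Title);

SELECT p.Title
FROM dbo.Posts AS p
ORDER BY CONVERT(nvarchar(4000), p.Title);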

Writes and Selects


Modification queries are going to do writes. This seems intuitive and not at all shocking. If you have queries that are doing particularly large modifications, you could certainly look into tuning those, but it would be a standard exercise in query or index tuning.

Except that your index tuning adventure would more likely lead you to dropping unused and overlapping indexes, to reduce the number of objects you need to write to, than to adding an index.

But who knows. Boring anyway. I hear indexes tune themselves in the cloud.

When select queries do a large number of writes, then we’re talking about a much more interesting scenario.

  • Spills
  • Spools
  • Stats updates

Of course, stats updates are likely a pretty small write, but the read portion can certainly halt plan compilation for a good bit on big tables.

Spills and Spools are going to be the real target here. If it’s a spill, you may find yourself tracking back to the memory grant section up above.
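
If you want to round up select queries that do writes, the plan cache can cough those up, too. A minimal sketch, with admittedly blunt LIKE filters to toss out obvious modification queries:

SELECT TOP (20)
    qs.total_logical_writes,
    qs.execution_count,
    query_text = st.text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE qs.total_logical_writes > 0
AND   st.text NOT LIKE N'%INSERT%'
AND   st.text NOT LIKE N'%UPDATE%'
AND   st.text NOT LIKE N'%DELETE%'
AND   st.text NOT LIKE N'%MERGE%'
ORDER BY qs.total_logical_writes DESC;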

Spools, though! What interesting little creatures. I wrote a longer post about them here:

Understand Your Plan: Operators That Write Data (Spools, Spools, Spools)

It has a bit of a link roundup of other posts on my site and others that talk about them, too.

But since we’re living in this now, let’s try to be present. Here’s the short story on spools that we might try to fix:

  • The Spools we typically care about are Table or Index
  • They can be eager or lazy
  • They’ll show up on the inner side of Nested Loops
  • SQL Server uses them as a temporary cache for data
  • They are a good indicator that something is amok with your query or indexes

For eager index spools, the story is pretty simple around creating a better index for SQL Server to use.

For lazy table spools, you have more options:

  • Give SQL Server unique data to work with
  • Get the optimizer to not choose nested loops
  • Use the NO_PERFORMANCE_SPOOL hint to test the query without spools
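
That last one is just a query hint, available from SQL Server 2016 on. The table names here are made up:

SELECT
    t1.SomeColumn
FROM dbo.TableOne AS t1
JOIN dbo.TableTwo AS t2
    ON t2.JoinColumn = t1.JoinColumn
OPTION (NO_PERFORMANCE_SPOOL);

Keep in mind it only removes spools the optimizer added for performance reasons; spools needed for correctness, like Halloween protection, will stick around.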

Of course, there are times where you’re better off with a spool than without. So don’t walk away feeling disheartened if that’s the case.

Executions


These are on the opposite end of the spectrum from most of the queries I go after. If a query runs enough, and fast enough, to truly rack up a high number of executions, there’s probably not a ton of tuning you could do.

Sure, sometimes there’s an index you could add or a better predicate you could write, but I’d consider it more beneficial to get the query to not run so much.

That might result in:

  • Rewriting functions as inline table valued functions
  • Handing the queries off to app developers for caching

To learn how I rewrite functions, check out this video.

I know, you can’t rewrite every single function like this, but it’s a wonderful thing to do when you can.
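
For a taste of the pattern, here’s a hedged sketch with a made-up function, using the Stack Overflow tables from demos later on:

--a scalar UDF runs once per row, and (before SQL Server 2019) forces the whole plan serial
CREATE OR ALTER FUNCTION dbo.TotalScore (@UserId int)
RETURNS int
AS
BEGIN
    RETURN
    (
        SELECT SUM(p.Score)
        FROM dbo.Posts AS p
        WHERE p.OwnerUserId = @UserId
    );
END;
GO

--the inline table valued version gets expanded into the calling query’s plan
CREATE OR ALTER FUNCTION dbo.TotalScoreInline (@UserId int)
RETURNS TABLE
AS
RETURN
    SELECT TotalScore = SUM(p.Score)
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = @UserId;
GO

--called with CROSS APPLY instead of in the select list
SELECT
    u.DisplayName,
    ts.TotalScore
FROM dbo.Users AS u
CROSS APPLY dbo.TotalScoreInline(u.Id) AS ts;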

Anything Other Than


Again, metrics I don’t ever look at are logical reads or costs.

  • Doing reads doesn’t necessarily mean that queries are slow, or that there’s anything you can fix
  • Costs are a meme metric that should be removed from query plans in favor of operator times

Well, okay, maybe not completely removed, but they shouldn’t be front and center anymore.

There are many other more reliable metrics to consider that are also far more interesting.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Getting Parameter Values From A SQL Server Query Plan For Performance Tuning

Property Management


I’m a really big fan of using operator properties for a lot of things, at least visually. Where things sort of fall down for that is copying and pasting things out.

For some stuff, you still need to head down to the XML.

Let’s say you have a stored procedure that accepts a bunch of parameters. The rest of this one isn’t important, but here you go:

CREATE OR ALTER PROCEDURE 
    dbo.AwesomeSearchProcedure 
( 
    @OwnerUserId int = NULL, 
    @CreationDate datetime = NULL, 
    @LastActivityDate datetime = NULL,
    @PostTypeId int = NULL,
    @Score int = NULL,
    @Title nvarchar(250) = NULL, 
    @Body nvarchar(MAX) = NULL 
)
AS
BEGIN
    --the body is omitted here; it isn’t important for this demo
    RETURN;
END;

A Plan Appears


Let’s say we grab a query plan for this thing from the plan cache or query store. We can get the properties of the select operator and see compile time values:

EXEC dbo.AwesomeSearchProcedure 
    @OwnerUserId = 35004,
    @CreationDate = '20130101', 
    @LastActivityDate = '20140101',
    @Title = N'SQL Server';

We get this back:

[Screenshot: select operator properties showing the compile-time parameter values]

Again — nice visually — but it doesn’t do much for us if we want to re-execute the stored procedure to get an actual execution plan.

It’s also not terribly helpful if we want to simulate a parameter sniffing situation, because we only have the compile time values, not the run time values.

Bummer. But whatever.

XML Time!


If we right click and select “show execution plan XML”, we can scroll way down to the bottom to find the XML fragment that holds what the properties display:

<ParameterList>
  <ColumnReference Column="@iTitle" ParameterDataType="nvarchar(250)" ParameterCompiledValue="N'SQL Server'" />
  <ColumnReference Column="@iLastActivityDate" ParameterDataType="datetime" ParameterCompiledValue="'2014-01-01 00:00:00.000'" />
  <ColumnReference Column="@iCreationDate" ParameterDataType="datetime" ParameterCompiledValue="'2013-01-01 00:00:00.000'" />
  <ColumnReference Column="@iOwnerUserId" ParameterDataType="int" ParameterCompiledValue="(35004)" />
</ParameterList>

This still isn’t awesome, because we have to do some surgery on the XML itself to get values out.
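
If you find yourself doing that surgery a lot, you can make XQuery do it instead. A minimal sketch against the plan cache, assuming the standard showplan namespace:

WITH XMLNAMESPACES
    (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT
    parameter_name = pl.cr.value('@Column', 'nvarchar(128)'),
    compiled_value = pl.cr.value('@ParameterCompiledValue', 'nvarchar(4000)')
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY qp.query_plan.nodes('//ParameterList/ColumnReference') AS pl(cr)
WHERE qp.objectid = OBJECT_ID(N'dbo.AwesomeSearchProcedure');

From there it’s a short hop to pasting the values into a DECLARE or an EXEC.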

It’s even worse if we have a parameterized application query, because not only do we need to make a DECLARE to assign values to these variables but we need to turn the query itself into dynamic SQL.

If we don’t do that, we’ll fall victim to a common pitfall: testing queries with local variables.

Passwords


For most things, I absolutely adore using operator properties. For some things, you still need the XML.

It’d be nice if there were some fancy copy and paste magic that would do that for you, but so far it doesn’t exist.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Getting The Top Value Per Group With Multiple Conditions In SQL Server: Row Number vs. Cross Apply With MAX

Throat Music


In yesterday’s post, we compared a simple situation: trying to find the top scoring post for each user.

In today’s post, we’re going to add another condition: we want the highest scoring post for each type of post someone has made.

[Screenshot: the different post types]

Now look, most people don’t get involved with any of these things, but whatever. It just poses an interesting and slightly more complicated problem.

Slightly Different Index


Since we’re going to be using PostTypeId in the window function, we need it in the key of our index:

CREATE INDEX p ON dbo.Posts(OwnerUserId, PostTypeId, Score DESC)
WITH(SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

Now our query looks like this:

SELECT
    u.DisplayName,
    u.Reputation,
    p.PostTypeId,
    p.Score
FROM dbo.Users AS u
JOIN
(
    SELECT
        p.*,
        n = 
            ROW_NUMBER() OVER
            (
                PARTITION BY
                    p.OwnerUserId,
                    p.PostTypeId --This is new!
                ORDER BY
                    p.Score DESC
            )
    FROM dbo.Posts AS p
) AS p
    ON  p.OwnerUserId = u.Id
    AND p.n = 1
WHERE u.Reputation > 50000
ORDER BY 
    u.Reputation DESC,
    p.Score DESC;

S’good? S’good. Let’s go.

Row Number Query Plan


Similar to yesterday’s plan, this one is rather slow, rather serial, and generally not how we want to be spending our precious time.

[Screenshot: the ROW_NUMBER query plan]

Let’s look at the apply method, because we have to change our query a little bit to accomplish the same thing.

Cross Apply With MAX


Rather than go with TOP (1), we’re going to GROUP BY OwnerUserId and PostTypeId, and get the MAX(Score).

SELECT
    u.DisplayName,
    u.Reputation,
    p.PostTypeId,
    p.Score
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT 
        p.OwnerUserId,
        p.PostTypeId,
        Score = MAX(p.Score)
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = u.Id
    GROUP BY 
        p.OwnerUserId, 
        p.PostTypeId
) AS p
WHERE u.Reputation > 50000
ORDER BY 
    u.Reputation DESC,
    p.Score DESC;

This will give us the same results, but a lot faster. Again.

Cross Apply Query Plan


Like I was saying…

[Screenshot: the cross apply query plan]

Down to ~400ms now. Not bad, right?

Lower Selectivity


If we take those same queries and lower the reputation filter from 50,000 to 1, some interesting changes to the query plans happen.

[Screenshot: both query plans with the lower reputation filter]

Repartition streams and I have had some problems in the past. It’s not necessarily “to blame”, it just has a tough time with some data distributions, especially, it seems, when it’s order preserving.

The cross apply with aggregation works really well. It’s kinda neat that both queries get slower by the same amount of time, but the ROW_NUMBER query is still much, much slower.

All of this is interesting and all, but you know what? We haven’t looked at batch mode. Batch mode fixes everything.

Sort of. Don’t quote me on that. It’s just really helpful in the kind of BIG QUERY tuning that I end up doing.

Batch Mode


This is the only thing that makes the ROW_NUMBER query competitive in this scenario, owing to the fact that batch mode often removes Repartition Streams, and we’re eligible for the Window Aggregate operator.
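
Full disclosure on setup: one common way to get batch mode for rowstore queries on SQL Server 2016 and 2017 is a filtered nonclustered columnstore index that can never contain any rows. Take this as a sketch of that trick rather than the exact setup behind these screenshots; on 2019 and up at compatibility level 150, batch mode on rowstore can kick in without it:

CREATE NONCLUSTERED COLUMNSTORE INDEX ncci
ON dbo.Posts (Id)
WHERE Id = -1 AND Id = -2;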

[Screenshot: both query plans with batch mode in play]

I’m dumping these results to #temp tables because I don’t want to wait for SSMS to render the large result set, but you can still see the positive overall effect.

The poorly performing ROW_NUMBER query is now very competitive with the CROSS APPLY query.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Getting The Top Value Per Group In SQL Server: Row Number vs. Cross Apply Performance

Introductions


There are many ways to express queries in SQL. How different rewrites perform will largely be a function of:

  • You not doing anything ridiculous
  • Queries having good indexes in place
  • The optimizer not seeing through your tricks and giving you the same query plan

The first rule of rewrites is that they have to produce the same results, of course. Logical equivalency is tough.

In today and tomorrow’s posts I’m going to compare a couple different scenarios to get the top value.

There are additional ways to rewrite queries like this, of course, but I’m going to show you the most common anti-pattern I see, and the most common solution that tends to work better.

Right And Proper Indexing


For today’s post, we’re going to use this index:

CREATE INDEX p ON dbo.Posts(OwnerUserId, Score DESC) INCLUDE(PostTypeId)
WITH(SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

Which is going to give this query proper support. Sure, we could also add an index to the Users table, but the one scan is trivially fast, and probably not worth it here.

SELECT
    u.DisplayName,
    u.Reputation,
    p.PostTypeId,
    p.Score
FROM dbo.Users AS u
JOIN
(
    SELECT
        p.*,
        n = 
            ROW_NUMBER() OVER
            (
                PARTITION BY
                    p.OwnerUserId
                ORDER BY
                    p.Score DESC
            )
    FROM dbo.Posts AS p
) AS p
    ON  p.OwnerUserId = u.Id
    AND p.n = 1
WHERE u.Reputation > 50000
ORDER BY 
    u.Reputation DESC,
    p.Score DESC;

The general idea is to find all users with a reputation over 50,000, along with their highest scoring post.

I know, you’re looking at this and thinking “jeez Erik, why are you selecting * here? don’t you know how bad and dumb you are for that?”

Well, SQL Server is smart enough to ignore that and only deal with the columns in the outer select.

The Query Plan


If you create the index and run this, the query plan looks something like this:

[Screenshot: the ROW_NUMBER query plan]

The majority of the time spent in this plan is the ~11 seconds between the scan of the Posts table and the Filter operator.

The filter is there to remove rows where the result of the ROW_NUMBER function is greater than 1.

I guess you could say less than 1, too, but ROW_NUMBER won’t produce rows with 0 or negative numbers naturally. You have to make that happen by subtracting.

A Better Query Pattern?


Since only ~2400 rows are leaving the Users table, and we have a good index on the Posts table, we want to take advantage of it.

Rather than scan the entire Posts table, generate the ROW_NUMBER, apply the filter, then do the join, we can use CROSS APPLY to push things down to where we touch the Posts table.

SELECT
    u.DisplayName,
    u.Reputation,
    p.PostTypeId,
    p.Score
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT TOP (1)
        p.*
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = u.Id
    ORDER BY p.Score DESC
) AS p
WHERE u.Reputation > 50000
ORDER BY 
    u.Reputation DESC,
    p.Score DESC;

This is logically equivalent, but it blows the other query out of the water, performance-wise.

A Better Query Plan?


Here’s the query plan for the cross apply query:

[Screenshot: the cross apply query plan]

Why Is This Better?


In this case, with a good index to use and a small outer result from the Users table, the cross apply query is way better.

This is also due to the Id column of Users being the Primary Key of the table. For this sort of one to many join, it works beautifully. If it were a many to many scenario, it could be a toss up, or ROW_NUMBER could blow it out of the water.

The way this type of Nested Loops Join (Apply Nested Loops) works is to take each row from the outer input (the Users table) and seek to it in the Posts table.

Without that good index up here, this would likely be a disaster, with an Eager Index Spool in the plan. We definitely don’t want that here.

But you know, there are many different types of posts. We might want to know someone’s high score for questions, answers, and more.

In tomorrow’s post, we’ll look at how to do that, and performance tune the query.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Dealing With Bad Parameter Sniffing

Bad Wrap


When people hear the words “parameter sniffing”, there’s almost a universally bad reaction. They lose their minds and start raving about how to “fix” it.

  • Recompiling everything
  • Using optimize for unknown hints
  • Using local variables
  • Clearing the plan cache

In today’s post, I’m gonna show you two videos from my paid training:

  • Intro to parameter sniffing
  • Parameter sniffing recap

Those explain why parameter sniffing is so tough to deal with, and why all the stuff up there in that list isn’t really the greatest idea.

There’s a whole bunch of stuff in between those two videos where I’ll teach you specifics about fixing parameter sniffing problems.

If that’s the content you’re after, hit the link at the very end of the post for 75% off my entire training catalog.

Intro To Parameter Sniffing


Parameter Sniffing Recap


Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Writing And Optimizing Paging Queries

Paging Doctor Darling


Words I’ll never hear, at least in reference to me. I’m sure there’s a Darling out there who is a doctor. A real doctor, too. Not one of those “I went to college for 7 extra years” doctors who don’t know anything about the human body.

But I digress! Today we’re gonna learn about paging queries. Why? Because I see people screw them up all the gosh darn ding-a-ling time.

Far and away, I think the worst problem starts at the beginning — the SELECT list — where people try to get every single possible column. There’s a much better query pattern available to you, if you’re open to writing a little bit more code.

Unfortunately, I haven’t seen any ORMs able to handle this method natively. Hard to believe developers developed the same performance-killing method in C# that they did in T-SQL.

Okay, maybe not so hard to believe. But follow along to see a much better way of doing it in the video below. It’s just a small part of my paid training, and if you like what you see there’s a link at the end of the post for 75% off the whole package.
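
To give away the gist before the video: the better pattern I’m referring to pages through just a narrow key column first, then joins back to pick up the wide columns only for the rows on the page being returned. A hedged sketch, using the Stack Overflow Posts table:

WITH keys AS
(
    SELECT
        p.Id
    FROM dbo.Posts AS p
    ORDER BY
        p.Score DESC,
        p.Id
    OFFSET 0 ROWS FETCH NEXT 100 ROWS ONLY
)
SELECT
    p.*
FROM keys AS k
JOIN dbo.Posts AS p
    ON p.Id = k.Id
ORDER BY
    p.Score DESC,
    p.Id;

With a narrow index on Score and Id, the paging part stays cheap, and you only touch the wide rows you actually return.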

Optimizing Paging Queries Video


Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Handling Optional Parameters

Optionally Yours


You’ve got tables. So many tables. And columns. So many columns.

Users — demanding as they are — might wanna see, search, and order by all sorts of things in those many columns in those many tables. Writing a query to do that is hard at first, especially if you’re afraid of dynamic SQL.

Over the years, developers have come up with all sorts of “workarounds” for handling optional search parameters. The problem is that none of them work, whether it’s:

  • column = @parameter or @parameter is null
  • column = isnull(@parameter, column)
  • column = coalesce(@parameter, column, ‘magic’)

Or any variation thereof. Doing this can (and will!) screw up query performance in all sorts of ways that sneak up on you.

  • Bad cardinality estimates
  • Scanning instead of Seeking in indexes
  • Using the “wrong” indexes

In the video below, I’ll show you how to use dynamic SQL the right way to handle optional parameter search scenarios. This video is a small part of my paid training. If you like what you see, there’s a link for 75% off the entire package at the bottom of the post.
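
If you want the shape of it before the video, here’s a minimal sketch: build only the predicates that were actually passed in, and keep everything parameterized through sp_executesql. The procedure and column names are made up:

CREATE OR ALTER PROCEDURE dbo.OptionalSearch
(
    @OwnerUserId int = NULL,
    @Score int = NULL
)
AS
BEGIN
    DECLARE @sql nvarchar(MAX) = N'
    SELECT
        p.Id,
        p.Score,
        p.Title
    FROM dbo.Posts AS p
    WHERE 1 = 1';

    IF @OwnerUserId IS NOT NULL
        SET @sql += N'
    AND   p.OwnerUserId = @OwnerUserId';

    IF @Score IS NOT NULL
        SET @sql += N'
    AND   p.Score >= @Score';

    EXEC sys.sp_executesql
        @sql,
        N'@OwnerUserId int, @Score int',
        @OwnerUserId = @OwnerUserId,
        @Score = @Score;
END;

Each combination of passed-in parameters gets its own cached, parameterized plan, which is the whole point.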

Optional Parameters Video


Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: IF Branching In Stored Procedures

What Goes Wrong


I still see people falling into this trap when writing stored procedures. What no one seems to realize until it’s too late, when data grows past toy database size, is that SQL Server’s query optimizer doesn’t respect IF branches as boundaries in the way you’d expect.

Sure, the control-flow logic boundaries are respected, but the big surprise to most people comes from how the optimizer treats query plans in IF branches: It compiles them all no matter what.

In the video below, I’ll show you exactly what that means, and how it can really screw up performance. This video is a small part of my paid training offering, and if you enjoy it, there’s a link at the end of the post to get 75% off.
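
As a hedged sketch of the shape of the problem, and one common fix, with all names made up (the two sub-procedures would just contain the branch queries):

CREATE OR ALTER PROCEDURE dbo.BranchingSearch (@OwnerUserId int = NULL)
AS
BEGIN
    --the trap: both of these queries get compiled, not just the branch that runs
    IF @OwnerUserId IS NULL
        SELECT TOP (100) p.* FROM dbo.Posts AS p ORDER BY p.Score DESC;
    ELSE
        SELECT p.* FROM dbo.Posts AS p WHERE p.OwnerUserId = @OwnerUserId;
END;
GO

CREATE OR ALTER PROCEDURE dbo.BranchingSearchFixed (@OwnerUserId int = NULL)
AS
BEGIN
    --one common fix: each branch calls its own procedure, so each
    --query compiles with the parameters that actually reach it
    IF @OwnerUserId IS NULL
        EXEC dbo.SearchAllPosts;
    ELSE
        EXEC dbo.SearchPostsByOwner @OwnerUserId = @OwnerUserId;
END;
GO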

IF Branching Video!


Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Using Left Joins To Find Rows That Don’t Exist

Natural Language


In SQL Server’s T-SQL, and in the more general ANSI-Standard SQL, you’re supposed to write queries in a way that mimics how you’d ask the same question — just don’t call it a query — in English. Because of that, there are some ways to phrase a query that are more natural than others.

Some are more intuitive once you get them down, and others can bewitch you for decades. For example, I’m still not always awesome at reasoning out INTERSECT and EXCEPT queries, or even thinking of them first when writing a query where they’d be useful.

Maybe someday.

Dirty Pig Latin


Bad advice exists in many places in the world. I don’t mean to single out the internet, though it certainly has made the publishing and proliferation of bad advice much more accessible.

I do a lot of reading about databases in general, and SQL Server specifically, to see what other folks are out there writing about and teaching others. One of my favorite arrondissements of that world is the stuff that attracts beginners.

After all, that’s when you develop habits, good or bad. Much of this series focuses on the bad habits learned in that time, and how they muck up performance.

One of those bad habits I see over and over again is using LEFT JOINs to find rows that don’t exist. The reason I focus on these is because of the effect they have on query plans, due to the way that queries have to be logically processed.

Let’s look at that, first.

Who’s Not On First?


A quite common looking query for doing this (if you want to do it wrong) would look something like this:

SELECT
   c = COUNT_BIG(*)
FROM dbo.Users AS U
LEFT JOIN dbo.Comments AS C
    ON  C.UserId = U.Id
    AND C.Score > 0
WHERE C.Id IS NULL;

The important part of the query plan is right around here:

[Screenshot: the query plan, with the Filter after the left outer join]

If you’re looking extra closely, without any leading or prompting by me whatsoever, you’ll notice that the join operation first brings the two tables we’re querying — Users and Comments — together, expressed as a left outer join of course, and then, and only then, do we filter out rows where the Id column in Comments is NULL.

The problem is that all this is after the join, and in some scenarios this is far less efficient. Both from the perspective that you have to join many more matching rows together, and from the perspective that the optimizer can sometimes have a weird time ordering outer joins, especially when there are a lot of them.

Note that, for various reasons, this query runs for around 4 seconds total.

Who Doesn’t Exist On First?


An often better way of expressing this sort of query is using the — wait for it — natural expression of the Structured Query Language.

SELECT
   c = COUNT_BIG(*)
FROM dbo.Users AS U
WHERE NOT EXISTS
      (
          SELECT
              1/0
          FROM dbo.Comments AS C
          WHERE  C.UserId = U.Id
          AND    C.Score > 0
      );

This query is logically equivalent to the last one. It may even be arguably better looking. More elegant, as the developers say when they want to describe well-formatted code that no one really understands.

The query plan looks like this now:

[Screenshot: the query plan, now with an Anti Semi Join]

Now we get this crazy little thing called an Anti Semi Join. That means rows are filtered out at the join rather than at some point later on in an explicit Filter operator.

To highlight things a little further, look at the actual number of rows that pass through the filter in the original query and the join in the second query:

[Screenshot: actual row counts at the Filter and at the join]

See there? The filter in the LEFT JOIN query reduces the working row set to the same number of rows as the NOT EXISTS query does at the join.

In most cases, you’re better off writing queries this way. It may depend a bit on available indexes, batch mode, and server settings. But writing a blog post that takes every single one of those things into account would likely leave you bored out of your gourd.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Joining On OR Conditions

Choose One Once


Everyone has that person in their life. I’m not gonna call them a “friend”, but they might be. Whoever they are, you know them by their indecisiveness. Because when you interact with them, you know you’re gonna have a whole bunch of decisions to make before you come up with a plan.

Those decisions are usually proposed to you in a series of “or we could” statements that you barely have time to assess before the next one hits. It makes life difficult.

Don’t be that person to SQL Server’s optimizer. I mean, don’t be that person in general, but especially don’t be that person to an expensive piece of software that makes business critical transactions and decisions possible for you.

Be Kind, Rewind


Remember a couple things that have been mentioned at various points in this series:

  • Anything that makes your job easier makes the optimizer’s job harder
  • Query data the way it’s stored, and store data the way you query it

What’s a real shame is that most of the things that fall into the first category involve you typing fewer characters once. Isn’t that a funny thing? Normally you type a bunch of code once, and after that it’s a matter of executing it.

Sometimes typing more once really pays off.

Let’s look at an example of that!

Orrible


In this query, we’re just going to use one OR predicate in a join, and as the saying goes, “this is Jackass”.

SELECT
   c = COUNT_BIG(*)
FROM dbo.Users AS U
JOIN dbo.Comments AS C
    ON  C.UserId = U.Id
    OR  C.UserId = U.AccountId;

Since I’d like to keep your attention here, I’m just gonna show you the end result query plan, and how long it takes.

You’re not misreading that it takes around 15 seconds.

[Screenshot: the OR join query plan]

Most of that time is spent in ridiculous Nested Loops Joins. And the really nice part is that you can’t hint hash or merge joins; you get a query processor error.

Union Dues


Let’s compare that to a query where we keep things simple(r). Sure, we type more, and part of our brain had to wake up and do some thinking.

But think of the optimizer. If we save just one optimizer, people, it’s worth it.

SELECT
    c = SUM(x.c)
FROM 
(
    SELECT
       c = 1
    FROM dbo.Users AS U
    JOIN dbo.Comments AS C
        ON  C.UserId = U.Id

    UNION ALL 

    SELECT
       c = 1
    FROM dbo.Users AS U
    JOIN dbo.Comments AS C
        ON  C.UserId =  U.AccountId
        AND C.UserId <> U.Id
) AS x;

To shortcut to the important part of the plan, this version runs in a little under two seconds.

[Screenshot: the UNION ALL query plan]

More typing. Fewer problems. Good and good. You might be okay with accepting this rewrite and explanation, and be off to test how a similar rewrite might improve your queries. If so, great! Get to it, Dear Reader.

If you want to dig further into why, follow me along to the next section.

Why OR Conditions Turn Out Poorly


Here’s the “full” query plan for the OR condition join query. There are a couple operators at the far left side that I’ve omitted because they don’t help with the explanation.

[Screenshot: the full query plan for the OR condition join query]
  1. We scan the Users table. There’s nothing we can seek to, so that’s fine, but note the number of rows (2,465,713) that come out
  2. All rows from the Users table are emitted via a constant scan operator for the Id column
  3. All rows from the Users table are emitted via a constant scan operator for the AccountId column

I don’t think the optimizer makes better guesses for constant scans, so even though all rows in the table are escorted forth, none of the estimates reflect anything close to that.

The full set of these rows combined (4,931,426) get sorted together rather than as two separate sets, and then a meager attempt to reduce overlapping values is applied at the merge interval. That set is only reduced to 4,931,419; a whopping 7 rows removed.

All of those rows are fed into an Apply Nested Loops join that hits the Comments table and searches the UserId column for values that fall between lower and upper bounds.

[Screenshot: the Apply Nested Loops join into the Comments table]

Even with a good index to seek to these values in, the sheer number of executions gets us stuck pretty badly. In all, ~14 of the ~15 seconds of execution time is spent in this exercise in futility. This query pattern becomes even worse with less useful indexes on the join columns.

At the moment, SQL Server’s query optimizer isn’t able to unwind OR predicates like this to come up with a better execution plan.

A Stronger Union


In the UNION ALL query, you’ll see far less futility, and a more reasonable set of operators presented for the number of rows being processed.

[Screenshot: the UNION ALL rewrite’s query plan]

There are of course circumstances where this might be less efficient, like if one or more tables isn’t able to fit in the buffer pool and disk I/O sucks (you’re in the cloud, for example). But in general, splitting queries into factions that express precise logic is a good thing.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.