Memory Grants Archives

SQL Server’s Partial Aggregate Query Plan Operators CAN Quit You

Posted on December 20, 2019 (updated May 14, 2022) by Erik Darling

This Machine Makes Seltzer

Let’s say you have a parallel query running at DOP 4. The final logic of the query is some aggregate: COUNT, SUM, MIN, MAX, whatever.

Sure, the optimizer could gather all the streams, and then calculate one of those for all four of them, but why do that?

We have a Partial Aggregate operator that lets each thread aggregate its own rows locally, so the final global aggregate can be calculated more quickly from the four locally aggregated values.
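
To make the local/global split concrete, here is a rough sketch in plain T-SQL. It only imitates the idea: the `Id % 4` bucket is a made-up stand-in for the four threads, and the real operator distributes rows however the parallel plan dictates. It assumes the Stack Overflow `dbo.Users` table used elsewhere on this page.

```sql
/* Step 1: each pretend "thread" (Id % 4) counts its own rows per Reputation. */
WITH local_aggregate AS
(
    SELECT   u.Id % 4 AS pretend_thread,
             u.Reputation,
             COUNT_BIG(*) AS local_count
    FROM     dbo.Users AS u
    GROUP BY u.Id % 4,
             u.Reputation
)
/* Step 2: the global aggregate only has to add up the local counts. */
SELECT   la.Reputation,
         SUM(la.local_count) AS global_count
FROM     local_aggregate AS la
GROUP BY la.Reputation;
```

The final answer is the same no matter how thoroughly the first step collapses rows, which is why a local aggregate that does less work than it could doesn't change the results.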

There are a couple odd things about Partial Aggregates though:

They ask for a fixed amount of memory, which is usually quite small
When they run out of memory, they don’t spill, they just stop aggregating

Which is why for identical executions of identical queries, you may see different numbers of rows come out of them.

SQL Server Query Plan — Everyone I know is sick to death of you.

We start with the same number of rows coming out of the Hash Join, which is expected.

We ran the same query.

However, the Partial Aggregate emits different numbers of rows.

It doesn’t matter much, because the global aggregate later in the plan will still be able to figure things out, albeit slightly less efficiently.

If we look at the spills in the Hash Match Aggregates from both of the above plans, the warnings are slightly different.

Hardly anything to worry about here, of course. But definitely something to be aware of.

No, SQL Server isn’t leaking memory, or full of bugs. It’s just sensitive.

Thanks for reading!

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

In SQL Server 2019 Memory Grants Are Higher With Batch Mode For Row Store Queries

Posted on December 12, 2019 (updated January 20, 2026) by Erik Darling

Because I Got

Video Summary

In this video, I delve into the intricacies of SQL Server memory grants and compatibility levels, particularly focusing on how they behave differently between SQL Server 2017 and 2019. I demonstrate a practical example where I compare the memory requests for identical queries run at different compatibility levels to highlight the significant increase in memory grant sizes with SQL Server 2019. This comparison underscores the challenges faced by users of Standard Edition, especially when dealing with limited RAM resources or virtual machine constraints. By walking through these examples, I aim to provide insights and potential workarounds for those navigating the complexities of SQL Server memory management on a budget.
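
A minimal sketch of that kind of comparison, for anyone following along without the video. The query is borrowed from the parallelism post further down this page and may not be the exact one used in the demo. Run it with actual execution plans turned on and read the memory grant off the plan each time.

```sql
/* SQL Server 2017 behavior */
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 140;
GO

SELECT TOP (1000)
       u.Id
FROM dbo.Users AS u
ORDER BY u.Reputation;
GO

/* SQL Server 2019 behavior */
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 150;
GO

SELECT TOP (1000)
       u.Id
FROM dbo.Users AS u
ORDER BY u.Reputation;
GO
```

Changing the compatibility level affects the whole database, so this belongs on a test copy and not on anything people depend on.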

Full Transcript

Hello everyone, Erik Darling here with Erik Darling Data. I was so happy this week I got brand new stickers in, which can either be apparently a 45 pound weight plate or an LP, if you’re into that sort of thing. Whichever one you’re into more you can pretend that sticker is. Now, in case you can’t tell, I’m a bit under the weather. I’m not well, I’m sick. I got sick earlier this week. When I found out that Standard Edition of 2019 is still only allowed to have 128 gigs of RAM for the buffer pool. That’s how strongly I feel about what a bad decision that is. What a terrible decision that is. It made me physically ill. And, you know, I was starting to feel a little better. I was on the mend. I was taking my vitamins and drinking my OJ. And then, I started testing some of my old demos on 2019. I just got sick all over again. Now, the thing is that despite the fact that, you know, SQL Server 2019 only has 128 gigs of RAM for the buffer pool, you can use memory above that for queries, but those queries are now, they’re asking for more memory.

And this is going to be a real problem if you can’t have more than 128 gigs of RAM in your server for some reason. Or, like, you know, you’re on a VM where the host is small, or, you know, your boss is a cheapskate and won’t buy you memory. You know, just like lots of reasons, or like you’re in the cloud where there’s just not an instance size that has like a sane amount of processors and memory for SQL Server Standard Edition. Because there’s a whole lot of that going around. So what I want to do is I want to show you exactly what I mean. I have two query windows set up side by side. One of them is compat level 140, which will give us the 2017 behavior.

And the other one is compat level 150, which will give us the SQL Server 2019 behavior. And I’ve got two, I’ve got the same query on both sides. And what this query does is it forces SQL Server to sort data. I don’t have an index on reputation that will help me sort the data that I, in a way that’s meaningful to me. So SQL Server is going to have to break out its tiny little baby hands and sort that data for us.

So here’s what happens. Compat level 140. Run this. It executes. We get the execution plan. We see we have a sort here. And we see that this sort asks for 166.528 megs of memory.

All right. Cool. What about that same query and compat level 150? What now? What do you have to say for yourself now, SQL Server, especially when I turn on execution plans?

For some reason, I thought that would be global. I don’t know why I’m that stupid. Let’s run that. I can look again. 186 megs of memory. So about 10% more memory there, I think. At least I think it is. I’m not very good at math.

So that could be right. It could be wrong. So let’s call this 186. And that’s no fun. Now let’s take out a couple contenders here. Let’s bring display name into the mix.

And let’s see what happens now. At compat level 140, that’s 298 megs of memory. All right. We’re running that. Okay. We’ll go run it over here. What’s the big reveal? At compat level 140, that is still 298.968. Okay.

And in SQL Server 2019 mode, compatibility level 150, 334. Eee. Sweet. Sweet. Summer rain. All right. So that was 334 there. All right. 334. All right. Let’s look.

Let’s add the website URL column in. Let’s see what happens when we do this. All right. We’ll come over here. We’ll run this one. Wait for this to finish. All right. There we go. And we’ll come over here and wait for that to finish. While we do that, we’ll see that this is still at 906.968.

So about 900 megs there. I’m still… Okay. Well, we’ll be fair. We’ll say that’s 906. That is what it was. We come over here and look. What do we have? Uh-oh. We are up over a gig. We are at just about… Okay. So like… I don’t know.

I don’t want to get into like too many decimal places. So I’m going to call this one gig even. All right. We’ll change this to reflect that. So we’re up at one… Well, one gig. Now we have to change it. 1.0 gigabytes. As some of my friends from Eastern Europe say.

Gigabytes. All right. Location. Let’s add this in because now I’m curious. How much is this going to go up? This is… This should be bafflingly fun. All right. That one’s done. We’ll come over here. Run this. All right.

What do we got? 1.2 gigs. Just as we expected. All right. Good stuff there. Happy, happy, happy. What do you ask for? SQL Server 2019 mode. 1.3 and a half gigs. So this went up a bit too.

So we’ll say that went from 1.2 to 1.3 gigs. Now this one asks for 9.7 gigs. 9.7. That’s nearly 10 gigs of memory. To do this. So let’s run this one. And this will run a little bit.

And then this will run this one. We’ll go look over here. We’ll see this. Memory grant. Oh, that one actually went down a little bit. I don’t know. I wonder why that was. Oh. Maybe because I have this VM running. Yeah.

That’s probably it. When I have other things taking up memory, SQL Server sometimes deflates its memory grants. But we go over here. And apparently this is the max memory grant that we can give out right now because I have other things using memory. So that’s good to know that SQL Server is kind to Camtasia and lets it use memory without trying to take it away.

But anyway. So these were both the same at just around 9 gigs. Now, the one thing that I want to point out is that if we run this query a couple times. We’re already in Compat level 150.

But let’s just make double, triple extra sure. If I run this query once and, ooh, CPU fans kicked in a little bit there. Run this query once. We asked for the 9 gigs of memory. All right. And we have this warning down here. Excessive grant, yada, yada, yada.

And if I run this again in 2019 compatibility level, I will get more sane memory grants. SQL Server will reduce the memory grant because it has this thing called memory grant feedback. And that’s really cool and nice.

But that’s not in Standard Edition either. Yeah. Yeah. So if you need help with Standard Edition, please call me. Because apparently Microsoft is not helping you with Standard Edition. So good job there.

Anyway, my name’s Erik. This thing, whatever. It might be a flying saucer. Who knows? Maybe when I take over the world, this will be the new currency. This will be how people pay each other with Darling Data stickers. That’d be nice.

I’ve always wanted to be on currency. Anyway. Thanks for watching. I’ll see you in the next video. Goodbye. Bye. Bye. Bye. Thank you.
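
If you want to look for the same granted-versus-used gap outside of an execution plan, the plan cache keeps grant numbers per query. This is a sketch, not something shown in the video: it assumes a build where `sys.dm_exec_query_stats` exposes the grant columns (SQL Server 2016 and up have them), and the TOP (20) cutoff is arbitrary.

```sql
SELECT TOP (20)
       qs.execution_count,
       qs.last_grant_kb,      /* what the last execution was granted */
       qs.last_used_grant_kb, /* what the last execution actually used */
       qs.max_grant_kb,
       qs.max_used_grant_kb,
       st.text AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.max_grant_kb DESC;
```

A big gap between the granted and used columns is the same thing the excessive grant warning is complaining about.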

All The Performance Problems With Select * Queries In SQL Server

Posted on December 10, 2019 (updated May 16, 2022) by Erik Darling

This was originally posted by me as an answer here. I’m re-posting it locally for posterity.

The two reasons that I find the most compelling not to use SELECT * in SQL Server are

Memory Grants
Index usage

Memory Grants

When queries need to Sort, Hash, or go Parallel, they ask for memory for those operations. The size of the memory grant is based on the size of the data, both row and column wise.

String data especially has an impact on this, since the optimizer guesses half of the defined length as the ‘fullness’ of the column. So for a VARCHAR 100, it’s 50 bytes * the number of rows.
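
A quick way to see what that guess looks like for a real table is to halve the declared byte length from the catalog. This is only a sketch of the rule of thumb above, not the optimizer's actual math. MAX columns report -1 in `sys.columns` and are left as NULL here rather than guessed at.

```sql
SELECT   c.name AS column_name,
         t.name AS type_name,
         c.max_length AS declared_bytes, /* bytes, not characters */
         CASE WHEN c.max_length = -1
              THEN NULL
              ELSE c.max_length / 2
         END AS assumed_bytes_per_row
FROM     sys.columns AS c
JOIN     sys.types AS t
    ON t.user_type_id = c.user_type_id
WHERE    c.object_id = OBJECT_ID(N'dbo.Users')
AND      t.name IN (N'varchar', N'nvarchar')
ORDER BY c.column_id;
```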

Using Stack Overflow as an example, if I run these queries against the Users table:

SELECT TOP 1000 
       u.DisplayName 
FROM dbo.Users AS u 
ORDER BY u.Reputation;


SELECT   TOP 1000
         u.DisplayName,
         u.Location
FROM     dbo.Users AS u
ORDER BY u.Reputation;

DisplayName is NVARCHAR 40, and Location is NVARCHAR 100.

Without an index on Reputation, SQL Server needs to sort the data on its own.

But the memory grant nearly doubles.

DisplayName:

DisplayName, Location:

This gets much worse with SELECT *, asking for 8.2 GB of memory:

It does this to cope with the larger amount of data it needs to pass through the Sort operator, including the AboutMe column, which has a MAX length.

Index Usage

If I have this index on the Users table:

CREATE NONCLUSTERED INDEX ix_Users ON dbo.Users ( CreationDate ASC, Reputation ASC, Id ASC );

And I have this query, with a WHERE clause that matches the index, but doesn’t cover/include all the columns the query is selecting…

SELECT   u.*,
         p.Id AS PostId
FROM     dbo.Users AS u
JOIN     dbo.Posts AS p
    ON p.OwnerUserId = u.Id
WHERE    u.CreationDate > '20171001'
AND      u.Reputation > 100
AND      p.PostTypeId = 1
ORDER BY u.Id;

The optimizer may choose not to use the narrow index with a key lookup, in favor of just scanning the clustered index.

You would either have to create a very wide index, or experiment with rewrites to get the narrow index chosen, even though using the narrow index results in a much faster query.

CX:

SQL Server Execution Times: CPU time = 6374 ms, elapsed time = 4165 ms.

NC:

SQL Server Execution Times: CPU time = 1623 ms, elapsed time = 875 ms.
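
The post doesn't show how the narrow index was chosen for the NC timing. One way to try it yourself, and this is an assumption rather than necessarily what produced the numbers above, is a table hint naming the `ix_Users` index:

```sql
SELECT   u.*,
         p.Id AS PostId
FROM     dbo.Users AS u WITH (INDEX (ix_Users)) /* force the narrow index */
JOIN     dbo.Posts AS p
    ON p.OwnerUserId = u.Id
WHERE    u.CreationDate > '20171001'
AND      u.Reputation > 100
AND      p.PostTypeId = 1
ORDER BY u.Id;
```

Index hints are handy for comparing plans side by side, but they're a testing tool more than a fix.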

Thanks for reading!

In SQL Server, Does Query Parallelism Change Query Memory Grants?

Posted on December 9, 2019 (updated May 16, 2022) by Erik Darling

This was originally posted as an answer by me here. I’m re-posting it locally for posterity.

Sup?

For SQL Server queries that require additional memory, grants are derived for serial plans. If a parallel plan is explored and chosen, memory will be divided evenly among threads.

Memory grant estimates are based on:

Number of rows (cardinality)
Size of rows (data size)
Number of concurrent memory consuming operators

If a parallel plan is chosen, there is some memory overhead to process parallel exchanges (distribute, redistribute, and gather streams). However, their memory needs are still not calculated the same way.
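
To watch these numbers while a query runs, a sketch along these lines against `sys.dm_exec_query_memory_grants` shows requested, granted, and used memory next to the DOP. It only returns rows for queries that currently hold or are waiting on a grant, so run it from a second session while the query under test is executing.

```sql
SELECT   mg.session_id,
         mg.dop,
         mg.requested_memory_kb,
         mg.granted_memory_kb,
         mg.required_memory_kb,
         mg.ideal_memory_kb,
         mg.used_memory_kb,
         mg.max_used_memory_kb
FROM     sys.dm_exec_query_memory_grants AS mg
ORDER BY mg.requested_memory_kb DESC;
```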

Memory Consuming Operators

The most common operators that ask for memory are

Sorts
Hashes (joins, aggregates)
Optimized Nested Loops

Less common operators that require memory are inserts to column store indexes. These also differ in that memory grants are currently multiplied by DOP for them.

Memory needs for Sorts are typically much higher than for hashes. Sorts will ask for at least the estimated size of the data for a memory grant, since they need to sort all result columns by the ordering element(s). Hashes need memory to build a hash table, which does not include all selected columns.

Examples

If I run this query, intentionally hinted to DOP 1, it will ask for 166 MB of memory.

SELECT *
FROM 
     (  
        SELECT TOP (1000) 
               u.Id 
        FROM dbo.Users AS u
        ORDER BY u.Reputation
     ) AS u
OPTION(MAXDOP 1);

If I run this query (again, DOP 1), the plan will change, and the memory grant will go up slightly.

SELECT *
FROM (  
        SELECT TOP (1000) 
               u.Id
        FROM dbo.Users AS u
        ORDER BY u.Reputation
     ) AS u
JOIN (
        SELECT TOP (1000) 
               u.Id
        FROM dbo.Users AS u
        ORDER BY u.Reputation
     ) AS u2
ON u.Id = u2.Id
OPTION(MAXDOP 1);

There are two Sorts, and now a Hash Join. The memory grant bumps up a little bit to accommodate the hash build, but it does not double because the Sort operators cannot run concurrently.

If I change the query to force a nested loops join, the grant will double to deal with the concurrent Sorts.

SELECT *
FROM (  
        SELECT TOP (1000) 
               u.Id
        FROM dbo.Users AS u
        ORDER BY u.Reputation
     ) AS u
INNER LOOP JOIN ( --Force the loop join
        SELECT TOP (1000) 
               u.Id
        FROM dbo.Users AS u
        ORDER BY u.Reputation
     ) AS u2
ON u.Id = u2.Id
OPTION(MAXDOP 1);

The memory grant doubles because Nested Loop is not a blocking operator, and Hash Join is.

Size Of Data Matters

This query selects string data of different combinations. Depending on which columns I select, the size of the memory grant will go up.

The way size of data is calculated for variable string data is rows * 50% of the column’s declared length. This is true for VARCHAR and NVARCHAR, though NVARCHAR columns are doubled since they store double-byte characters. This does change in some cases with the new CE, but details aren’t documented.

Size of data also matters for hash operations, but not to the same degree that it does for Sorts.

SELECT *
FROM 
     (  
        SELECT TOP (1000) 
                 u.Id          -- 166MB (INT)
               , u.DisplayName -- 300MB (NVARCHAR 40)
               , u.WebsiteUrl  -- 900MB (NVARCHAR 200)
               , u.Location    -- 1.2GB (NVARCHAR 100)
               , u.AboutMe     -- 9GB   (NVARCHAR MAX)
        FROM dbo.Users AS u
        ORDER BY u.Reputation
     ) AS u
OPTION(MAXDOP 1);
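
The string columns in that query line up with the 50% rule roughly like this. It's back-of-the-envelope arithmetic using the declared lengths from the comments above, and it ignores row overhead and whatever else goes into the real grant.

```sql
SELECT   v.column_name,
         v.declared_chars,
         v.declared_chars * 2 AS declared_bytes,              /* NVARCHAR: 2 bytes per character */
         (v.declared_chars * 2) / 2 AS assumed_bytes_per_row  /* guessed to be half full */
FROM
(
    VALUES (N'DisplayName', 40),
           (N'WebsiteUrl', 200),
           (N'Location', 100)
) AS v (column_name, declared_chars)
ORDER BY v.declared_chars;
```

That works out to 40, 100, and 200 assumed bytes per row for DisplayName, Location, and WebsiteUrl, which is in line with WebsiteUrl adding the biggest jump of the three.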

But What About Parallelism?

If I run this query at different DOPs, the memory grant is not multiplied by DOP.

SELECT *
FROM (  
        SELECT TOP (1000) 
               u.Id
        FROM dbo.Users AS u
        ORDER BY u.Reputation
     ) AS u
INNER HASH JOIN (
        SELECT TOP (1000) 
               u.Id
        FROM dbo.Users AS u
        ORDER BY u.Reputation
     ) AS u2
ON u.Id = u2.Id
ORDER BY u.Id, u2.Id -- Add an ORDER BY
OPTION(MAXDOP ?);

There are slight increases to deal with more parallel buffers per exchange operator, and perhaps there are internal reasons that the Sort and Hash builds require extra memory to deal with higher DOP, but it’s clearly not a multiplying factor.

Thanks for reading!

How Bad Cardinality Estimates Lead To Bad Query Plan Choices

Posted on May 21, 2019 (updated May 16, 2022) by Erik Darling

Let’s Run A Cruddy Query

We’ve got no supporting indexes right now. That’s fine.

The optimizer is used to not having helpful indexes. It can figure things out.

    SELECT   p.*
    FROM     dbo.Posts AS p
    JOIN     dbo.Votes AS v
        ON p.Id = v.PostId
    WHERE    p.PostTypeId = 2
    AND      p.CreationDate >= '20131225'
    ORDER BY p.Id;

So uh. We got a merge join here. For some reason. And a query that runs for 27 seconds.

The optimizer was all “no, don’t worry, we’re good to sort 52 million rows. We got this.”

[You don’t got this — ED]

Choices, Choices

Since we have an order by on the Id column of the Posts table, and that column is the Primary Key and Clustered index, it’s already in order.

The optimizer chose to order the PostId column from the Votes table, and preserve the index order of the Id column.

Merge Joins expect ordered input on both sides, don’tcha know?

It could have chosen a Hash Join, but then the order of the Id column from the Posts table wouldn’t have been preserved on the other side.

Merge Joins are order preserving, Hash Joins aren’t. If we use a Hash Join, we’re looking at ordering the results of the join after it’s done.

But why?

Going into the Merge Join, we have a Good Guess™

Coming out of the Merge Join, we have a Bad Guess™

Thinking back to the Sort operator, it only has to order the PostId column from the Votes table.

That matters.

Hash It Up

To compare, we need to see what happens with a Hash Join.
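
The post doesn't say how the Hash Join plan was produced. One common way to get one, shown here as an assumption, is a query-level join hint on the same query:

```sql
SELECT   p.*
FROM     dbo.Posts AS p
JOIN     dbo.Votes AS v
    ON p.Id = v.PostId
WHERE    p.PostTypeId = 2
AND      p.CreationDate >= '20131225'
ORDER BY p.Id
OPTION (HASH JOIN); /* restrict the optimizer to hash joins for this query */
```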

Okay, ignore the fact that this one runs for 2.6 seconds, and the other one ran for 27 seconds.

Just, like, put that aside.

Here’s why:

This Sort operator is different. We need to sort all of the columns in the Posts table by the Id column.

Remember that the Id column is now out of order after the Hash Join.

Needing to sort all those columns, including a bunch of string columns, along with an NVARCHAR(MAX) column — Body — inflates the ever-weeping-Jesus out of the memory grant.

The Hash Join plan is not only judged to be more than twice as expensive, but it also asks for a memory grant that’s ~3x the size of the Merge Join plan.

Finish Strong

Let’s tally up where we’re at.

Both queries have identical estimated rows.

The optimizer chooses the Merge Join plan because it’s cheaper.

The Merge Join plan runs for 27 seconds, asks for 3.3GB of RAM, and spills to disk.
The Hash Join plan runs for 3 seconds, asks for 9.7GB of RAM and doesn’t spill, but it only uses 188MB of the memory grant.

That has impacted the reliability.

In a world where memory grants adjust between executions, I’ll take the Hash Join plan any day of the week.

But this is SQL Server 2017, and we don’t get that without Batch Mode, and we don’t get Batch Mode without playing some tricks.
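
For the curious, one widely circulated trick on SQL Server 2017 is an empty filtered nonclustered columnstore index, which makes batch mode available to queries that touch the table without storing any rows. This is a sketch of that general technique and not necessarily what the follow-up posts use. The index name is made up.

```sql
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_batch_mode_trick
    ON dbo.Posts (Id)
    WHERE Id = -1 AND Id = -2; /* contradictory filter: the index stays empty */
```

Whether the optimizer actually picks batch mode for a given query is still up to the optimizer, so check the plan after adding it.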

There are lots of solutions if you’re allowed to tune queries or indexes, but not so much otherwise.

In the next couple posts, I’ll look at different ways to approach this.

Thanks for reading!

Does Using DISTINCT In SQL Server Queries Cause Performance Problems?

Posted on May 20, 2019 (updated May 16, 2022) by Erik Darling

Footnote

I have two queries. They return the same number of rows.

The only difference is one column in the select list.

This query has the Id column, which is the primary key and clustered index of the Posts table.

    SELECT   DISTINCT
             p.Id, p.AcceptedAnswerId, p.AnswerCount, p.Body, p.ClosedDate,
             p.CommentCount, p.CommunityOwnedDate, p.CreationDate,
             p.FavoriteCount, p.LastActivityDate, p.LastEditDate,
             p.LastEditorDisplayName, p.LastEditorUserId, p.OwnerUserId,
             p.ParentId, p.PostTypeId, p.Score, p.Tags, p.Title, p.ViewCount
    FROM     dbo.Posts AS p
    JOIN     dbo.Votes AS v
        ON p.Id = v.PostId
    WHERE    p.PostTypeId = 2
    AND      p.CreationDate >= '20131225'
    ORDER BY p.Id;

The query plan for it looks like this:

Notice that no operator in this plan performs any kind of aggregation.

There’s no Hash Match Aggregate, no Stream Aggregate, no Distinct Sort, NADA!

It runs for ~1.9 seconds to return about 25k rows.

Lessen

Watch how much changes when we remove that Id column from the select list.

    SELECT   DISTINCT
             p.AcceptedAnswerId, p.AnswerCount, p.Body, p.ClosedDate,
             p.CommentCount, p.CommunityOwnedDate, p.CreationDate,
             p.FavoriteCount, p.LastActivityDate, p.LastEditDate,
             p.LastEditorDisplayName, p.LastEditorUserId, p.OwnerUserId,
             p.ParentId, p.PostTypeId, p.Score, p.Tags, p.Title, p.ViewCount
    FROM     dbo.Posts AS p
    JOIN     dbo.Votes AS v
        ON p.Id = v.PostId
    WHERE    p.PostTypeId = 2
    AND      p.CreationDate >= '20131225';

This is what the query plan now looks like:

Zooming in a bit…

After we Scan the Posts table, we sort about 47k rows.

After the join to Votes, we aggregate data twice. There are two Stream Aggregate operators.

What do we sort?

We Sort every column in the table by every column in the table.

In other words, we order by every column we’ve selected.

What do we aggregate?

Everything. Twice.

What Does It All Mean?

When selecting distinct rows, it can be beneficial to include a column that the optimizer can guarantee is unique in the set of selected columns. Think of a primary key, or another column with a uniqueness constraint on it.

Without that, you can end up doing a lot of extra work to create a distinct result set.

Of course, there are times when that changes the logic of the query.

Thanks for reading!

Memory Grants For The SQL Variant Data Type In SQL Server

Posted on April 10, 2019 (updated May 16, 2022) by Erik Darling

Great Question, You

During my (sold out, baby!) Madison precon, one attendee asked a great question while we were talking about memory grants.

Turns out, if you use the SQL Variant datatype, the memory grants function a lot like they do for any long string type.

From the documentation, which hopefully won’t move or get deleted:

sql_variant can have a maximum length of 8016 bytes. This includes both the base type information and the base type value. The maximum length of the actual base type value is 8,000 bytes.

Since the optimizer needs to plan for your ~~laziness~~ ~~indecisiveness~~ ~~lack of respect for human life~~ inexperience, you can end up getting some rather enormous memory grants, regardless of the type of data you store in variant columns.

Ol’ Dirty Demo

Here’s a table with a limited set of columns from the Users table.

CREATE TABLE dbo.UserVariant 
( 
    Id SQL_VARIANT, 
    CreationDate SQL_VARIANT, 
    DisplayName SQL_VARIANT,
    Orderer INT IDENTITY
);

INSERT dbo.UserVariant WITH(TABLOCKX)
( Id, CreationDate, DisplayName )
SELECT u.Id, u.CreationDate, u.DisplayName
FROM dbo.Users AS u;

In all, about 2.4 million rows end up in there. In the real table, the Id column is an integer, the CreationDate column is a DATETIME, and the DisplayName column is an NVARCHAR 40.

Sadly, no matter which column we select, the memory grant is the same:

SELECT TOP (101) uv.Id
FROM dbo.UserVariant AS uv
ORDER BY uv.Orderer;

SELECT TOP (101) uv.CreationDate
FROM dbo.UserVariant AS uv
ORDER BY uv.Orderer;

SELECT TOP (101) uv.DisplayName
FROM dbo.UserVariant AS uv
ORDER BY uv.Orderer;

SELECT TOP (101) uv.Id, uv.CreationDate, uv.DisplayName
FROM dbo.UserVariant AS uv
ORDER BY uv.Orderer;

It’s also the maximum memory grant my laptop will allow: about 9.6GB.
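
To see what's actually stored inside each variant column, `SQL_VARIANT_PROPERTY` reports the base type and its size. A sketch against the demo table above, with no ORDER BY so it doesn't need a Sort of its own:

```sql
SELECT TOP (10)
       SQL_VARIANT_PROPERTY(uv.Id, 'BaseType') AS id_base_type,
       SQL_VARIANT_PROPERTY(uv.CreationDate, 'BaseType') AS creation_date_base_type,
       SQL_VARIANT_PROPERTY(uv.DisplayName, 'BaseType') AS display_name_base_type,
       SQL_VARIANT_PROPERTY(uv.DisplayName, 'MaxLength') AS display_name_max_bytes
FROM dbo.UserVariant AS uv;
```

The base types should match the source columns in the Users table, even though the memory grant is sized for the variant wrapper and not for those types.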

Get’em!

As if there aren’t enough reasons to avoid sql_variant, here’s another one.

Thanks for reading.

A SQL Server Query Plan Memory Grant Bug?

Posted on March 26, 2019 (updated May 14, 2022) by Erik Darling

Yes, That’s A Question

I have no idea if this is a bug or not, but I thought it was interesting. Looking at information added to spills in SQL Server 2016…

If you open the linked-to picture, you’ll see (hopefully) that the full memory grant for the query was 108,000KB.

But the spill on the Sort operator lists a far larger grant: 529,234,432KB.

This is in the XML, and not an artifact of Plan Explorer.

Whaddya think, Good Lookings? Should I file a bug report?

Thanks for reading!

A SQL Server Memory Grant Helper Query For The Sentry One Repository

Posted on March 25, 2019 (updated May 14, 2022) by Erik Darling

SEMA4

While working with a client, I came up with a query against the SentryOne repository.

The point of it is to find queries that waited more than a second to get a memory grant. I wrote it because this information is logged but not exposed in the GUI yet.

It will show you basic information about the collected query, plus:

How long it ran in seconds
How long it waited for memory in seconds
How long it ran for after it got memory

SELECT   HostName,
         CPU,
         Reads,
         Writes,
         Duration,
         StartTime,
         EndTime,
         TextData,
         TempdbUserKB,
         GrantedQueryMemoryKB,
         DegreeOfParallelism,
         GrantTime,
         RequestedMemoryKB,
         GrantedMemoryKB,
         RequiredMemoryKB,
         IdealMemoryKB,
         Duration / 1000. AS DurationSeconds,
         DATEDIFF(SECOND, StartTime, GrantTime) AS SecondsBetweenQueryStartingAndMemoryGranted,
         (Duration - DATEDIFF(MILLISECOND, StartTime, GrantTime)) / 1000. AS HowFastTheQueryRanAfterItGotMemory
FROM     PerformanceAnalysisTraceData
WHERE DATEDIFF(SECOND, StartTime, GrantTime) > 1
ORDER BY SecondsBetweenQueryStartingAndMemoryGranted DESC

The results I saw were surprising! Queries that waited 10+ seconds for memory, but finished instantly when they finally got memory.

If you’re a SentryOne user, you may find this helpful. If you find queries waiting a long time for memory, you may want to check whether you’re hitting RESOURCE_SEMAPHORE waits too.
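
A quick server-level check for that, separate from the SentryOne repository, is the wait stats DMV. The numbers are cumulative since the last restart (or since the wait stats were last cleared), so a sketch like this tells you whether the wait has happened at all, not when.

```sql
SELECT dows.wait_type,
       dows.waiting_tasks_count,
       dows.wait_time_ms,
       dows.max_wait_time_ms
FROM sys.dm_os_wait_stats AS dows
WHERE dows.wait_type = N'RESOURCE_SEMAPHORE';
```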

Thanks for reading!

SQL Server Indexes Sort Data For Us

Posted on January 29, 2019 (updated May 14, 2022) by Erik Darling

Yelling and Screaming

If you saw my post about parameterized TOPs, one thing you may have immediately hated is the index I created.

And rightfully so — it was a terrible index for reasons we’ll discuss in this post.

If that index made you mad, congratulations, you’re a smart cookie.

CREATE INDEX whatever ON dbo.Votes(CreationDate DESC, VoteTypeId)
GO

Yes, my friends, this index is wrong.

It’s not just wrong because we’ve got the column we’re filtering on second, but because there’s no reason for it to be second.

Nothing in our query lends itself to this particular indexing scenario.

CREATE OR ALTER PROCEDURE dbo.top_sniffer (@top INT, @vtid INT)
AS
BEGIN

    SELECT   TOP (@top) 
             v.Id, 
             v.PostId, 
             v.UserId, 
             v.BountyAmount, 
             v.VoteTypeId, 
             v.CreationDate
    FROM     dbo.Votes AS v
    WHERE    v.VoteTypeId = @vtid
    ORDER BY v.CreationDate DESC;

END;

We Index Pretty

The reason I sometimes see columns appear first in an index is to avoid having to physically sort data.

If I run the stored procedure without any nonclustered indexes, this is our query plan:

EXEC dbo.top_sniffer @top = 1, @vtid = 1;

A sort, a spill, kablooey. We’re not having any fun, here.

With the original index, our data is organized in the order that we’re asking for it to be returned in the ORDER BY.

This caused all sorts of issues when we were looking for VoteTypeIds that were spread throughout the index, where we couldn’t satisfy the TOP quickly.

There was no Sort in the plan when we had the “wrong” index added.

B-Tree Equality

We can also avoid having to sort data by having the ORDER BY column(s) second in the key of the index, because our filter is an equality.

CREATE INDEX whatever ON dbo.Votes(VoteTypeId, CreationDate DESC)
GO

Having the filter column first also helps us avoid the longer running query issue when we look for VoteTypeId 4.

EXEC dbo.top_sniffer @top = 5000, @vtid = 4;

Table 'Votes'. Scan count 1, logical reads 2262

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 148 ms.

Solving for Sorts

If you’ve been following my blogging for a while, you’ve likely seen me say this stuff before, because Sorts have some issues.

They’re locally blocking, in that every row has to arrive before they can run
They require additional memory space to order data the way you want
They may spill to disk if they don’t get enough memory
They may ask for quite a bit of extra memory if estimations are incorrect
They may end up in a query plan even when you don’t explicitly ask for them

There are plenty of times when these things aren’t problems, but it’s good to know when they are, or when they might turn into a problem.

Thanks for reading!