Indexing Archives | Page 9 of 21

Deduplicating SQL Server Missing Index Requests Part 2

Posted on January 7, 2021May 16, 2022 by Erik Darling

Deedoo

In yesterday’s post, we talked a little about different ways to approach looking at missing index requests, and how their benefit is calculated in sp_BlitzIndex.

If you’re on SQL Server 2019, you may be able to get some idea which queries are generating missing index requests. It’s not documented yet, but that’s never stopped anyone from using anything in production.

Let’s look at the queries asking for them.

Natural Ruckus

Let’s take a look at what queries are causing those missing index requests.

Since I’m running them, I don’t have to do any work.

SELECT TOP (10)
    p.*
FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656
ORDER BY p.Score DESC;

SELECT TOP (10) 
    p.*
FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656
AND   p.CreationDate >= '20130101'
ORDER BY p.Score DESC;

SELECT TOP (10)
    p.*
FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656
AND   p.PostTypeId = 1
ORDER BY p.Score DESC;

SELECT TOP (10)
    p.*
FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656
AND   p.LastActivityDate >= '20130101'
ORDER BY p.Score DESC;

SELECT TOP (10) 
    p.*
FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656
AND   p.Score > 0
ORDER BY p.Score DESC;

Without any helpful indexes, all of these take about the same amount of time — between 600 and 700ms, and most of the time is spent scanning the clustered index on Posts, which is around 17 million rows, just to return ~27k rows..

That’s Not “Slow”

Usually when people are complaining about slow queries, they’re talking about stuff that drags on for several seconds or longer.

At just about half a second, most people wouldn’t getting a running start to jump through hoops to make these faster.

But they all have a “high cost” of a touch over 3000 query bucks. If you’re the type of person who only focuses on one metric, you might be overlooking a whole lot of things that need attention.

SQL Server Query Plan — spuriously yours

If we made this query 99% faster, it would mean a rather miniscule improvement in elapsed time. The flip side of the one-metric thing is using duration alone as a metric. Queries might have a very long duration because they’re blocked, but use minimal resource when they’re finally allowed to run by the gods of ACID compliance.

My favorite queries to find and tune are ones that:

Unnecessarily run for a long time, using a lot of CPU
Unnecessarily ask for large memory grants
Have issues with parameter sniffing

Let’s pretend there might be some value to tuning these, though. Maybe we’re upset that they’re going parallel. Maybe we have something against scanning clustered indexes. Maybe we just don’t have anything else to do.

In tomorrow’s post, we’ll look at how to take all of those requests and come up with one good enough-index for our queries.

Just like how the optimizer comes up with one “good enough” plan for all queries.

Thanks for reading!

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Deduplicating SQL Server Missing Index Requests Part 1

Posted on January 6, 2021May 16, 2022 by Erik Darling

Church

I often find myself reviewing missing index requests during consulting engagements. Not because they’re so awesome, but because they’re often just good enough to provide some quick relief before more fine-tuned efforts are explored.

More to the point: if someone has no idea which queries they need to tune, and everything is pretty slow, this is a good starting place.

Given sufficient server uptime, of course.

World Tour

Sometimes you’ll see that slam-dunk missing index request with lots of uses, and you can tie it to a query that you know is bad. Of course, I’m quite partial to using sp_BlitzIndex to analyze indexes. There are a few different places that missing indexes will be detailed in.

Mode 0: the most important stuff
Mode 4: anything and everything
Mode 3: just missing index requests
Table Mode: analyzing just one table

The easiest way to find examples like I’ll be talking about is to look at just one table. In this case, the Posts table.

EXEC sp_BlitzIndex @TableName = 'Posts';

If you have missing index requests for a table, they’ll look something like this:

That estimated benefit number is pretty big here, so it jumps out a bit. Normally I don’t start really paying attention until that number is >5 million. That’s not terribly scientific, but you have to draw the line somewhere.

Of course, one very sneaky thing to consider is when you have a set of duplicative requests with low-ish estimated benefit individually, but combined they just might add up to something quite useful.

Just A Kid

The estimated benefit number is just a function of the three feedback metrics that get logged with missing index requests: uses * impact * average query cost.

Uses is a fairly reliable metric, but impact and average query cost are a little more hand-wavy. Even high-cost queries can be very fast. It doesn’t mean that they can’t be tuned or don’t need indexes, but they might not be your worst-performers.

In tomorrow’s post, we’ll look at that, and how you can come up with a good-enough index for a bunch of similar queries.

Thanks for reading!

Going Further

Defeating Parameter Sniffing With Dynamic SQL In SQL Server

Posted on December 24, 2020May 15, 2022 by Erik Darling

Enjoy!

Thanks for watching!

Going Further

Using Plan Guides To Get Around Query Hints In SQL Server

Posted on December 23, 2020May 15, 2022 by Erik Darling

Prophesy As

According to Not-Australians, there used to be a trace flag that would get queries to ignore any supplied hints. It doesn’t work anymore, which sucks, kinda.

Because people do lots of stupid things with hints. Real stupid things. Things you wouldn’t believe the stupid of.

Let’s say, for example, hypothetically of course, that your front end application would add an index hint to every query.

That index hint may or not be helpful to your query in any way. But there it is.

Let’s also posit, using the very depths of our imaginations, that the front end developer was unlikely to change that behavior.

Planning Fields

We’ve got a couple indexes:

CREATE INDEX r 
    ON dbo.Users(Reputation) 
WITH(MAXDOP = 8, SORT_IN_TEMPDB = ON);

CREATE INDEX c 
    ON dbo.Users(CreationDate) 
WITH(MAXDOP = 8, SORT_IN_TEMPDB = ON);

And we’ve got a query that, via an index hint, is being forced to use the wrong index.

DECLARE @Reputation int = 2;
EXEC sp_executesql N'SELECT * FROM dbo.Users WITH (INDEX  = c) WHERE Reputation = @Reputation;',
                   N'@Reputation int',
                   @Reputation;

The ensuing query plan makes no sense whatsoever.

The things are all backwards. We scan the entire nonclustered index, and do a lookup to the clustered index just to evaluate the @Reputation predicate.

The idea is bad. Please don’t do the idea.

Guiding Bright

There are two things we could do here. We could hint the query to use the index we want, sure.

But what if we change something about this index, or add another one to the table? We might want the optimizer to have a bit more freedom to choose.

I mean, I know. That has its own risks, but whatever.

We can add a plan guide that looks like this:

EXEC sp_create_plan_guide
@name = N'dammit',
@stmt = N'SELECT * FROM dbo.Users WITH (INDEX  = c) WHERE Reputation = @Reputation;',
@type = N'SQL',
@module_or_batch = NULL,
@params = N'@Reputation int',
@hints =  N'OPTION(TABLE HINT(dbo.Users))';

If we were writing proper queries where tables are aliased, it’d look like this:

EXEC sp_create_plan_guide
@name = N'dammit',
@stmt = N'SELECT u.* FROM dbo.Users AS u WITH (INDEX  = c) WHERE u.Reputation = @Reputation;',
@type = N'SQL',
@module_or_batch = NULL,
@params = N'@Reputation int',
@hints =  N'OPTION(TABLE HINT(u))';

When we re-run our query, things look a lot better:

Thanks for reading!

Going Further

A Suggestion To Make Key Lookups Less Of A Performance Problem In SQL Server

Posted on December 18, 2020May 15, 2022 by Erik Darling

Odor Of Gas

One problem with Lookups, aside from the usual complaints, is that the optimizer has no options for when the lookup happens.

If the optimizer decides to use a nonclustered index to satisfy some part of the query, but the nonclustered index doesn’t have all of the columns needed to cover what the query is asking for, it has to do a lookup.

Whether the lookup is Key or RID depends on if the table has a clustered index, but that’s not entirely the point.

The point is that there’s no way for the optimizer to decide to defer the lookup until later in the plan, when it might be more opportune.

Gastric Acid

Let’s take one index, and two queries.

CREATE INDEX p
    ON dbo.Posts(PostTypeId, Score, CreationDate)
    INCLUDE(OwnerUserId);

Stop being gross.

SELECT TOP (1000)
    u.DisplayName,
    p.*
FROM dbo.Posts AS p
JOIN dbo.Users AS u
    ON p.OwnerUserId = u.Id
WHERE p.PostTypeId = 1
AND   p.Score > 5
ORDER BY p.CreationDate DESC;

SELECT TOP (1000)
    u.DisplayName,
    p.*
FROM dbo.Posts AS p
JOIN dbo.Users AS u
    ON p.OwnerUserId = u.Id
WHERE p.PostTypeId = 1
AND   p.Score > 6
ORDER BY p.CreationDate DESC;

The main point here is not that the lookup is bad; it’s actually good, and I wish both queries would use one.

If we hint the first query to use the nonclustered index, things turn out better.

SELECT TOP (1000)
    u.DisplayName,
    p.*
FROM dbo.Posts AS p WITH(INDEX = p)
JOIN dbo.Users AS u
    ON p.OwnerUserId = u.Id
WHERE p.PostTypeId = 1
AND   p.Score > 5
ORDER BY p.CreationDate DESC;

Running a full second faster seems like a good thing to me, but there’s a problem.

Ingest

Whether we use the lookup or scan the clustered index, all of these queries ask for rather large memory grants, between 5.5 and 6.5 GB

SQL Server Query Plan Tool Tips — bigsort4u

The operator asking for memory is the Sort — and while I’d love it if we could index for every sort — it’s just not practical.

So like obviously changing optimizer behavior is way more practical. Ahem.

The reason that the Sort asks for so much memory in each of these cases is that it’s forced to order the entire select output from the Posts table by the CreationDate column.

Detach

If we rewrite the query a bit, we can get the optimizer to sort data long before we go get all the output columns:

SELECT TOP (1000)
    u.DisplayName,
    p2.*
FROM dbo.Posts AS p
JOIN dbo.Posts AS p2
    ON p.Id = p2.Id
JOIN dbo.Users AS u
    ON p.OwnerUserId = u.Id
WHERE p.PostTypeId = 1
AND   p.Score > 5
ORDER BY p.CreationDate DESC;

SELECT TOP (1000)
    u.DisplayName,
    p2.*
FROM dbo.Posts AS p
JOIN dbo.Posts AS p2
    ON p.Id = p2.Id
JOIN dbo.Users AS u
    ON p.OwnerUserId = u.Id
WHERE p.PostTypeId = 1
AND   p.Score > 6
ORDER BY p.CreationDate DESC;

In both cases, we get the same query plan shape, which is what we’re after:

Seek into the nonclustered index on Posts
Sort data by CreationDate
Join Posts to Users first
Join back to Posts for the select list columns

Because the Sort happens far earlier on in the plan, there’s less of a memory grant needed, and by quite a stretch from the 5+ GB before.

Thanks for reading!

Going Further

Recompile Hints Can’t Fix Every SQL Server Query Performance Problem

Posted on December 17, 2020May 15, 2022 by Erik Darling

Fast 1

After blogging recently (maybe?) about filters, there was a Stack Exchange question about a performance issue when a variable was declared with a max type.

After looking at it for a minute, I realized that I had never actually checked to see if a recompile hint would allow the optimizer more freedom when dealing with them.

CREATE INDEX u 
    ON dbo.Users(DisplayName);

DECLARE @d nvarchar(MAX) = N'Jon Skeet';

SELECT 
    COUNT_BIG(*) AS records
FROM dbo.Users AS u
WHERE u.DisplayName = @d;

SELECT 
    COUNT_BIG(*) AS records
FROM dbo.Users AS u
WHERE u.DisplayName = @d
OPTION(RECOMPILE);

Turns out that it won’t, which is surprising.

Even though both plans have sort of a weird seek, the filter operator remains as a weird sort of residual predicate.

Thanks for reading!

Going Further

Annoyances When Indexing For Windowing Functions In SQL Server

Posted on December 11, 2020May 15, 2022 by Erik Darling

One Day

I will be able to not care about this sort of thing. But for now, here we are, having to write multiple blogs in a day to cover a potpourri of grievances.

Let’s get right to it!

First, without a where clause, the optimizer doesn’t think that an index could improve one single, solitary metric about this query. We humans know better, though.

WITH Votes AS 
(
    SELECT
        v.Id,
        ROW_NUMBER() 
            OVER(PARTITION BY 
                     v.PostId 
                 ORDER BY 
                     v.CreationDate) AS n
    FROM dbo.Votes AS v
)
SELECT *
FROM Votes AS v
WHERE v.n = 0;

The tough part of this plan will be putting data in order to suit the Partition By, and then the Order By, in the windowing function.

Without any other clauses against columns in the Votes table, there are no additional considerations.

Two Day

What often happens is that someone wants to add an index to help the windowing function along, so they follow some basic guidelines they found on the internet.

What they end up with is an index on the Partition By, Order By, and then Covering any additional columns. In this case there’s no additional Covering Considerations, so we can just do this:

CREATE INDEX v2 ON dbo.Votes(PostId, CreationDate);

If you’ve been following my blog, you’ll know that indexes put data in order, and that with this index you can avoid needing to physically sort data.

Three Day

The trouble here is that, even though we have Cost Threshold For Parallelism (CTFP) set to 50, and the plan costs around 195 Query Bucks, it doesn’t go parallel.

Creating the index shaves about 10 seconds off the ordeal, but now we’re stuck with this serial calamity, and… forcing it parallel doesn’t help.

Our old nemesis, repartition streams, is back.

Even at DOP 8, we only end up about 2 seconds faster. That’s not a great use of parallelism, and the whole problem sits in the repartition streams.

This is, just like we talked about yesterday, a row mode problem. And just like we talked about the day before that, windowing functions generally do benefit from batch mode.

Thanks for reading!

Going Further

An Overlooked Benefit Of Batch Mode For Parallel Query Plans In SQL Server

Posted on December 10, 2020May 15, 2022 by Erik Darling

Make It Count

When queries go parallel, you want them to be fast. Sometimes they are, and it’s great.

Other times they’re slow, and you end up staring helplessly at a repartition streams operator.

Sometimes you can reduce the problem with higher DOP hints, or better indexing, but overall it’s a crappy situation.

Snap To

Let’s admire a couple familiar looking queries, because that’s been working really well for us so far.

WITH Comments AS 
(
    SELECT
        ROW_NUMBER() 
            OVER(PARTITION BY 
                     c.UserId
                 ORDER BY 
                     c.CreationDate) AS n
    FROM dbo.Comments AS c
)
SELECT *
FROM Comments AS c
WHERE c.n = 0
OPTION(USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140'));

WITH Comments AS 
(
    SELECT
        ROW_NUMBER() 
            OVER(PARTITION BY 
                     c.UserId
                 ORDER BY 
                     c.CreationDate) AS n
    FROM dbo.Comments AS c
)
SELECT *
FROM Comments AS c
WHERE c.n = 0
OPTION(USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_150'));

One is going to run in compatibility level 140, the other in 150, as foretold by ancient alien prophecy.

The two query plans will have a bit in common, but…

The second query, which runs in batch mode, runs about 15 seconds faster. One big reason why is that we skip that most unfortunate repartition streams operator.

It’s a cold sore. An actual factual cold sore.

The only ways I’ve found to fix it completely are:

Induce batch mode
Use the parallel apply technique

But the parallel apply technique doesn’t help much here, because of local factors.

In this case, me generating the largest possible result set and then filtering it down to nothing at the end.

Thanks for reading!

Going Further

An Overlooked Benefit Of Batch Mode With Windowing Functions In SQL Server

Posted on December 9, 2020May 15, 2022 by Erik Darling

Lavender

If you ask people who tune queries why batch mode is often much more efficient with windowing functions, they’ll tell you about the window aggregate operator.

That’s all well and good, but there’s another, often sneaky limitation of fully row mode execution plans with windowing functions.

Let’s go take a look!

Global Aggregates

One thing that causes an early serial zone in execution plans is if you use a windowing function that only has the order by

For example, let’s look at the plans for these two queries:

WITH Comments AS 
(
    SELECT
        ROW_NUMBER() 
            OVER(ORDER BY 
                     c.CreationDate) AS n
    FROM dbo.Comments AS c
)
SELECT *
FROM Comments AS c
WHERE c.n = 0;

WITH Comments AS 
(
    SELECT
        ROW_NUMBER() 
            OVER(PARTITION BY 
                     c.UserId
                 ORDER BY 
                     c.CreationDate) AS n
    FROM dbo.Comments AS c
)
SELECT *
FROM Comments AS c
WHERE c.n = 0;

The resulting estimated plans look like this, using the 140 compatibility level:

In the top plan, where the windowing function only has an order by, the serial zone happens immediately before the Segment operator. In the second plan, the parallel zone carries on until right before the select operator.

If you’re wondering why we’re only looking at estimated plans here, it’s because repartition streams ruins everything.

In The Year 2000

In compatibility level 150, things change a bit (yes, a window aggregate appears):

And the window aggregate appears within the parallel zone. The parallel zone does end before the filter operator, which may or may not be a disaster depending on how restrictive your filter is, and how many rows end up at it.

Also note the distinct lack of a repartition streams operator ruining everything. We’ll talk about that tomorrow.

Thanks for reading!

Going Further

Batch Mode On Row Store vs Batch Mode Tricks For Performance Tuning SQL Server Queries

Posted on December 8, 2020May 15, 2022 by Erik Darling

Quiet Time

I think Batch Mode is quite spiffy for the right kind of query, but up until SQL Server 2019, we had to play some tricks to get it:

Do a funny join to an empty table with a column store index
Create a filtered column store index with no data in it

If you’re on SQL server 2019 Enterprise Edition, and you’ve got your database in compatibility level 150, you may heuristically receive Batch Mode without those tricks.

One important difference between Batch Mode Tricks™ and Batch Mode On Rowstore (BMOR) is that the latter allows you to read from row mode tables using Batch Mode, while the former doesn’t.

Tricks have limits, apparently.

Squish Squish

To cut down on typing, I’ll often create a helper object like this:

CREATE TABLE dbo.t
(
    id int NULL,
    INDEX c CLUSTERED COLUMNSTORE
);

If you read this post, you’ll understand more why.

Now, let’s compare these two queries:

SELECT 
    p.OwnerUserId,
    COUNT_BIG(*) AS records
FROM dbo.Posts AS p
LEFT JOIN dbo.t
    ON 1 = 0
WHERE p.Score < 50
GROUP BY p.OwnerUserId
HAVING COUNT_BIG(*) > 2147483647
OPTION(USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140'), MAXDOP 8);

SELECT 
    p.OwnerUserId,
    COUNT_BIG(*) AS records
FROM dbo.Posts AS p
WHERE p.Score < 50
GROUP BY p.OwnerUserId
HAVING COUNT_BIG(*) > 2147483647
OPTION(USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_150'), MAXDOP 8);

One executes in compatibility level 140, the other in 150.

Splish Splash

There are a couple interesting things, here.

Even though both queries have operators that execute in Batch Mode (Filter, Hash Match), only the second query can read from the row store clustered index in Batch Mode. In this case, that shaves a couple hundred milliseconds off the seek.

There is likely some additional invisible benefit to not having to convert the row mode seek to a batch mode hash join at the next operator, since one executes for 501ms, and the other executes for 278ms. There’s nothing in the query plan to signal that happening, so you’ll just have to use your imagination.

Thanks for reading!