Execution Plans Archives | Page 14 of 31

Starting SQL: How Parameters Can Change Which Indexes SQL Server Chooses

Posted on September 24, 2020May 15, 2022 by Erik Darling

Down And Out

There are lots of different ways that parameter sniffing can manifest in both the operators chosen, the order of operators chosen, and the resources acquired by a query when a plan is compiled. At least in my day-to-day consulting, one of the most common reasons for plans being disagreeable is around insufficient indexes.

One way to fix the issue is to fix the index. We’ll talk about a way to do it without touching the indexes tomorrow.

Let’s say we have this index to start with. Maybe it was good for another query, and no one ever thought twice about it. After all, you rebuild your indexes every night, what other attention could they possible need?

CREATE INDEX v 
    ON dbo.Votes(VoteTypeId, CreationDate);

If we had a query with a where clause on those two columns, it’d be be able to find data pretty efficiently.

But how much data will it find? How many of each VoteTypeId are there? What range of dates are we looking for?

Well, that depends on our parameters.

Cookie Cookie

Here’s our stored procedure. There’s one column in it that isn’t in our index. What a bummer.

CREATE OR ALTER PROCEDURE dbo.VoteCount (@VoteTypeId INT, @YearsBack INT)
AS
BEGIN

SELECT TOP (1000) 
    x.VoteTypeId,
    x.PostId,
    x.TotalPosts,
    x.UniquePosts
FROM 
    (
        SELECT v.VoteTypeId,
               v.PostId,
               COUNT_BIG(v.PostId) AS TotalPosts,
               COUNT_BIG(DISTINCT v.PostId) AS UniquePosts
        FROM dbo.Votes AS v
        WHERE v.CreationDate >= DATEADD(YEAR, (-1 * @YearsBack), '2014-01-01')
        AND   v.VoteTypeId = @VoteTypeId
        GROUP BY v.VoteTypeId,
                 v.PostId
    ) AS x
ORDER BY x.TotalPosts DESC;

END;

That doesn’t matter for a small amount of data, whether it’s encountered because of the parameters used, or the size of the data the procedure is developed and tested against. Testing against unrealistic data is a recipe for disaster, of course.

Cookie Cookie

What can be tricky is that if the sniffing is occurring with the lookup plan, the optimizer won’t think enough of it to request a covering index, either in plan or in the index DMVs. It’s something you’ll have to figure out on your own.

SQL Server Query Plan Tool Tip — oh yeah that

So we need to add that to the index, but where? That’s an interesting question, and we’ll answer it in tomorrow’s post.

Thanks for reading!

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Starting SQL: Why Is My SQL Server Query Suddenly Slower Than It Was Yesterday?

Posted on September 23, 2020May 15, 2022 by Erik Darling

Pick A Database, Any Database

Alright, maybe not any database. Let’s stick with SQL Server. That’s the devil we know.

At some point in your life, you’re going to construct a query that takes user input, and that input is likely going to come in the form of a parameter.

It could be a stored procedure, dynamic SQL, or something from your application. But there it is.

Waiting. Watching.

Sniffing.

Defining A Problem

When we use parameters, we re-use execution plans, at least until a Qualifying Event™ occurs.

What’s a qualifying event?

Recompile hint
Stats update
Temp table modification threshold
Plan eviction
Server restart

Now, it might be reasonable to think that a mature optimizer — and it is an optimizer, not just a planner — would be able to do something a bit more optimal. After all, why would anyone think it would take the same amount of work to get through 100 rows as it would take to get through 1,000,000 rows? It’s a fundamentally different approach.

Doing a run to the grocery store to replenish a few things requires a far different mindset from going to prepare for a large family meal. You have to choose between a basket or a cart, whether you can jump right to the couple spots you need or you need to walk up and down every aisle, and even if you might need to write down a list because it doesn’t fit into brain memory.

One might also have the expectation that if a significant inaccuracy is detected at runtime, the strategy might change. While that does sort of happen with Adaptive Joins, it’s not a full plan rewrite.

Detecting A Problem

The plan cache usually sucks for this, unless you’re saving the data off to more stable tables. Why? Because most people only figure out they’ve been sniffed after a plan changes, which means it’s not in the cache anymore. You know, when end users start complaining, the app goes unresponsive, you can’t connect to the server, etc.

You could set your watch to it.

But sometimes it’s there. Some funny looking little plan that looks quite innocent, but seems to do a lot of work when you bang it up against other DMVs.

If you have the luxury, Query Store is quite a better tool for detecting plan changes. It’s even got reports built in just for that.

SQL Server Query Store — how nice of you.

For the extra fancy amongst you, I pray that your expensive monitoring tool has a way to tell you when query plans change, or when normally fast plans deviate from that.

Deciphering A Problem

This is where things can get difficult, unless you’re monitoring or logging information. You typically need a few different combinations of parameter values to feed in to your query, so you can see what changed and when. Quite often, there’s no going back easily.

Let’s say you had a plan, and it was a good plan. Then one of those pesky qualifying events comes along, and it’s decided that you need a new plan.

And what if… that new plan is worse? No matter how much you recompile or update stats or toggle with cardinality estimation, you just can’t go back to the way things were without lots of hints or changes to the query? Maybe that’s not parameter sniffing. Maybe that’s parameter snuffing. I’m gonna copyright that.

Most parameter sniffing will result in a plan with a set of bad choices for different amounts of data, which will result in something like this:

This isn’t a “bad estimate” — it was a pretty good estimate for the first parameter value. It just wasn’t a good estimate for the second parameter value.

And to be honest, quite a bit of parameter sniffing issues come from Nested Loops. Not because it’s bad, but because it’s bad for large amount of data, especially in a serial plan. It’s a pretty easy way to gum up a query, though. Make it get stuck in a loop for 13 million rows. It wasn’t fast? No kidding. Poof, be gone.

But then opposite-land isn’t good, either.

This plan probably makes plenty of sense for a big chunk of data. One big scan, one big hash, one big sort. Done.

Of course, for a small amount of data, we go from taking 1ms to taking 2s. Small amount of data people will likely not be happy with that. Your server might not be either, what with all the extra CPU resources we’re using in this here parallel plan all the time now.

Tomorrow, we’ll look at how sometimes you can fix parameter sniffing with better indexes.

Thanks for reading!

Going Further

Starting SQL: What Happens When You Don’t Parameterize SQL Server Queries?

Posted on September 22, 2020May 17, 2022 by Erik Darling

Nothing, Nothing, Nothing

Aside from the obvious perils of SQL Injection, parameterizing queries can be helpful in other ways.

Not to downplay SQL Injection at all, it’s just that fixing code is only one part of the equation. Vendor applications often need to do some pretty administrative things, and require elevated permissions.

Just last week I was looking at an application that had a stored procedure which took a parameter that was the name of an executable. There were no checks on the name. You could put in quite not-figuratively anything you wanted, and xp_cmdshell would run it.

All the parameterization in the world won’t help that.

Well, you get what you get.

Of Parameters And Plans

This can happen in three common ways:

You have an interface that accepts user-entered values
You have an ORM where you haven’t explicitly defined parameter types and precisions
You have dynamic SQL where values are concatenated into strings

Since I spend all my time in the database, I’m going to show you the third one. I wouldn’t normally format a query like this, but I’m trying to keep it web-friendly.

DECLARE @StartDate DATETIME = '20130101';
WHILE @StartDate < '20140101'
BEGIN
DECLARE @NoParams4u NVARCHAR(MAX) = 
N'
SELECT SUM(c.Score) AS TotalScore
FROM dbo.Comments AS c
WHERE c.CreationDate 
    BETWEEN CONVERT(DATETIME, ''' + RTRIM(@StartDate) + ''') 
    AND CONVERT(DATETIME, ''' + RTRIM(DATEADD(DAY, 11, @StartDate)) + ''')
AND 1 = (SELECT 1);
';

EXEC sys.sp_executesql @NoParams4u;
RAISERROR('%s', 0, 1, @NoParams4u);

SET @StartDate = DATEADD(DAY, 11, @StartDate);

END
GO

To make things interesting, I’ve created a non-covering index on the Comments table:

CREATE INDEX c ON dbo.Comments(CreationDate);

I’ve also had to use an oddly specific number of day increment in order to get some good plan variety because of that index. If you ever wonder why some blog posts take three hours to write, the most likely cause is finding the right number.

The first thing you should notice is that even using sp_executesql doesn’t help when our code isn’t parameterized. The second thing you’ll probably notice is 1 = (SELECT 1) at the end of the query. Past me has answers to all those questions, if you click on the links.

Bouquet

If I run that loop, I get back 34 results. Each query gets an individually compiled query plan, though there are only three “different” plans used.

Notice that each plan has different literal date values passed in to it, and different estimates. If we created a covering index, we’d get plan stability across executions, but then we’d still have to compile it when these literal values get passed in. I did it like this to reinforce my point.

We can validate that by looking in the plan cache and surrounding DMVs, using sp_BlitzCache.

Planimal Activist

If you have code like this, one good way to find culprits is by running it like this:

EXEC sp_BlitzCache @SortOrder = 'query hash';

A zoomed-in snapshot of the part of the results that prove my point looks about like this:

This is just the top 10 results. You can see a warning about multiple plans, and that each query has a single execution.

I mentioned before that if we add a covering index, we’ll get plan stability. That’s true, but specific to this demo on SQL Server 2019, there’s an alternate plan available for executions that qualify for Batch Mode On Rowstore (BMOR):

In prior versions of SQL Server in row mode only plans, we could only have a stream aggregate. But even getting the same plan most of the time, we still need to compile it every time. SQL Server still thinks each of these queries is “new” and needs to get a plan compiled.

Fixing It

There are two options for fixing this. If you need to do it quickly, at scale, the database level option FORCED PARAMETERIZATION can take care of most of these problems. Just make sure you read up on the limitations. To fix it for just a couple problem queries, you need to fix the dynamic SQL.

DECLARE @NoParams4u NVARCHAR(MAX) = 
N'
SELECT SUM(c.Score) AS TotalScore
FROM dbo.Comments AS c
WHERE c.CreationDate BETWEEN @StartDate AND DATEADD(DAY, 11, @StartDate)
AND 1 = (SELECT 1);
';

EXEC sys.sp_executesql @NoParams4u, N'@StartDate DATETIME', @StartDate;

I’m not concatenating values into the string anymore, and I’m passing the @StartDate value in when I execute the dynamic SQL.

One point I want to make is that it’s generally safe to do date math on the parameter. I’m not doing date math on the column, which would generally be a bad idea.

But anyway, now our plan gets used 34 times.

In this case, plan reuse works out well. Every query is looking at a sufficiently narrow range of data to have it not matter, and the longest running execution is around 31ms.

But what about when that doesn’t work out? When can parameterization backfire? We’ll find out tomorrow!

Thanks for reading!

Going Further

Starting SQL: What’s The Difference Between Parameters And Local Variables In SQL Server?

Posted on September 21, 2020May 17, 2022 by Erik Darling

Parameter Positive

There are many good reasons to parameterize a query, but there are also trade-offs. There’s no such thing as a free parameter, as they say.

In this post, we’re going to discuss what is and isn’t a parameter, and some of the pros and cons.

What’s important to keep in mind is that good indexing can help avoid many of the cons, but not all. Bad indexing, of course, causes endless problems.

There are many good reasons to parameterize your queries, too. Avoiding SQL injection is a very good reason.

But then!

What’s Not A Parameter

It can be confusing to people who are getting started with SQL Server, because parameters and variables look exactly the same.

They both start with @, and feel pretty interchangeable. They behave the same in many ways, too, except when it comes to cardinality estimation.

To generalize a bit, though, something is a parameter if it belongs to an object. An object can be an instance of:

A stored procedure
A function
Dynamic SQL

Things that aren’t parameters are things that come into existence when you DECLARE them. Of course, you can pass things you declare to one of the objects above as parameters. For example, there’s a very big difference between these two blocks of code:

DECLARE @VoteTypeId INT = 7;

SELECT *
FROM dbo.Votes AS v
WHERE v.VoteTypeId = @VoteTypeId;

DECLARE @sql NVARCHAR(MAX) = N'
SELECT *
FROM dbo.Votes AS v
WHERE v.VoteTypeId = @VoteTypeId;
'
EXEC sp_executesql @sql, N'@VoteTypeId INT', @VoteTypeId;

But it’s not obvious until you look at the query plans, where the guess for the declared variable is god awful.

Then again, if you read the post I linked to up there, you already knew that. Nice how that works.

If you’re too lazy to click, I’m too lazy to repeat myself.

What’s the point? Variables, things you declare, are treated differently from parameters, things that belong to a stored procedure, function, or dynamic SQL.

Parameter Problems

The problem with parameterization is one of familiarity. It not only breeds contempt, but… sometimes data just grows apart.

Really far apart.

SELECT 
    v.VoteTypeId, 
    COUNT_BIG(*) AS records
FROM dbo.Votes AS v
GROUP BY v.VoteTypeId
ORDER BY records;

Natural Selection

When you parameterize queries, you give SQL Server permission to remember, and more importantly, to re-use.

What it re-uses is the execution plan, and what it remembers are cardinality estimates. If we do something like this, we don’t get two different execution plans, or even two different sets of guesses, even though the values that we’re feeding to each query have quite different distributions in our data.

The result is two query plans that look quite alike, but behave quite differently.

One takes 23 milliseconds. The other takes 1.5 seconds. Would anyone complain about this in real life?

Probably not, but it helps to illustrate the issue.

Leading Miss

Where this can get confusing is when you’re trying to diagnose a performance problem.

If you look in the plan cache, or in query store, you’ll see the plan that gets cached for the very first parameter. It’ll look simple and innocent, sure. But the problem is with a totally different parameter that isn’t logged anywhere.

You might also face a different problem, where the query recompiles because you restarted the server, updated stats, rebuilt indexes, or enough rows in the table changed to trigger an automatic stats update. If any of those things happen, the optimizer will wanna come up with a new plan based on whatever value goes in first.

If the roles get reversed, the plan will change, but they’ll both take the same amount of time now.

DECLARE @VoteTypeId INT;

SET @VoteTypeId = 16

DECLARE @sql NVARCHAR(MAX) = N'
SELECT *
FROM dbo.Votes AS v
WHERE v.VoteTypeId = @VoteTypeId;
';
EXEC sp_executesql @sql, N'@VoteTypeId INT', @VoteTypeId;

SET @VoteTypeId = 7;

SET @sql = N'
SELECT *
FROM dbo.Votes AS v
WHERE v.VoteTypeId = @VoteTypeId;
';
EXEC sp_executesql @sql, N'@VoteTypeId INT', @VoteTypeId;

Deal With It ?

In the next few posts, we’ll talk about what happens when you don’t parameterize queries, and different ways to deal with parameter sniffing.

A recompile hint can help, it might not always be appropriate depending on execution frequency and plan complexity
Optimize for unknown hints will give you the bad variable guess we saw at the very beginning of this post

We’re going to need more clever and current ways to fix the issue. If you’re stuck on those things recompiling or unknown-ing, you’re stuck not only on bad ideas, but outdated bad ideas.

Like duck l’orange and Canadian whiskey.

Thanks for reading!

Going Further

Starting SQL: Query Plan Round Up

Posted on September 18, 2020May 15, 2022 by Erik Darling

Put On

There are lots of things that can be fine in one query plan, and horrible in another.

Likewise, a query plan might be great for one set of parameters, and terrible for another. That’s called parameter sniffing, and that’s what we’re going to talk about next week.

What I want you to take away from the last couple weeks of posts is some of the improvements that have come into query plans over the years, and some of the shortcomings that still exist.

Despite the shortcomings, query plans offer an important view into things. When viewed alongside metrics in other DMVs, the text of the query, indexes that are available, and any parameters passed to the query, you can get a fairly complete view of things.

Major Pain

Of course, you can’t get rabbit-holed on this stuff. As much detail as gets captured in actual plans and in the other sources I mentioned, they don’t tell you about important things like blocking or resource contention.

You can infer this a bit by looking at waits on the server, or by comparing CPU to duration.

If you see lots of LCK, THREADPOOL, or RESOURCE_SEMAPHORE waits, it might not be an issue of tuning a single query.

Likewise, if you see a lot of PAGEIOLATCH_xx waits, you may need to look at your RAM to data ratio (among other things).

We’re at the point now where we’re ready to dive into a tougher subject, and start answering questions like “why did my query suddenly get slower?”

Thanks for reading!

Going Further

Starting SQL: Things Your SQL Server Execution Plan Should Warn You About

Posted on September 17, 2020May 15, 2022 by Erik Darling

Mild Best Wishes

Microsoft has chosen some odd things to warn us about in query plans. For estimated and cached plans, I totally understand some of the limitations.

Though the DMVs for queries associated with query plans (both in the plan cache and query store) log some additional metrics about memory grants, spills, CPU, duration, reads, and more, none of that additional information ends up as feedback in the plan XML.

I’m not complaining, either. Modifying XML is stupid.

Bellyachin’

What could be in the XML is an extension of the current set of warnings, with maybe a bit more in Actual plans.

Query plans could warn if:

A non-SARGable predicate caused a scan
A non-SARGable/complex predicate caused a Filter
A residual predicate did a lot of reads
Eager Index Spools are over a certain number of rows
Performance Spools have an unfavorable rebind/rewind ratio
If modes switch between row and batch
Why an index wasn’t used
When estimated and actual rows or executions are way off
When joins have an OR clause
When parallel row distributions are uneven
Louder warnings for when something forces a query to run serially
Show multiple missing index requests when present
When operators execute more than once
When a statistic used has significant modifications
A roll up of per-operator I/O at the root node

Many of these things could be in both estimated and actual plans, and would really help people understand both why the things they do when writing queries can mess them up, and also when there might be a better way to do things.

A Professional With Standards

Is that a lot of stuff? Yes. But think about the warnings we get now, and the last time they were really helpful to you.

If they ever were, I can guarantee that they’ve been misleading more often.

All in all, people need a more informative query plan, and building this sort of instrumentation for end users also gives the robots in Azure, and future Intelligent Query Processing features some nice feedback.

Thanks for reading!

Going Further

Starting SQL: Things Your SQL Server Execution Plan Might Warn You About

Posted on September 16, 2020May 15, 2022 by Erik Darling

Oh look, XML

One of my favorite pieces of SQL Server documentation is the XML schema for query plans.

Whenever I’ve needed to figure something out to query the plan cache, or when I’ve been looking for stuff that might be interesting to pull out of it, this is where I’ve gone.

Especially when plan XML, it helps me rest a bit easier to know I’ve covered all the documented possibilities.

Of course, there’s no distinction in the documentation between what can be in the Estimated plan, and what’s reserved for Actual plans.

Such Document

While that’s inconvenient, you can figure most things out by separating plan elements into two groups:

Optimization time metrics
Runtime metrics

Here are the documented warnings available in query plans.

SQL Server Showplan XML — what was the robot’s name?

Things like spills and memory usage can only be known at runtime, when the optimizer’s estimates are put into play.

For many other things, the optimizer likely knows about and has to account for the warnings while it’s coming up with a query plan.

Warnings About Warnings

One thing to keep in mind about many of these warnings, whether they’re optimization-time or run-time, is that they’re trigger for things that might not matter, or even necessarily be true.

I’ve written in the past about silly execution plan warnings. I’m not going to re-write all those here.

But I do want you to reinforce a couple points I’ve made over the course of the series:

Estimated and cached plans miss details that make troubleshooting easier
Not every metric and warning is a critical datapoint

For example, I’ve seen people focus on small spills in execution plans many times only to have them not be the cause of a performance problem.

Operator times make distinguishing this easier, of course. Prior to that, even actual plans could be misleading and unforgiving.

Thanks for reading!

Going Further

Starting SQL: How Eager Index Spool Operators Hurt Query Performance In SQL Server

Posted on September 14, 2020May 15, 2022 by Erik Darling

Hot To Trot

Most of the time when the optimizer thinks an index will make a big enough difference, it’ll tell you.

Sure, it’s not perfect, but it can get you where you’re going fast enough. If you’re relying on that sort of feedback in query plans, or in the missing index DMVs, you’ll hate this.

Not only does SQL Server create an index for you, it doesn’t really tell you about it. There’s no loud warning here.

It also throws that index away when the query is done executing. It only exists in tempdb while the query executes, and it’s only available to the query that builds it.

And boy, they sure can take a long time to build.

Let’s take a closer look!

Skidding Out

Eager index spools can occur on the inner side of Nested Loops joins to reduce the amount of work that needs to be done there, by creating a more opportune index for the loop to work off of.

That all sounds very good, but there are some problems:

The index gets created single-threaded
The way data is loaded into the spool is very inefficient
The spool is disposed of when the query is finished,
It’ll get built over and over again on later executions
There’s no missing index request for the spool anywhere

I’m not suggesting that the query would be faster without the spool. Reliable sources tell me that this thing runs for over 6 hours without it. My suggestion is that when you see Eager Index spools, you should pay close attention.

Let’s talk about how you can do that.

Mountainous

In some cases, your only option is to look at the Eager Index Spool to see what it’s doing, and create an index on your own to mimic it.

If you’re the kind of person who likes free scripts, sp_BlitzCache will look for Eager Index Spools in your query plans and do that for you. You’re welcome.

If you’re a more manual type, here’s what you do: Look at the Eager Index Spool.

The Seek Predicate(s) are they key columns, and the Output List is the included columns.

CREATE INDEX spool_b_gone 
ON dbo.Posts(ParentId) INCLUDE (Score);

Since the Id column is the clustered index, we don’t explicitly need it in the index definition — remember that nonclustered indexes inherit them. It’ll end up as a “hidden” key column, after all.

Human Touch

In most cases, this will be good enough. The performance difference will be night and day, if the build source for the Eager Index Spool was fairly large, or if your query built the same Eager Index Spool multiple times.

Though just like missing index requests, Eager Index Spools don’t always come up with the *best* index.

Thinking through our query, we may want to move Score up to the key of the index.

SELECT pq.OwnerUserId, pq.Score, pq.Title, pq.CreationDate,
       pa.OwnerUserId, pa.Score, pa.CreationDate
FROM dbo.Posts AS pa
INNER JOIN dbo.Posts AS pq
    ON pq.Id = pa.ParentId
WHERE pq.PostTypeId = 1
AND   pq.CommunityOwnedDate IS NULL
AND   pq.AnswerCount > 1
AND   pa.PostTypeId = 2
AND   pa.OwnerUserId = 22656
AND   pa.Score >
(
    SELECT MAX(ps.Score)
    FROM dbo.Posts AS ps
    WHERE ps.ParentId = pa.ParentId 
    AND   ps.Id <> pa.Id
)
ORDER BY pq.Id;

See that subquery at the very end, where we’re aggregating on Score? Having Score in the key of the index will put the data in order, which makes a Stream Aggregate pretty painless. Remember that Stream Aggregates expect sorted input.

That’s, like, how they stream.

CREATE INDEX spool_b_gone 
    ON dbo.Posts(ParentId, Score);

Letter To The Query Editor

There are some cases where changing a query is a lot less painful than changing or adding indexes.

You might already have a lot of indexes, or you might have a really big table, or you might be on Standard Edition, which is a lot like being on one of those airplanes where the entire thing is economy class.

In this case, we can rewrite the query in a way that avoids the Eager Index Spool entirely:

SELECT pq.OwnerUserId, pq.Score, pq.Title, pq.CreationDate,
        pa.OwnerUserId, pa.Score, pa.CreationDate
FROM dbo.Posts AS pa
INNER JOIN dbo.Posts AS pq
    ON pq.Id = pa.ParentId
WHERE pq.PostTypeId = 1
AND   pq.CommunityOwnedDate IS NULL
AND   pq.AnswerCount > 1
AND   pa.PostTypeId = 2
AND   pa.OwnerUserId = 22656
AND NOT EXISTS
(
    SELECT 1/0
    FROM dbo.Posts AS ps
    WHERE ps.ParentId = pa.ParentId
    AND   ps.Id <> pa.Id
    AND   ps.Score >= pa.Score
)
ORDER BY pq.Id;

Which gets us a different plan. And you can see why we’d want one.

Avoiding the need for an Eager Index Spool reduces the query time from over a minute to under a second.

Thanks for reading!

Going Further

Starting SQL: Blind Spots In SQL Server’s Missing Index Requests

Posted on September 11, 2020May 15, 2022 by Erik Darling

Underfoot

There are all sorts of things that might artificially keep the optimizer from suggesting a helpful index while you’re writing or tuning a query.

But go figure, there are also totally sensible things that will make it less likely that you’ll get one, and there are definitely things that the requests don’t consider as heavily.

That doesn’t mean they’re awful and wrong, it just means you should practice looking at the requests that get generated with execution plans to see if there’s anything you can tweak to make things run a little bit more smoothly.

Anyway, let’s learn some important lessons about missing index requests.

Lesson One: Inequalities Are Iffy

In my copy of the Stack Overflow database, both of these queries return the same single row.

There is only one comment with 1270(!!!) upvotes, so either query works to locate the data.

/*Inequality predicate*/
SELECT c.CreationDate,
       c.PostId,
       c.Score,
       c.Text,
       c.UserId
FROM dbo.Comments AS c
WHERE c.Score >= 1270 --Hello I'm here
AND   c.CreationDate >= '20110101'
AND   c.CreationDate <  '20120101'
ORDER BY c.CreationDate DESC;

/*Equality predicate*/
SELECT c.CreationDate,
       c.PostId,
       c.Score,
       c.Text,
       c.UserId
FROM dbo.Comments AS c
WHERE c.Score = 1270 --Hello I'm here
AND   c.CreationDate >= '20110101'
AND   c.CreationDate <  '20120101'
ORDER BY c.CreationDate DESC;

Strangely, only one query qualifies for a missing index request, and it’s not the first one.

The index request look like this:

CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[Comments] ([Score],[CreationDate])
INCLUDE ([PostId],[Text],[UserId])

Which is likely fine, as long as you’re okay with the Text column being in there. It’s an NVARCHAR(700), which makes me a little hesitant.

Not adding it, of course, means the optimizer would have to make a choice about Key Lookups vs. Clustered Index Scans, depending on predicates.

Databases are hard.

Lesson Two: Joins Don’t Get A Lot Of Love

If we add a join to the Posts table, the missing index request doesn’t change.

SELECT c.CreationDate,
       c.PostId,
       c.Score,
       c.Text,
       c.UserId
FROM dbo.Comments AS c
JOIN dbo.Posts AS p
    ON p.Id = c.PostId
    AND p.OwnerUserId = c.UserId
WHERE c.Score = 1270 --Hello I'm here
AND   c.CreationDate >= '20110101'
AND   c.CreationDate <  '20120101'
AND 1 = (SELECT 1)
ORDER BY c.CreationDate DESC;

We get the same missing index request as above, which might strike you as a little odd.

savage

Sometimes it’s nice to have join columns in the key of an index. Orderly data tends to join more easily, and can give the optimizer more efficient ways to do it.

It probably doesn’t matter a ton for Hash Joins, but Merge Joins expect data in order. If it’s not, the optimizer will introduce a Sort to get it in order. Likewise, Nested Loops without indexes can be painful.

Having columns as includes means it’s not stored in an order that might be helpful. In fact, it’s not stored in any order at all.

Lesson Three: Neither Does Order By

In our original query, the CreationDate column, by coincidence, was both in the WHERE and ORDER BY clauses.

It ended up in the key columns of the index because of the WHERE clause, though. If we remove it from there, it’ll move to the included columns.

SELECT c.CreationDate,
       c.PostId,
       c.Score,
       c.Text,
       c.UserId
FROM dbo.Comments AS c
WHERE c.Score = 1270 --Hello I'm here
ORDER BY c.CreationDate DESC;

SQL Server Missing Index Request — now you’re getting sued

Why Are We Talking About Indexes Again?

If I do enough drilling, hopefully you’ll understand how important they are to performance.

What I want you to take away from this post is that when you’re looking specifically at missing index requests, whether they’re in query plans or in the missing index DMVs:

They’re not going to consider columns outside of the WHERE clause to be in the key of the index
If your where clause doesn’t have an equality predicate, it’s a lot harder to get missing index requests
Columns that would be helpful to have in index order won’t end up in the key if they’re not in the WHERE
Even the order of columns suggested for being in the key of the index isn’t scientific

I covered a lot of the “why” in these posts:

Without them, it may look weird to spend time talking about the limitations here. But this is why databases still need humans to care for them.

As much as I look forward to a day where they don’t, little things like which columns should be in the key of an index, and the order they should be in, are still important considerations.

You know. Little things.

Thanks for reading!

Going Further

Starting SQL: Why Wouldn’t An Index Help Your SQL Server Query Go Faster?

Posted on September 10, 2020May 14, 2022 by Erik Darling

Pennies From Redmond

We’re spoiled over here in SQL Server land. When we run queries, the optimizer will suggest indexes that might help it run faster. I say might, because sometimes it’s wrong. Like I mentioned before, these suggestions are a bit hasty.

If you read the limitations, the first thing they say is “It is not intended to fine tune an indexing configuration”.

And they’re not. But you know what? If you don’t have anyone who is fine tuning an indexing configuration, they’re a heck of a lot better than a whole lot of nothing.

Are You There, Optimizer?

Starting with no nonclustered indexes, and a clustered index on an unrelated column, this query is a prime candidate for an index. And yet, the optimizer has forsaken us in our time of need.

SELECT COUNT_BIG(*) AS records
FROM dbo.Comments AS c
WHERE COALESCE(c.UserId, 0) = 0;

This isn’t specific to COALESCE, any ol’ function — built-in or otherwise — would do it. But the optimizer doesn’t think an index would make this query any faster. That’s a real bummer.

You may hear people say things like “you can’t use indexes with functions”, but that sentence is incomplete. What’s meant is “you can’t use indexes to seek with functions”.

Going Boing

This goes back to the concept of SARGability that I talked about before. If we do a little bit more typing, the optimizer is our friend again. I know, typing more sucks, but you’ll thank me later.

SELECT COUNT_BIG(*) AS records
FROM dbo.Comments AS c
WHERE ( c.UserId = 0 
        OR c.UserId IS NULL );

The optimizer is back to having good ideas for us. But notice that both of these queries end up taking the same amount of time and doing the same amount of work.

If you were just getting started writing queries, performance tuning, or looking at query plans, you might see this and assume that query performance would remain identical.

It’s sort of a matter of opportunity, here. Like I said before, when you wrap predicates in functions, you lose the ability to seek into indexes.

Sure, you could use an index on UserId in either query, but it doesn’t usually fix a whole lot if you need to scan the entire thing.

Copying In

If we add in the index on UserId and compare plans, the difference is obvious.

Our clearly-expressed query does a lot better here. It’s able to seek, and even running with a serial plan is much faster than the parallel plan for the poorly-expressed query. These things do matter to performance. Functions, even built-in ones, have no relational meaning to the optimizer, and can’t be reasoned with.

When it comes to writing queries, it usually pays to over-communicate. When you take shortcuts, you might not get the best possible outcome.

Tomorrow, we’ll look at hidden missing index requests.

Thanks for reading!