Starting SQL: Fixing Parameter Sensitivity Problems With SQL Server Queries

Most Oftentimes


There are a few good ways to alleviate parameter sniffing. Ones I commonly use are:

  • Temp tables
  • Changing indexes
  • Dynamic SQL

There isn’t really a good place to put a temp table in this stored proc, though I did blog about a similar technique before.

It would certainly be a good candidate for index changes though, because the first thing we need to address is that key lookup.

It’s a sensitive issue.

King Index


We’re going to walk through something I talked about what seems like an eternity ago. Why? Because it has practical application here.

When you look at the core part of the query, PostId is only in the select list. Most advice around key lookups (including, generally, my own) is to consider putting columns only in the output into the includes of the index.

SQL Server Query Plan Tool Tip
and that’s where this is

But we’re in a slightly different situation, here.

SELECT v.VoteTypeId,
       v.PostId,
       COUNT_BIG(v.PostId) AS TotalPosts,
       COUNT_BIG(DISTINCT v.PostId) AS UniquePosts
FROM dbo.Votes AS v
WHERE v.CreationDate >= DATEADD(YEAR, (-1 * @YearsBack), '2014-01-01')
AND   v.VoteTypeId = @VoteTypeId
GROUP BY v.VoteTypeId,
         v.PostId

We’re getting a distinct count, and SQL Server has some choices for coming up with that.

If we follow the general advice here and create this index, we’ll end up in trouble:

CREATE INDEX v 
    ON dbo.Votes(VoteTypeId, CreationDate) INCLUDE(PostId)
    WITH (DROP_EXISTING = ON);
SQL Server Query Plan
growling intensifies

Since the Stream Aggregate expects ordered data, and PostId isn’t in order in the index (because includes aren’t in any particular order), we need to sort it. For a small amount of data, that’s fine. For a large amount of data, it’s not.

There is a second Sort in the plan further down, but it’s on the count expression, which means we can’t index it without adding in additional objects, like an indexed view.

SELECT TOP (1000) 
    x.VoteTypeId,
    x.PostId,
    x.TotalPosts,
    x.UniquePosts
FROM 
    (
        SELECT v.VoteTypeId,
               v.PostId,
               COUNT_BIG(v.PostId) AS TotalPosts, -- this is the expression
               COUNT_BIG(DISTINCT v.PostId) AS UniquePosts
        FROM dbo.Votes AS v 
        WHERE v.CreationDate >= DATEADD(YEAR, (-1 * @YearsBack), '2014-01-01')
        AND   v.VoteTypeId = @VoteTypeId
        GROUP BY v.VoteTypeId,
                 v.PostId
    ) AS x
ORDER BY x.TotalPosts DESC; -- this is the ordering

What’s An Index To A Non-Believer?


A better index in this case looks like this:

CREATE INDEX v 
    ON dbo.Votes(VoteTypeId, PostId, CreationDate)
    WITH (DROP_EXISTING = ON);

It will shave about 6 seconds off the run time, but there’s still a problem when the “big” data doesn’t go parallel:

SQL Server Query Plan
big data differences

When the plan goes parallel, it’s about 4x faster than the serial version. Now I know what you’re thinking, here. We could use OPTIMIZE FOR to always get the plan for the big value. And that’s not a horrible idea — the small data parameter runs very quickly re-using the parallel plan here — but there’s another way.
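
If you wanted to go the hint route, it would look something like this. Just a sketch, and it assumes VoteTypeId 2 belongs with the “big” values (it shows up in the big group later in this post):

SELECT v.VoteTypeId,
       v.PostId,
       COUNT_BIG(v.PostId) AS TotalPosts,
       COUNT_BIG(DISTINCT v.PostId) AS UniquePosts
FROM dbo.Votes AS v
WHERE v.CreationDate >= DATEADD(YEAR, (-1 * @YearsBack), '2014-01-01')
AND   v.VoteTypeId = @VoteTypeId
GROUP BY v.VoteTypeId,
         v.PostId
OPTION(OPTIMIZE FOR (@VoteTypeId = 2)); /* 2 is assumed to be a "big" value */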

Let’s look at our data.

Don’t Just Stare At It


Let’s draw an arbitrary line. I think a million is a popular number. I wish it was a popular number in my bank account, but you know.
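
The results below come from a plain grouping query, something like this:

SELECT v.VoteTypeId,
       COUNT_BIG(*) AS records
FROM dbo.Votes AS v
GROUP BY v.VoteTypeId
ORDER BY records;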

SQL Server Query Results
unwritten law

I know we’re ignoring the date column data, but this is good enough for now. There’s only so much I can squeeze into one blog post.

The point here is that we’re going to say that anything under a million rows is okay with using the small plan, and anything over a million rows needs the big plan.

Sure, we might need to refine that later if there are outliers within those two groups, but this is a blog post.

How do we do that? We go dynamic.

Behike 54


Plain ol’ IF branches just plain ol’ don’t work. We need something to get two distinct plans that are re-usable.

Here’s the full procedure:

CREATE OR ALTER PROCEDURE dbo.VoteCount (@VoteTypeId INT, @YearsBack INT)
AS
BEGIN

DECLARE @sql NVARCHAR(MAX) = N'';

SET @sql += N'
SELECT TOP (1000) 
    x.VoteTypeId,
    x.PostId,
    x.TotalPosts,
    x.UniquePosts
/*dbo.VoteCount*/
FROM 
    (
        SELECT v.VoteTypeId,
               v.PostId,
               COUNT_BIG(v.PostId) AS TotalPosts,
               COUNT_BIG(DISTINCT v.PostId) AS UniquePosts
        FROM dbo.Votes AS v 
        WHERE v.CreationDate >= DATEADD(YEAR, (-1 * @YearsBack), ''2014-01-01'')
        AND   v.VoteTypeId = @VoteTypeId '

IF @VoteTypeId IN (2, 1, 3, 5, 10, 6)
BEGIN
    SET @sql += N'
        AND 1 = (SELECT 1)'
END

IF @VoteTypeId IN (16, 15, 11, 8, 9, 7, 12, 4)
BEGIN
    SET @sql += N'
        AND 2 = (SELECT 2)'
END

SET @sql += N'
        GROUP BY v.VoteTypeId,
                 v.PostId
    ) AS x
ORDER BY x.TotalPosts DESC;
';

RAISERROR('%s', 0, 1, @sql) WITH NOWAIT;

EXEC sys.sp_executesql @sql, 
                       N'@VoteTypeId INT, @YearsBack INT', 
                       @VoteTypeId, @YearsBack;

END;

There’s a bit going on in there, but the important part is in the middle. This is what will give us different execution plans.

IF @VoteTypeId IN (2, 1, 3, 5, 10, 6)
BEGIN
    SET @sql += N'
        AND 1 = (SELECT 1)'
END

IF @VoteTypeId IN (16, 15, 11, 8, 9, 7, 12, 4)
BEGIN
    SET @sql += N'
        AND 2 = (SELECT 2)'
END

Sure, there are other ways to do this. You could even selectively recompile if you wanted to. But some people complain when you recompile. It’s cheating.

Because the SQL Server Query Optimizer typically selects the best execution plan for a query, we recommend only using hints as a last resort for experienced developers and database administrators.

See? It’s even documented.
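
If you did want to cheat with selective recompiles, the hint could hang off the same kind of branch. A sketch, assuming the hint replaces the plain semicolon at the end of the string:

DECLARE @hint NVARCHAR(100) = N';';

IF @VoteTypeId IN (16, 15, 11, 8, 9, 7, 12, 4) /* recompile for one group of values */
BEGIN
    SET @hint = N'
OPTION(RECOMPILE);';
END;

SET @sql += N'
        GROUP BY v.VoteTypeId,
                 v.PostId
    ) AS x
ORDER BY x.TotalPosts DESC' + @hint;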

Now that we’ve got that all worked out, we can run the proc and get the right plan depending on the amount of data we need to shuffle around.

SQL Server Query Plan
strangers in the night

Little Star


Now I know what you’re thinking. You wanna know more about that dynamic SQL. You want to solve performance problems and have happy endings.

We’ll do that next week, where I’ll talk about common issues, best practices, and more tricks you can use to get queries to perform better with it.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Starting SQL: How Parameters Can Change Which Indexes SQL Server Chooses

Down And Out


There are lots of different ways that parameter sniffing can manifest in the operators chosen, the order of those operators, and the resources acquired by a query when a plan is compiled. At least in my day-to-day consulting, one of the most common reasons for plans being disagreeable is insufficient indexes.

One way to fix the issue is to fix the index. We’ll talk about a way to do it without touching the indexes tomorrow.

Let’s say we have this index to start with. Maybe it was good for another query, and no one ever thought twice about it. After all, you rebuild your indexes every night; what other attention could they possibly need?

CREATE INDEX v 
    ON dbo.Votes(VoteTypeId, CreationDate);

If we had a query with a where clause on those two columns, it’d be able to find data pretty efficiently.

But how much data will it find? How many of each VoteTypeId are there? What range of dates are we looking for?

Well, that depends on our parameters.

Cookie Cookie


Here’s our stored procedure. There’s one column in it that isn’t in our index. What a bummer.

CREATE OR ALTER PROCEDURE dbo.VoteCount (@VoteTypeId INT, @YearsBack INT)
AS
BEGIN

SELECT TOP (1000) 
    x.VoteTypeId,
    x.PostId,
    x.TotalPosts,
    x.UniquePosts
FROM 
    (
        SELECT v.VoteTypeId,
               v.PostId,
               COUNT_BIG(v.PostId) AS TotalPosts,
               COUNT_BIG(DISTINCT v.PostId) AS UniquePosts
        FROM dbo.Votes AS v
        WHERE v.CreationDate >= DATEADD(YEAR, (-1 * @YearsBack), '2014-01-01')
        AND   v.VoteTypeId = @VoteTypeId
        GROUP BY v.VoteTypeId,
                 v.PostId
    ) AS x
ORDER BY x.TotalPosts DESC;

END;

That doesn’t matter for a small amount of data, whether it’s small because of the parameters used, or because of the size of the data the procedure is developed and tested against. Testing against unrealistic data is a recipe for disaster, of course.

Cookie Cookie


What can be tricky is that if the sniffing is occurring with the lookup plan, the optimizer won’t think enough of it to request a covering index, either in plan or in the index DMVs. It’s something you’ll have to figure out on your own.
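
If you want to double-check the DMV side of that, the missing index views are easy enough to query, though for this proc they’ll come up empty. A quick sketch:

SELECT d.statement,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns,
       s.user_seeks
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g
    ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS s
    ON s.group_handle = g.index_group_handle
WHERE d.statement LIKE N'%Votes%'; /* filter to the table you care about */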

SQL Server Query Plan
i said me too
SQL Server Query Plan Tool Tip
oh yeah that

So we need to add that to the index, but where? That’s an interesting question, and we’ll answer it in tomorrow’s post.

Thanks for reading!


Starting SQL: Why Is My SQL Server Query Suddenly Slower Than It Was Yesterday?

Pick A Database, Any Database


Alright, maybe not any database. Let’s stick with SQL Server. That’s the devil we know.

At some point in your life, you’re going to construct a query that takes user input, and that input is likely going to come in the form of a parameter.

It could be a stored procedure, dynamic SQL, or something from your application. But there it is.

Waiting. Watching.

Sniffing.

Defining A Problem


When we use parameters, we re-use execution plans, at least until a Qualifying Event™ occurs.

What’s a qualifying event?

  • Recompile hint
  • Stats update
  • Temp table modification threshold
  • Plan eviction
  • Server restart

Now, it might be reasonable to think that a mature optimizer — and it is an optimizer, not just a planner — would be able to do something a bit more optimal. After all, why would anyone think it would take the same amount of work to get through 100 rows as it would take to get through 1,000,000 rows? It’s a fundamentally different approach.

Doing a run to the grocery store to replenish a few things requires a far different mindset from going to prepare for a large family meal. You have to choose between a basket or a cart, whether you can jump right to the couple spots you need or you need to walk up and down every aisle, and even if you might need to write down a list because it doesn’t fit into brain memory.

One might also have the expectation that if a significant inaccuracy is detected at runtime, the strategy might change. While that does sort of happen with Adaptive Joins, it’s not a full plan rewrite.

Detecting A Problem


The plan cache usually sucks for this, unless you’re saving the data off to more stable tables. Why? Because most people only figure out they’ve been sniffed after a plan changes, which means it’s not in the cache anymore. You know, when end users start complaining, the app goes unresponsive, you can’t connect to the server, etc.

You could set your watch to it.
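
As for saving the data off, it doesn’t have to be fancy. Even a scheduled dump of the query stats DMV beats nothing. A sketch, with a made-up table name:

SELECT GETDATE() AS capture_time,
       qs.query_hash,
       qs.query_plan_hash,
       qs.execution_count,
       qs.total_worker_time,
       qs.total_logical_reads
INTO dbo.PlanCacheSnapshot /* hypothetical; SELECT INTO creates it, switch to INSERT after that */
FROM sys.dm_exec_query_stats AS qs;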

But sometimes it’s there. Some funny looking little plan that looks quite innocent, but seems to do a lot of work when you bang it up against other DMVs.

If you have the luxury, Query Store is a much better tool for detecting plan changes. It’s even got reports built in just for that.

SQL Server Query Store
how nice of you.
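
If you’d rather query it than click through reports, the Query Store catalog views make plan-change hunting pretty painless. A rough sketch that finds queries with more than one plan:

SELECT q.query_id,
       COUNT_BIG(DISTINCT p.plan_id) AS plans
FROM sys.query_store_query AS q
JOIN sys.query_store_plan AS p
    ON p.query_id = q.query_id
GROUP BY q.query_id
HAVING COUNT_BIG(DISTINCT p.plan_id) > 1
ORDER BY plans DESC;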

For the extra fancy amongst you, I pray that your expensive monitoring tool has a way to tell you when query plans change, or when normally fast plans deviate from that.

Deciphering A Problem


This is where things can get difficult, unless you’re monitoring or logging information. You typically need a few different combinations of parameter values to feed in to your query, so you can see what changed and when. Quite often, there’s no going back easily.

Let’s say you had a plan, and it was a good plan. Then one of those pesky qualifying events comes along, and it’s decided that you need a new plan.

And what if… that new plan is worse? No matter how much you recompile or update stats or toggle with cardinality estimation, you just can’t go back to the way things were without lots of hints or changes to the query? Maybe that’s not parameter sniffing. Maybe that’s parameter snuffing. I’m gonna copyright that.

Most parameter sniffing will result in a plan with a set of bad choices for different amounts of data, which will result in something like this:

SQL Server Query Plan
not crafty

This isn’t a “bad estimate” — it was a pretty good estimate for the first parameter value. It just wasn’t a good estimate for the second parameter value.

And to be honest, quite a few parameter sniffing issues come from Nested Loops. Not because it’s bad, but because it’s bad for large amounts of data, especially in a serial plan. It’s a pretty easy way to gum up a query, though. Make it get stuck in a loop for 13 million rows. It wasn’t fast? No kidding. Poof, be gone.

But then opposite-land isn’t good, either.

SQL Server Query Plan
like falling, baby

This plan probably makes plenty of sense for a big chunk of data. One big scan, one big hash, one big sort. Done.

Of course, for a small amount of data, we go from taking 1ms to taking 2s. Small amount of data people will likely not be happy with that. Your server might not be either, what with all the extra CPU resources we’re using in this here parallel plan all the time now.

Tomorrow, we’ll look at how sometimes you can fix parameter sniffing with better indexes.

Thanks for reading!


Starting SQL: What Happens When You Don’t Parameterize SQL Server Queries?

Nothing, Nothing, Nothing


Aside from the obvious perils of SQL Injection, parameterizing queries can be helpful in other ways.

Not to downplay SQL Injection at all, it’s just that fixing code is only one part of the equation. Vendor applications often need to do some pretty administrative things, and require elevated permissions.

Just last week I was looking at an application that had a stored procedure which took a parameter that was the name of an executable. There were no checks on the name. You could put in quite not-figuratively anything you wanted, and xp_cmdshell would run it.

All the parameterization in the world won’t help that.

Well, you get what you get.

Of Parameters And Plans


This can happen in three common ways:

  • You have an interface that accepts user-entered values
  • You have an ORM where you haven’t explicitly defined parameter types and precisions
  • You have dynamic SQL where values are concatenated into strings

Since I spend all my time in the database, I’m going to show you the third one. I wouldn’t normally format a query like this, but I’m trying to keep it web-friendly.

DECLARE @StartDate DATETIME = '20130101';
WHILE @StartDate < '20140101'
BEGIN
DECLARE @NoParams4u NVARCHAR(MAX) = 
N'
SELECT SUM(c.Score) AS TotalScore
FROM dbo.Comments AS c
WHERE c.CreationDate 
    BETWEEN CONVERT(DATETIME, ''' + RTRIM(@StartDate) + ''') 
    AND CONVERT(DATETIME, ''' + RTRIM(DATEADD(DAY, 11, @StartDate)) + ''')
AND 1 = (SELECT 1);
';

EXEC sys.sp_executesql @NoParams4u;
RAISERROR('%s', 0, 1, @NoParams4u);

SET @StartDate = DATEADD(DAY, 11, @StartDate);

END
GO

To make things interesting, I’ve created a non-covering index on the Comments table:

CREATE INDEX c ON dbo.Comments(CreationDate);

I’ve also had to use an oddly specific number of day increment in order to get some good plan variety because of that index. If you ever wonder why some blog posts take three hours to write, the most likely cause is finding the right number.

The first thing you should notice is that even using sp_executesql doesn’t help when our code isn’t parameterized. The second thing you’ll probably notice is 1 = (SELECT 1) at the end of the query. Past me has answers to all those questions, if you click on the links.

Bouquet


If I run that loop, I get back 34 results. Each query gets an individually compiled query plan, though there are only three “different” plans used.

SQL Server Query Plan
eyeful

Notice that each plan has different literal date values passed in to it, and different estimates. If we created a covering index, we’d get plan stability across executions, but then we’d still have to compile it when these literal values get passed in. I did it like this to reinforce my point.
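
For reference, covering here would just mean carrying the Score column along, since that’s all the query touches beyond the date. Something like:

CREATE INDEX c 
    ON dbo.Comments(CreationDate) INCLUDE(Score)
    WITH (DROP_EXISTING = ON);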

We can validate that by looking in the plan cache and surrounding DMVs, using sp_BlitzCache.

Planimal Activist


If you have code like this, one good way to find culprits is by running it like this:

EXEC sp_BlitzCache @SortOrder = 'query hash';

A zoomed-in snapshot of the part of the results that prove my point looks about like this:

SQL Server Query Results
aye yi yi

This is just the top 10 results. You can see a warning about multiple plans, and that each query has a single execution.

I mentioned before that if we add a covering index, we’ll get plan stability. That’s true, but specific to this demo on SQL Server 2019, there’s an alternate plan available for executions that qualify for Batch Mode On Rowstore (BMOR):

SQL Server Query Plan
i’m in it

In prior versions of SQL Server, with row mode only plans, we could only have a Stream Aggregate. But even when we get the same plan most of the time, we still need to compile it every time. SQL Server still thinks each of these queries is “new” and needs to get a plan compiled.

Fixing It


There are two options for fixing this. If you need to do it quickly, at scale, the database level option FORCED PARAMETERIZATION can take care of most of these problems. Just make sure you read up on the limitations. To fix it for just a couple problem queries, you need to fix the dynamic SQL.
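
For the record, the database-level switch is a one-liner. This flips it for the current database:

ALTER DATABASE CURRENT SET PARAMETERIZATION FORCED;

As for the dynamic SQL, here’s the fixed version: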

DECLARE @NoParams4u NVARCHAR(MAX) = 
N'
SELECT SUM(c.Score) AS TotalScore
FROM dbo.Comments AS c
WHERE c.CreationDate BETWEEN @StartDate AND DATEADD(DAY, 11, @StartDate)
AND 1 = (SELECT 1);
';

EXEC sys.sp_executesql @NoParams4u, N'@StartDate DATETIME', @StartDate;

I’m not concatenating values into the string anymore, and I’m passing the @StartDate value in when I execute the dynamic SQL.

One point I want to make is that it’s generally safe to do date math on the parameter. I’m not doing date math on the column, which would generally be a bad idea.
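
To make that concrete, this is the shape to avoid. Wrapping the column in a function makes the predicate non-SARGable, and the index on CreationDate stops being seekable:

SELECT SUM(c.Score) AS TotalScore
FROM dbo.Comments AS c
WHERE DATEADD(DAY, -11, c.CreationDate) <= @StartDate; /* function around the column: no seek for you */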

But anyway, now our plan gets used 34 times.

SQL Server Query Results
damn family

In this case, plan reuse works out well. Every query is looking at a sufficiently narrow range of data to have it not matter, and the longest running execution is around 31ms.

But what about when that doesn’t work out? When can parameterization backfire? We’ll find out tomorrow!

Thanks for reading!


Starting SQL: What’s The Difference Between Parameters And Local Variables In SQL Server?

Parameter Positive


There are many good reasons to parameterize a query, but there are also trade-offs. There’s no such thing as a free parameter, as they say.

In this post, we’re going to discuss what is and isn’t a parameter, and some of the pros and cons.

What’s important to keep in mind is that good indexing can help avoid many of the cons, but not all. Bad indexing, of course, causes endless problems.

There are many good reasons to parameterize your queries, too. Avoiding SQL injection is a very good reason.

But then!

What’s Not A Parameter


It can be confusing to people who are getting started with SQL Server, because parameters and variables look exactly the same.

They both start with @, and feel pretty interchangeable. They behave the same in many ways, too, except when it comes to cardinality estimation.

To generalize a bit, though, something is a parameter if it belongs to an object. An object can be an instance of:

  • A stored procedure
  • A function
  • Dynamic SQL

Things that aren’t parameters are things that come into existence when you DECLARE them. Of course, you can pass things you declare to one of the objects above as parameters. For example, there’s a very big difference between these two blocks of code:

DECLARE @VoteTypeId INT = 7;

SELECT *
FROM dbo.Votes AS v
WHERE v.VoteTypeId = @VoteTypeId;

DECLARE @sql NVARCHAR(MAX) = N'
SELECT *
FROM dbo.Votes AS v
WHERE v.VoteTypeId = @VoteTypeId;
'
EXEC sp_executesql @sql, N'@VoteTypeId INT', @VoteTypeId;

But it’s not obvious until you look at the query plans, where the guess for the declared variable is god awful.

Then again, if you read the post I linked to up there, you already knew that. Nice how that works.

If you’re too lazy to click, I’m too lazy to repeat myself.

SQL Server Query Plan
thanks

What’s the point? Variables, things you declare, are treated differently from parameters, things that belong to a stored procedure, function, or dynamic SQL.

Parameter Problems


The problem with parameterization is one of familiarity. It not only breeds contempt, but… sometimes data just grows apart.

Really far apart.

SELECT 
    v.VoteTypeId, 
    COUNT_BIG(*) AS records
FROM dbo.Votes AS v
GROUP BY v.VoteTypeId
ORDER BY records;
SQL Server Query Results
pattern forming

Natural Selection


When you parameterize queries, you give SQL Server permission to remember, and more importantly, to re-use.

What it re-uses is the execution plan, and what it remembers are cardinality estimates. If we do something like this, we don’t get two different execution plans, or even two different sets of guesses, even though the values that we’re feeding to each query have quite different distributions in our data.
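
Something like this, with 7 going in first and 16 reusing whatever it finds (the mirror image of the block further down):

DECLARE @VoteTypeId INT;

SET @VoteTypeId = 7;

DECLARE @sql NVARCHAR(MAX) = N'
SELECT *
FROM dbo.Votes AS v
WHERE v.VoteTypeId = @VoteTypeId;
';
EXEC sp_executesql @sql, N'@VoteTypeId INT', @VoteTypeId;

SET @VoteTypeId = 16;

SET @sql = N'
SELECT *
FROM dbo.Votes AS v
WHERE v.VoteTypeId = @VoteTypeId;
';
EXEC sp_executesql @sql, N'@VoteTypeId INT', @VoteTypeId;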

The result is two query plans that look quite alike, but behave quite differently.

SQL Server Query Plan
wist

One takes 23 milliseconds. The other takes 1.5 seconds. Would anyone complain about this in real life?

Probably not, but it helps to illustrate the issue.

Leading Miss


Where this can get confusing is when you’re trying to diagnose a performance problem.

If you look in the plan cache, or in query store, you’ll see the plan that gets cached for the very first parameter. It’ll look simple and innocent, sure. But the problem is with a totally different parameter that isn’t logged anywhere.

You might also face a different problem, where the query recompiles because you restarted the server, updated stats, rebuilt indexes, or enough rows in the table changed to trigger an automatic stats update. If any of those things happen, the optimizer will wanna come up with a new plan based on whatever value goes in first.

If the roles get reversed, the plan will change, but they’ll both take the same amount of time now.

DECLARE @VoteTypeId INT;

SET @VoteTypeId = 16

DECLARE @sql NVARCHAR(MAX) = N'
SELECT *
FROM dbo.Votes AS v
WHERE v.VoteTypeId = @VoteTypeId;
';
EXEC sp_executesql @sql, N'@VoteTypeId INT', @VoteTypeId;

SET @VoteTypeId = 7;

SET @sql = N'
SELECT *
FROM dbo.Votes AS v
WHERE v.VoteTypeId = @VoteTypeId;
';
EXEC sp_executesql @sql, N'@VoteTypeId INT', @VoteTypeId;
SQL Server Query Plan
totin’

Deal With It?


In the next few posts, we’ll talk about what happens when you don’t parameterize queries, and different ways to deal with parameter sniffing.

  • A recompile hint can help, but it might not always be appropriate depending on execution frequency and plan complexity
  • Optimize for unknown hints will give you the bad variable guess we saw at the very beginning of this post
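
For reference, here’s what those two look like bolted onto a query. Sketches, not recommendations:

SELECT *
FROM dbo.Votes AS v
WHERE v.VoteTypeId = @VoteTypeId
OPTION(RECOMPILE); /* fresh plan every execution, paid for in CPU */

SELECT *
FROM dbo.Votes AS v
WHERE v.VoteTypeId = @VoteTypeId
OPTION(OPTIMIZE FOR UNKNOWN); /* density vector guess, same as a local variable */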

We’re going to need more clever and current ways to fix the issue. If you’re stuck on those things recompiling or unknown-ing, you’re stuck not only on bad ideas, but outdated bad ideas.

Like duck l’orange and Canadian whiskey.

Thanks for reading!


Starting SQL: Query Plan Round Up

Put On


There are lots of things that can be fine in one query plan, and horrible in another.

Likewise, a query plan might be great for one set of parameters, and terrible for another. That’s called parameter sniffing, and that’s what we’re going to talk about next week.

What I want you to take away from the last couple weeks of posts is some of the improvements that have come into query plans over the years, and some of the shortcomings that still exist.

Despite the shortcomings, query plans offer an important view into things. When viewed alongside metrics in other DMVs, the text of the query, indexes that are available, and any parameters passed to the query, you can get a fairly complete view of things.

Major Pain


Of course, you can’t get rabbit-holed on this stuff. As much detail as gets captured in actual plans and in the other sources I mentioned, they don’t tell you about important things like blocking or resource contention.

You can infer this a bit by looking at waits on the server, or by comparing CPU to duration.

If you see lots of LCK, THREADPOOL, or RESOURCE_SEMAPHORE waits, it might not be an issue of tuning a single query.

Likewise, if you see a lot of PAGEIOLATCH_xx waits, you may need to look at your RAM to data ratio (among other things).
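
If you don’t have a monitoring tool watching waits for you, a quick look at the server-level accumulations can be telling. A sketch against the usual DMV:

SELECT TOP (10)
    ws.wait_type,
    ws.waiting_tasks_count,
    ws.wait_time_ms,
    ws.signal_wait_time_ms
FROM sys.dm_os_wait_stats AS ws
WHERE ws.wait_type LIKE N'LCK%'
OR    ws.wait_type LIKE N'PAGEIOLATCH%'
OR    ws.wait_type IN (N'THREADPOOL', N'RESOURCE_SEMAPHORE')
ORDER BY ws.wait_time_ms DESC;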

We’re at the point now where we’re ready to dive into a tougher subject, and start answering questions like “why did my query suddenly get slower?”

Thanks for reading!


Starting SQL: Things Your SQL Server Execution Plan Should Warn You About

Mild Best Wishes


Microsoft has chosen some odd things to warn us about in query plans. For estimated and cached plans, I totally understand some of the limitations.

Though the DMVs for queries associated with query plans (both in the plan cache and query store) log some additional metrics about memory grants, spills, CPU, duration, reads, and more, none of that additional information ends up as feedback in the plan XML.

I’m not complaining, either. Modifying XML is stupid.

Bellyachin’


What could be in the XML is an extension of the current set of warnings, with maybe a bit more in Actual plans.

Query plans could warn if:

  • A non-SARGable predicate caused a scan
  • A non-SARGable/complex predicate caused a Filter
  • A residual predicate did a lot of reads
  • Eager Index Spools are over a certain number of rows
  • Performance Spools have an unfavorable rebind/rewind ratio
  • If modes switch between row and batch
  • Why an index wasn’t used
  • When estimated and actual rows or executions are way off
  • When joins have an OR clause
  • When parallel row distributions are uneven
  • Louder warnings for when something forces a query to run serially
  • Show multiple missing index requests when present
  • When operators execute more than once
  • When a statistic used has significant modifications
  • A roll up of per-operator I/O at the root node

Many of these things could be in both estimated and actual plans, and would really help people understand both why the things they do when writing queries can mess them up, and also when there might be a better way to do things.

A Professional With Standards


Is that a lot of stuff? Yes. But think about the warnings we get now, and the last time they were really helpful to you.

If they ever were, I can guarantee that they’ve been misleading more often.

All in all, people need a more informative query plan, and building this sort of instrumentation for end users also gives the robots in Azure, and future Intelligent Query Processing features some nice feedback.

Thanks for reading!


Starting SQL: Things Your SQL Server Execution Plan Might Warn You About

Oh look, XML


One of my favorite pieces of SQL Server documentation is the XML schema for query plans.

Whenever I’ve needed to figure something out to query the plan cache, or when I’ve been looking for stuff that might be interesting to pull out of it, this is where I’ve gone.

Especially when I’m digging through plan XML, it helps me rest a bit easier to know I’ve covered all the documented possibilities.

Of course, there’s no distinction in the documentation between what can be in the Estimated plan, and what’s reserved for Actual plans.

Such Document


While that’s inconvenient, you can figure most things out by separating plan elements into two groups:

  • Optimization time metrics
  • Runtime metrics

Here are the documented warnings available in query plans.

SQL Server Showplan XML
what was the robot’s name?

Things like spills and memory usage can only be known at runtime, when the optimizer’s estimates are put into play.

For many other things, the optimizer likely knows about and has to account for the warnings while it’s coming up with a query plan.
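
If you want to go spelunking for warnings yourself, the schema is what makes plan cache queries like this one possible. A rough sketch:

WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (10)
    qs.execution_count,
    qs.total_worker_time,
    qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE qp.query_plan.exist('//Warnings') = 1
ORDER BY qs.total_worker_time DESC;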

Warnings About Warnings


One thing to keep in mind about many of these warnings, whether they’re optimization-time or run-time, is that they can trigger for things that might not matter, or might not even be true.

I’ve written in the past about silly execution plan warnings. I’m not going to re-write all those here.

But I do want to reinforce a couple points I’ve made over the course of the series:

  • Estimated and cached plans miss details that make troubleshooting easier
  • Not every metric and warning is a critical datapoint

For example, I’ve seen people focus on small spills in execution plans many times only to have them not be the cause of a performance problem.

Operator times make distinguishing this easier, of course. Prior to that, even actual plans could be misleading and unforgiving.

Thanks for reading!


Starting SQL: I Don’t Care How Many Rows Are In Your Table Variable, They Can Still Hurt Performance

Well, well, well


So you’re that odd soul who has been listening to rumors about table variables. Perhaps things about them only being in memory, or that they’re okay to use if you only put less than some arbitrary number of rows in them.

Those things are both wrong. But of course, my favorite rumor is the one about arbitrary numbers of rows being safe.

Ouch! What a terrible performance


Let’s do everything in our power to help SQL Server make a good guess.

We’ll create a couple indexes:

CREATE INDEX free_food ON dbo.Posts(OwnerUserId);
CREATE INDEX sea_food ON dbo.Comments(UserId);

Those stats’ll be so fresh you could make tartare with them.

We’ll create our table variable with a primary key on it, which will also be the clustered index.

DECLARE @t TABLE( id INT PRIMARY KEY );
INSERT @t ( id )
VALUES(22656);

And finally, we’ll run the select query with a recompile hint. Recompile fixes everything, yeah?

SELECT AVG(p.Score * 1.) AS lmao
FROM   @t AS t
JOIN   dbo.Posts AS p
    ON p.OwnerUserId = t.id
JOIN   dbo.Comments AS c
    ON c.UserId = t.id
OPTION(RECOMPILE);
GO

How does the query do for time? Things start off okay, but keep the cardinality estimate in mind.

SQL Server Query Plan
Get the hook

But quickly go downhill.

SQL Server Query Plan
Your granny lied!

Fish are dumb, dumb, dumb


The whole problem here is that, even with just one row in the table variable, an index on the one column in the table variable, and a recompile hint on the query that selects from the table variable, the optimizer has no idea what the contents of that single row are.

That number remains a mystery, and the guess made ends up being wrong by probably more than one order of magnitude. Maybe even an order of manure.

Table variables don’t gather any statistical information about what’s in the column, and so have no frame of reference to make a better cardinality estimate on the joins.

If we insert a value that gets far fewer hits in both the Posts and Comments tables (12550), the estimate doesn’t really hurt. But note that the guesses across all operators are exactly the same.

SQL Server Query Plan
You don’t swing it like you used to, man

C’est la vie mon ami


You have a database. Data is likely skewed in that database, and there are already lots of ways that you can get bad guesses. Parameter sniffing, out of date stats, poorly written queries, and more.

Databases are hard.

The point is that if you use table variables outside of carefully tested circumstances, you’re just risking another bad guess.

All of this is tested on SQL Server 2019, with table variable deferred compilation enabled. All that gets you is an accurate guess about the number of rows. It makes no attempt to get the contents of those rows correct.

So next time you’re sitting down to choose between a temp table and a table variable, think long and hard about what you’re going to be doing with it. If cardinality estimation might be important, you’re probably going to want a temp table instead.
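
For what it’s worth, the temp table version of the demo is a small change, and it gets real statistics on the id column that the optimizer can actually use:

CREATE TABLE #t ( id INT PRIMARY KEY );
INSERT #t ( id )
VALUES(22656);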

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Starting SQL: How Eager Index Spool Operators Hurt Query Performance In SQL Server

Hot To Trot


Most of the time when the optimizer thinks an index will make a big enough difference, it’ll tell you.

Sure, it’s not perfect, but it can get you where you’re going fast enough. If you’re relying on that sort of feedback in query plans, or in the missing index DMVs, you’ll hate this.

Not only does SQL Server create an index for you, it doesn’t really tell you about it. There’s no loud warning here.

It also throws that index away when the query is done executing. It only exists in tempdb while the query executes, and it’s only available to the query that builds it.

And boy, they sure can take a long time to build.

SQL Server Query Plan
tick tock

Let’s take a closer look!

Skidding Out


Eager index spools can occur on the inner side of Nested Loops joins to reduce the amount of work that needs to be done there, by creating a more opportune index for the loop to work off of.

That all sounds very good, but there are some problems:

  • The index gets created single-threaded
  • The way data is loaded into the spool is very inefficient
  • The spool is disposed of when the query is finished
  • It’ll get built over and over again on later executions
  • There’s no missing index request for the spool anywhere

I’m not suggesting that the query would be faster without the spool. Reliable sources tell me that this thing runs for over 6 hours without it. My suggestion is that when you see Eager Index spools, you should pay close attention.

Let’s talk about how you can do that.

Mountainous


In some cases, your only option is to look at the Eager Index Spool to see what it’s doing, and create an index on your own to mimic it.

If you’re the kind of person who likes free scripts, sp_BlitzCache will look for Eager Index Spools in your query plans and do that for you. You’re welcome.

If you’re a more manual type, here’s what you do: Look at the Eager Index Spool.

SQL Server Query Plan Tool Tip
ham ham ham ham

The Seek Predicate(s) are the key columns, and the Output List is the included columns.

CREATE INDEX spool_b_gone 
ON dbo.Posts(ParentId) INCLUDE (Score);

Since the Id column is the clustered index, we don’t explicitly need it in the index definition — remember that nonclustered indexes inherit the clustered index key columns. It’ll end up as a “hidden” key column, after all.

Human Touch


In most cases, this will be good enough. The performance difference will be night and day, if the build source for the Eager Index Spool was fairly large, or if your query built the same Eager Index Spool multiple times.

Though just like missing index requests, Eager Index Spools don’t always come up with the *best* index.

Thinking through our query, we may want to move Score up to the key of the index.

SELECT pq.OwnerUserId, pq.Score, pq.Title, pq.CreationDate,
       pa.OwnerUserId, pa.Score, pa.CreationDate
FROM dbo.Posts AS pa
INNER JOIN dbo.Posts AS pq
    ON pq.Id = pa.ParentId
WHERE pq.PostTypeId = 1
AND   pq.CommunityOwnedDate IS NULL
AND   pq.AnswerCount > 1
AND   pa.PostTypeId = 2
AND   pa.OwnerUserId = 22656
AND   pa.Score >
(
    SELECT MAX(ps.Score)
    FROM dbo.Posts AS ps
    WHERE ps.ParentId = pa.ParentId 
    AND   ps.Id <> pa.Id
)
ORDER BY pq.Id;

See that subquery at the very end, where we’re aggregating on Score? Having Score in the key of the index will put the data in order, which makes a Stream Aggregate pretty painless. Remember that Stream Aggregates expect sorted input.

That’s, like, how they stream.

CREATE INDEX spool_b_gone 
    ON dbo.Posts(ParentId, Score);

Letter To The Query Editor


There are some cases where changing a query is a lot less painful than changing or adding indexes.

You might already have a lot of indexes, or you might have a really big table, or you might be on Standard Edition, which is a lot like being on one of those airplanes where the entire thing is economy class.

In this case, we can rewrite the query in a way that avoids the Eager Index Spool entirely:

SELECT pq.OwnerUserId, pq.Score, pq.Title, pq.CreationDate,
        pa.OwnerUserId, pa.Score, pa.CreationDate
FROM dbo.Posts AS pa
INNER JOIN dbo.Posts AS pq
    ON pq.Id = pa.ParentId
WHERE pq.PostTypeId = 1
AND   pq.CommunityOwnedDate IS NULL
AND   pq.AnswerCount > 1
AND   pa.PostTypeId = 2
AND   pa.OwnerUserId = 22656
AND NOT EXISTS
(
    SELECT 1/0
    FROM dbo.Posts AS ps
    WHERE ps.ParentId = pa.ParentId
    AND   ps.Id <> pa.Id
    AND   ps.Score >= pa.Score
)
ORDER BY pq.Id;

Which gets us a different plan. And you can see why we’d want one.

SQL Server Query Plan
spoolishness

Avoiding the need for an Eager Index Spool reduces the query time from over a minute to under a second.

Thanks for reading!
