Execution Plans – Page 13

Encouraging Parallelism In SQL Server Nested Loops Join Query Plans

Posted on October 30, 2020May 16, 2022 by Erik Darling

‘Splainin

When you run a query, the optimizer has a lot to think about. One of those things is if the plan will benefit from parallelism.

That generally happens as long as:

The plan isn’t trivial — it has to receive full optimization
Nothing is artificially inhibiting parallelism (like scalar functions or table variable modifications)
If the serial plan cost is greater than the Cost Threshold For Parallelism (CTFP)

As long as all those qualifications are met, the optimizer will come up with competing parallel plans. If it locates a parallel plan that’s cheaper than the serial plan, it’ll get chosen.

This is determined at a high level by adding up the CPU and I/O costs of each operator in the serial plan, and doing the same for the parallel plan with the added costs of one or more parallel exchanges added in.

Yesterday we saw a case where the Gather Streams operator was costed quite highly, and it prevented a parallel plan from being chosen, despite the parallel plan in this case being much faster.

It’s important to note that costing for plans is not a direct reflection of actual time or effort, nor is it accurate to your local configuration.

They’re estimates used to come up with a plan. When you get an actual plan, there are no added-in “Actual Cost” metrics.

How Nested Loops Is Different

In merge or hash join plans, both sides of the join are part of the costing algorithm to decide if parallelism should be engaged.

An example with a hash join:

SQL Server Query Plan — parallel: 4.7 CPU query bucks

The estimated CPU cost of scanning the Posts table is reduced by 14 or so query bucks. The I/O cost doesn’t change at all.

In this case, it results in a parallel plan being naturally chosen, because the overall plan cost for the parallel plan is cheaper.

For Nested Loops, it’s different:

Slashing Prices

In Nested Loops plans, only the stuff on the outer side of the join experiences a cost reduction by engaging parallelism.

That means that if you’ve got a plan shaped like this that you need to go parallel, you need to figure out how to make the outside as expensive on CPU as possible.

In a lot of cases, you can use ORDER BY to achieve this because it can introduce a Sort operator into the query plan.

Of course, where that Sort operator ends up can change things.

For example, if I ask to order results by Reputation here:

SELECT u.Id, 
       u.DisplayName, 
       u.Reputation, 
       ca.Id, 
       ca.Type, 
       ca.CreationDate,
       ca.Text --Text in the select list
FROM dbo.Users AS u
OUTER APPLY
(
SELECT c.Id,
       DENSE_RANK() 
           OVER ( PARTITION BY c.PostId 
                  ORDER BY c.Score DESC ) AS Type,
        c.CreationDate,
        c.Text
FROM dbo.Comments AS c
WHERE c.UserId = u.Id
) AS ca
WHERE ca.Type = 0
ORDER BY u.Reputation DESC;

The Sort ends up before the join, and only applies to relatively few rows, and the plan stays serial.

But if I ask for something from inside of the cross apply to be ordered, the number of rows the optimizer expects to have to sort increases dramatically, and so does the cost.

SELECT u.Id, 
       u.DisplayName, 
       u.Reputation, 
       ca.Id, 
       ca.Type, 
       ca.CreationDate,
       ca.Text --Text in the select list
FROM dbo.Users AS u
OUTER APPLY
(
SELECT c.Id,
       DENSE_RANK() 
           OVER ( PARTITION BY c.PostId 
                  ORDER BY c.Score DESC ) AS Type,
        c.CreationDate,
        c.Text
FROM dbo.Comments AS c
WHERE c.UserId = u.Id
) AS ca
WHERE ca.Type = 0
ORDER BY ca.CreationDate DESC;

The additional cost on the outer side tilts the optimizer towards a parallel plan.

There’s No Such Thing As A Free Cool Trick™

This, of course, comes at a cost. While you do gain efficiency in the query finishing much faster, the Sort operator asks for a nightmare of memory.

If you have ~10GB of memory to spare for a memory grant, cool. This might be great.

Of course, there are other ways to control memory grants via hints and resource governor, etc.

In some cases, adding an index helps, but if we do that then we’ll lose the added cost and the parallel plan.

Like most things in life, it’s about tradeoffs.

Thanks for reading!

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

How String Data Can Hurt Query Performance In SQL Server

Posted on October 29, 2020May 16, 2022 by Erik Darling

Mistakes!

Strings cause nothing but problems in databases.

Standard vs. Unicode
Odd collations
Inflated memory grants
Oversized columns because of truncation
String splitting

And of course, sometimes they can cause plans to be costed strangely.

Strong Tradition

Working with the queries we’ve been tinkering with in all the posts this week, let’s look at a slightly different oddity.

/*Q1*/
SELECT u.Id, 
       u.DisplayName, 
       u.Reputation, 
       ca.Id, 
       ca.Type, 
       ca.CreationDate,
       ca.Text --Text in the select list
FROM dbo.Users AS u
OUTER APPLY
(
SELECT c.Id,
       DENSE_RANK() 
           OVER ( PARTITION BY c.PostId 
                  ORDER BY c.Score DESC ) AS Type,
        c.CreationDate,
        c.Text
FROM dbo.Comments AS c
WHERE c.UserId = u.Id
) AS ca
WHERE ca.Type = 0;


/*Q2*/
SELECT u.Id, 
       u.DisplayName, 
       u.Reputation, 
       ca.Id, 
       ca.Type, 
       ca.CreationDate
       -- No Text in the select list
FROM dbo.Users AS u
OUTER APPLY
(
SELECT c.Id,
       DENSE_RANK() 
           OVER ( PARTITION BY c.PostId 
                  ORDER BY c.Score DESC ) AS Type,
        c.CreationDate,
        c.Text
FROM dbo.Comments AS c
WHERE c.UserId = u.Id
) AS ca
WHERE ca.Type = 0;

The first query has the Text column in the outer select list, and the second query doesn’t. Please read the comments for additional clarity.

Big Plans

The plan without Text in the outer project goes parallel, and the one with it does not.

But why?

Forcing The Issue

Let’s add a third query into the mix to force the query to go parallel.

/*Q3*/
SELECT u.Id, 
       u.DisplayName, 
       u.Reputation, 
       ca.Id, 
       ca.Type, 
       ca.CreationDate,
       ca.Text --Text in the select list
FROM dbo.Users AS u
OUTER APPLY
(
SELECT c.Id,
       DENSE_RANK() 
           OVER ( PARTITION BY c.PostId 
                  ORDER BY c.Score DESC ) AS Type,
        c.CreationDate,
        c.Text
FROM dbo.Comments AS c
WHERE c.UserId = u.Id
) AS ca
WHERE ca.Type = 0
OPTION(USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'));

Things are pretty interesting, here.

The parallel plan is actually costed cheaper up through the Filter operator. In the serial plan, the entire subtree costs 35,954. In the parallel plan, it’s at 35,719.

At 200 query bucks cheaper, we’re in good shape! And then… We Gather Streams ☹

Mortem

The Gather Streams pushes the final plan cost for the parallel plan up higher than the serial plan.

Even though the parallel plan finishes ~26 seconds faster, the optimizer doesn’t choose it naturally because it is a cheapskate.

Bummer, huh?

An important point to keep in mind is that in nested loops join plans, the inner side of the query doesn’t receive any cost adjustments for parallel vs. serial versions. All of the costing differences will exist on the outside.

That’s why only the last few operators in the plan here are what makes a difference.

And that’s what we’ll finish out the week with!

Thanks for reading!

Going Further

Eager Index Spools Are Judgmental Query Plan Operators In SQL Server

Posted on October 28, 2020May 16, 2022 by Erik Darling

You’re Short

I ran across something similar in a previous post: Index Spools When You Have An Index.

But here we are again, with the optimizer treating us like fools for our index choices.

Let’s say we have this index on the Comments table:

CREATE INDEX lol
    ON dbo.Comments
        (UserId)
    INCLUDE
        (Id, PostId, Score, CreationDate, Text);

Is it a great idea? I dunno. But it’s there, and it should make things okay for this query:

SELECT u.Id, 
       u.DisplayName, 
       u.Reputation, 
       ca.Id, 
       ca.Type, 
       ca.CreationDate
FROM dbo.Users AS u
OUTER APPLY
(
SELECT c.Id,
       DENSE_RANK() 
           OVER ( PARTITION BY c.PostId 
                  ORDER BY c.Score DESC ) AS Type,
        c.CreationDate 
FROM dbo.Comments AS c
WHERE c.UserId = u.Id
) AS ca
WHERE ca.Type = 0;

You’re Round

But when we run the query and collect the plan, something rather astounding happens.

The optimizer uses our index to build a smaller index!

Digging in on the Eager Index Spool, it’s a nearly identical copy of the index we have, just without the Text column.

Your Mother Dresses You Funny

Of course, the optimizer being the unreasonable curmudgeon that it is, the only workaround is to also create the more narrow index.

CREATE INDEX lmao
    ON dbo.Comments
        (UserId)
    INCLUDE
        (Id, PostId, Score, CreationDate);

Or add the Text column to the select:

SELECT u.Id, 
       u.DisplayName, 
       u.Reputation, 
       ca.Id, 
       ca.Type, 
       ca.CreationDate,
       ca.Text
FROM dbo.Users AS u
OUTER APPLY
(
SELECT c.Id,
       DENSE_RANK() 
           OVER ( PARTITION BY c.PostId 
                  ORDER BY c.Score DESC ) AS Type,
        c.CreationDate,
        c.Text
FROM dbo.Comments AS c
WHERE c.UserId = u.Id
) AS ca
WHERE ca.Type = 0;

But that has a weird side effect, too. We’ll look at that tomorrow.

Thanks for reading!

Going Further

A Fun Way To Return Zero Rows To SQL Server Management Studio Using ROW_NUMBER

Posted on October 27, 2020May 16, 2022 by Erik Darling

Demos And Demos!

Sending rows out to SSMS is annoying. It takes a long time when there are a lot of them, and sometimes you just wanna show a query plan that does a lot of work without all the blah blah.

And sure, there are other great ways that I talked about in this Q&A.

One other way is a method I used in yesterday’s post.

Let’s be timely and look at that!

Everyone Loves Windowing Functions

They do all sorts of funny stuff. Number things, rank things.

And they make great demos, whether you’re trying to show how to do something cool, or make them perform well.

The funny thing is that the optimizer doesn’t really understand that, unless you intervene, the numbering and ranking doesn’t start at zero.

You can have a whole query run, do a bunch of really hard work, take a long time, and not return any rows, just by filtering to where the function = 0.

Using yesterday’s query as an example, here’s what happens:

SELECT u.Id, 
       u.DisplayName, 
       u.Reputation, 
       u.CreationDate, 
       ca.*
FROM dbo.Users AS u
OUTER APPLY
    (
        SELECT *, 
        DENSE_RANK() 
              OVER( PARTITION BY vs.Id 
                    ORDER BY vs.Id DESC ) AS whatever
        FROM dbo.VotesSkewed AS vs
        WHERE vs.UserId = u.Id
        AND   vs.VoteTypeId BETWEEN 1 AND 5
    ) AS ca
WHERE ca.whatever = 0;

Since I’m generating the rank here on the primary key/clustered index for the table, it’s basically free to do. It’s fully supported by the index.

If you create a different index to support that, e.g. a POC index (Partitioning, Ordering, Covering), you can use that too.

Bag Of Bones

This query will, of course, return no rows. But it will do all the prerequisite work to generate a result set to filter on. That’s where that filter operator kicks in, and nothing passes through it. It’s a stopping point right before things would actually have to start kicking out to SSMS.

But of course, the optimizer doesn’t know that until we get there. If it did, we might just end up with a constant scan and a query that finishes instantly.

For example, if you add a silly where clause like this:

SELECT u.Id, 
       u.DisplayName, 
       u.Reputation, 
       u.CreationDate, 
       ca.*
FROM dbo.Users AS u
OUTER APPLY
    (
        SELECT *, 
        DENSE_RANK() 
              OVER( PARTITION BY vs.Id 
                    ORDER BY vs.Id DESC ) AS whatever
        FROM dbo.VotesSkewed AS vs
        WHERE vs.UserId = u.Id
        AND   vs.VoteTypeId BETWEEN 1 AND 5
    ) AS ca
WHERE 1 = (SELECT 0)

You end up with this:

That’s That

If you find yourself with a demo that returns a lot of rows, and you don’t want to use a TOP or OFFSET/FETCH to only return some of them, this is a fun way to return nothing.

Thanks for reading!

Going Further

Please Stop Ignoring CXCONSUMER Waits In SQL Server

Posted on October 26, 2020May 16, 2022 by Erik Darling

I’m Begging You, Here

Ever since SSMS started collecting wait stats in query plans, I’ve been mad about a couple things that get filtered out:

Lock waits
CXCONSUMER waits

Lock waits are particularly annoying. Imagine (I know, this might be difficult) that you have a friend who is puzzled by why a query is sometimes slow.

They send you an actual plan for when it’s fast, and an actual plan for when it’s slow. You compare them in every which way, and everything except duration is identical.

It’d be a whole lot easier to answer them if LCK waits were collected, but hey. Let’s just make them jump through another hoop to figure out what’s going on.

CXCONSUMER has a similar problem — and here’s the thing — if people are going through the trouble of collecting this information, give’em what they ask for. Don’t just give them what you think is a good idea.

Highly Affected People

Let’s look at a query where parallelism all gets skewed to one thread.

SELECT u.Id, 
       u.DisplayName, 
       u.Reputation, 
       u.CreationDate, 
       ca.*
FROM dbo.Users AS u
OUTER APPLY
    (
        SELECT *, 
        DENSE_RANK() 
              OVER( PARTITION BY vs.Id 
                    ORDER BY vs.Id DESC ) AS whatever
        FROM dbo.VotesSkewed AS vs
        WHERE vs.UserId = u.Id
        AND   vs.VoteTypeId BETWEEN 1 AND 4
    ) AS ca
WHERE ca.whatever = 0;

It runs for 42 seconds!

But the only wait collected is SOS_SCHEDULER_YIELD. For 392 ms.

Different Angle

If we watch the server’s wait stats while the query runs, we see a totally different story.

EXEC sp_BlitzFirst @Seconds = 50, @ExpertMode = 1;

We had four waits on CXCONSUMER that all lasted nearly 11 seconds. Which of course happens because all the rows end up on one thread.

Sure, that’s easy enough to spot if you go looking, but having CXCONSUMER waits collected would make it a whole lot easier to know what to look for.

That is surely a significant enough wait to include. And don’t tell me it’s not actionable, because I spend a good chunk of time fixing problems like this.

Thanks for reading!

Going Further

A SQL Server Parameterization Puzzle With TOP: Part 2

Posted on October 22, 2020May 16, 2022 by Erik Darling

Up From The Floor

Now that we’ve covered what happened with our query, how can we fix it?

Remember when I said that this only happens with literals? I sort of lied.

Sorry.

Pants On Fire

Probably the simplest thing to do would be to set MAXDOP to 1 for the query. Avoiding the parallel exchanges avoids the problem, but the query does run longer than the original with a literal TOP. That being said, it may be the simplest solution in some cases for you if it stabilizes performance.

If you’re feeling as brave as Sir Robin, you can add an OPTIMIZE FOR hint to bring back the early-buffer-send behavior.

CREATE OR ALTER PROCEDURE dbo.SniffedTop (@Top INT)
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;

    SELECT TOP (@Top)
            u.DisplayName,
            b.Name
    FROM dbo.Users u
    CROSS APPLY 
    (
        SELECT TOP (1) 
                b.Name
        FROM dbo.Badges AS b
        WHERE b.UserId = u.Id
        ORDER BY b.Date DESC
    ) AS b
    WHERE u.Reputation >= 10000
    ORDER BY u.Reputation DESC
    OPTION(OPTIMIZE FOR(@Top = 1));

END;
GO

Are they always better? I have no idea, but if you’ve got long-running queries with a parameterized TOP, this might be something worth experimenting with.

Another rewrite that works is slightly more complicated. Though for maximum benefit, Batch Mode is necessary.

CREATE OR ALTER PROCEDURE dbo.SniffedTop (@Top INT)
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;

    SELECT TOP (@Top)
            u.DisplayName,
            b.Name
    FROM dbo.Users u
    CROSS APPLY 
    (
        SELECT *
        FROM 
        (
            SELECT *,
            	ROW_NUMBER() 
            	    OVER( PARTITION BY b.UserId
            		      ORDER BY b.Date DESC ) AS n
            FROM dbo.Badges AS b
        ) AS b
        WHERE b.UserId = u.Id
        AND b.n = 1
    ) AS b
    WHERE u.Reputation >= 10000
    ORDER BY u.Reputation DESC;

END;
GO

So that’s fun. We’re having fun. I like fun.

I’m gonna make a PowerPoint about fun.

Other Things, And Drawbacks

So like, you could add a recompile hint to allow the TOP parameter to be sniffed, sure. But then you’re sort of undoing the good you did parameterizing in the first place.

You could also write unparameterized dynamic SQL, but see above. Same problem, plus a blown out plan cache if people ask for different values.

Optimize for unknown, and OFFSET/FETCH also don’t work.

Of course, one thing that would help here is a more appropriate index leading with UserId. However, most good demos arise from less than ideal indexing, so you’re just going to have to deal with it.

Thanks for reading!

Going Further

A SQL Server Parameterization Puzzle With TOP: Part 1

Posted on October 21, 2020May 16, 2022 by Erik Darling

The Gift That Keeps On Giving

I love when a demo written for one purpose turns into an even better demo for another purpose.

While working with a client recently, they ran into a performance issue when trying to promote plan reuse by parameterizing the user-input number for TOP.

In part 1, I’m going to show you what happened and why, and in part 2 I’ll discuss some workarounds.

Regresso Chicken Face Soup

When executed with a literal value in the top, this query runs for around 10 seconds.

I’m not saying that’s great, but it’s a good enough starting place.

SELECT TOP (38)
        u.DisplayName,
        b.Name
FROM dbo.Users u
CROSS APPLY 
(
    SELECT TOP (1) 
            b.Name
    FROM dbo.Badges AS b
    WHERE b.UserId = u.Id
    ORDER BY b.Date DESC
) AS b
WHERE u.Reputation >= 10000
ORDER BY u.Reputation DESC;

If we take that same query, put it in a proc, and run it with an identical value in the TOP, things will turn out not-so-well.

CREATE OR ALTER PROCEDURE dbo.SniffedTop (@Top INT)
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;

    SELECT TOP (@Top)
            u.DisplayName,
            b.Name
    FROM dbo.Users u
    CROSS APPLY 
    (
        SELECT TOP (1) 
                b.Name
        FROM dbo.Badges AS b
        WHERE b.UserId = u.Id
        ORDER BY b.Date DESC
    ) AS b
    WHERE u.Reputation >= 10000
    ORDER BY u.Reputation DESC;

END
GO 

EXEC dbo.SniffedTop @Top = 38;

The query runs for a significantly longer amount of time.

What Happened?

Unofficially, when TOP uses a constant and the constant is “small” (under 101), the exchange packets are allowed to send earlier than normal, as long as the exchange is below the TOP operator. They’re allowed to send as soon as they have a row, rather than waiting for them to fill up completely.

This can only happen with constants (or…!), and the behavior is true going back to 2005. It may change in the future, so if you’re reading this at some far off date, please don’t be too harsh on me in the comments.

When you parameterize TOP, it’s considered unsafe to send the exchange buffers early. After all, you could stick anything in there, up through the BIGINT max. In cases where you’ve got a BIG TOP, sending, say, 9,223,372,036,854,775,807 rows one at a time would be significantly ickier than sending over a smaller number of full exchange buffers.

If you’re surprised to hear that parallel exchange buffers can send at different times, you’re not alone. I was also surprised.

SQL Server: Full Of Surprises. Horrible surprises.

In the second query, where exchange buffers are sent when full, we spend a lot longer waiting for them to fill up. This isn’t exposed anywhere in the plan, and you’d need either a debugger or this blog post to figure it out.

Yep.

Yep. Yep. Yep. Yep. Yep. Yep.

Thanks for reading!

Going Further

Reasons Your Query Is Slow Right Now: A Bad Query Plan Appears

Posted on October 15, 2020May 16, 2022 by Erik Darling

Update Stats, They Said

Okay look, you probably should update stats. At least when you do it, you have some control over the situation.

If you let SQL Server get up to its own devices, you might become quite surprised.

One after-effect of updated stats is, potentially, query plan invalidation. When that happens, SQL Server might get hard to work coming up with a new plan that makes sense based on these new statistics.

And that, dear friends, is where things can go bad.

New Contributor ?

Let’s say we have this query, which returns the average post and comment score for a single user.

CREATE OR ALTER PROCEDURE dbo.AveragePostScore(@UserId INT)
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;

SELECT u.DisplayName, 
       AVG(p.Score * 1.) AS lmao_p,
       AVG(c.Score * 1.) AS lmao_c
FROM   dbo.Users AS u
JOIN   dbo.Posts AS p
    ON p.OwnerUserId = u.Id
JOIN   dbo.Comments AS c
    ON c.UserId = u.Id
WHERE u.Id = @UserId
GROUP BY u.DisplayName;

END;
GO

Most of the time, the query runs fast enough for the occasional run to not end too poorly.

But then a recompile happens, and a new contributor decides to look at their profile.

Okay To Worse

What comes next you could fill a textbook with.

EXEC dbo.AveragePostScore @UserId = 3150367;

A new plan gets compiled:

And you know, it looks great for a new user.

And you know, it looks not so great for a slightly more seasoned user.

So What Changed?

Running the query first for a user with a bit more site history gives us a plan with a very different shape, that finishes in under 2 seconds. Repeating that plan for less experienced users doesn’t cause any problems. It finishes in very little time at all.

The plan itself remains largely more familiar than most parameter sniffing scenarios wind up. There are plenty more similarities than differences. It really does just come down to join order here.

Alright, now we know what happened. How would we figure out if this happened to us IRL?

I Shot The Trouble

We can do what we did yesterday, and run sp_BlitzFirst. That’ll warn us if stats recently got updated.

EXEC sp_BlitzFirst @Seconds = 10, @ExpertMode = 1;

If we want to try to validate if that stats update messed with a plan, we can use sp_WhoIsActive:

EXEC sp_WhoIsActive @get_plans = 1, @get_avg_time = 1;

The get_avg_time parameter is one I’ve talked about before, and in this case it’s quite helpful.

We’ve got a bunch of queries that usually run in 0 seconds running for at least 20 seconds.

Fingerprinted

Since we used get_plans here too, we can grab the execution plan, and see which parameters were used on compilation and execution.

Get the properties of the select operator, either by right clicking and selecting properties, or hitting F4.

SQL Server Query Plan Properties — not yet

Now we know two things, and can test four things:

If we run this query using the compiled value, is it fast?
If we run this query using the runtime value, is it slow?
If we recompile and run it for the runtime value, is it fast or slow?
Is the compile time value still fast when it uses the “new” plan?

This is a little easier with stored procs, because you can do something like:

EXEC sys.sp_recompile @objname = N'dbo.AveragePostScore';

To use DBCC FREEPROCCACHE to target a specific query, you need the sql handle or plan handle. You don’t wanna jump off and clear the whole cache here, unless you’re desperate. Just make sure you understand that you might fix one query, and break others, if you clear the whole thing.

It’s better to be targeted when possible.

And of course, if you’ve got Query Store up and running, you may do well to look at Regressed or High Variance query views there, and force the faster plan.

Thanks for reading!

Going Further

Starting SQL: Why Not Make Every SQL Server Query Dynamic SQL?

Posted on September 28, 2020May 15, 2022 by Erik Darling

Problem Solver

I think it was Doug Lane who coined the stages of dynamic SQL. One of them dealt with the fact that once you start using it, you just wanna use it everywhere, even places where you think you don’t need it.

Most of the time, you don’t. A parameter is good enough. But like we saw last week, sometimes parameters can backfire, and you can use dynamic SQL to save query performance.

That’s one great use, but it’s one you’re gonna have to use constantly. Most of the time when you’re using dynamic SQL, it’s not going to be to correct performance.

You want to build or use a different string based on some input
You have a statement you want to execute over multiple targets
Your script has to support multiple versions of SQL Server

Of course, one can’t reasonably write about dynamic SQL in SQL Server without linking to Erland. Boy howdy, did I learn most everything I know from him.

I Disconnect From You

One of the big downsides of dynamic SQL is that statements in a stored procedure are no longer associated with that stored procedure in system DMVs.

Of course, you can address this somewhat by adding a comment to the query inside the dynamic SQL:

/*headless*/
DECLARE @super_cool_sql NVARCHAR(MAX) = N'
SELECT * 
FROM ALL THE TABLES!
';

/*more headed*/
DECLARE @super_cool_sql NVARCHAR(MAX) = N'
SELECT * 
/*super cool procedure name*/
FROM ALL THE TABLES!
';

Where you put the comment is irrelevant, but if it’s a particularly long query, I’d probably want it close to or above the select so it doesn’t get truncated.

But we’re all nice people who don’t write queries with more than 65k characters and spaces in them.

Right? Right.

While it’s nice to know where they came from, they’re not easy to track down because they don’t have a parent object id — they’re rogue agents in the plan cache.

It can also make troubleshooting a stored procedure difficult because it can be a little harder to see which statements did the most work.

You might be conditionally executing certain blocks of dynamic SQL for different reasons
You might be building dynamic where clauses that have different performance profiles

Other Thans

The first thing I want to cover outright is that IF branching without dynamic SQL does not work. Flat out.

Yes, you can control the flow of logic, but it has no desirable impact on query plan compilation. Everything gets compiled the first time.

Instead of dynamic SQL, though, you could use separate stored procedures, which at least makes the objects a little easier to track down in the plan cache or Query Store.

CREATE OR ALTER PROCEDURE dbo.VoteCount (@VoteTypeId INT, @YearsBack INT)
AS
BEGIN

IF @VoteTypeId IN (2, 1, 3, 5, 10, 6)
BEGIN
    EXEC dbo.VoteCount_InnerBig @VoteTypeId, @YearsBack;
END;

IF @VoteTypeId IN (16, 15, 11, 8, 9, 7, 12, 4)
BEGIN
    EXEC dbo.VoteCount_InnerSmall @VoteTypeId, @YearsBack;
END;

END;

Both of those stored procedures can have the same statement in them, without the ? = (SELECT ?) addition needed with the dynamic SQL option.

That they’re owned by different objects is enough to get them separate optimization paths. You’re also a bit less prone to permissions issues, if you’re the kind of person who takes those seriously. But if your app logs in as db_owner or whatever, well, BOFL with that.

Speaking of which, even though I find security incredibly dull and frustrating, let’s talk a little about how people can take advantage of bad dynamic SQL.

Thanks for reading!

Going Further

Starting SQL: Fixing Parameter Sensitivity Problems With SQL Server Queries

Posted on September 25, 2020May 15, 2022 by Erik Darling

Mostoftentimes

There is a good solution to alleviating parameter sniffing. Ones I commonly use are:

Temp tables
Changing indexes
Dynamic SQL

There isn’t really a good place to put a temp table in this stored proc, though I did blog about a similar technique before.

It would certainly be a good candidate for index changes though, because the first thing we need to address is that key lookup.

It’s a sensitive issue.

King Index

We’re going to walk through something I talked about what seems like an eternity ago. Why? Because it has practical application here.

When you look at the core part of the query, PostId is only in the select list. Most advice around key lookups (including, generally, my own) is to consider putting columns only in the output into the includes of the index.

SQL Server Quer Plan Tool Tip — and that’s where this is

But we’re in a slightly different situation, here.

SELECT v.VoteTypeId,
       v.PostId,
       COUNT_BIG(v.PostId) AS TotalPosts,
       COUNT_BIG(DISTINCT v.PostId) AS UniquePosts
FROM dbo.Votes AS v
WHERE v.CreationDate >= DATEADD(YEAR, (-1 * @YearsBack), '2014-01-01')
AND   v.VoteTypeId = @VoteTypeId
GROUP BY v.VoteTypeId,
         v.PostId

We’re getting a distinct count, and SQL Server has some choices for coming up with that.

If we follow the general advice here and create this index, we’ll end up in trouble:

CREATE INDEX v 
    ON dbo.Votes(VoteTypeId, CreationDate) INCLUDE(PostId)
    WITH (DROP_EXISTING = ON);

Since the Stream Aggregate expects ordered data, and PostId isn’t in order in the index (because includes aren’t in any particular order), we need to sort it. For a small amount of data, that’s fine. For a large amount of data, it’s not.

There is a second Sort in the plan further down, but it’s on the count expression, which means we can’t index it without adding in additional objects, like an indexed view.

SELECT TOP (1000) 
    x.VoteTypeId,
    x.PostId,
    x.TotalPosts,
    x.UniquePosts
FROM 
    (
        SELECT v.VoteTypeId,
               v.PostId,
               COUNT_BIG(v.PostId) AS TotalPosts, -- this is the expression
               COUNT_BIG(DISTINCT v.PostId) AS UniquePosts
        FROM dbo.Votes AS v 
        WHERE v.CreationDate >= DATEADD(YEAR, (-1 * @YearsBack), '2014-01-01')
        AND   v.VoteTypeId = @VoteTypeId
        GROUP BY v.VoteTypeId,
                 v.PostId
    ) AS x
ORDER BY x.TotalPosts DESC; -- this is the ordering

What’s An Index To A Non-Believer?

A better index in this case looks like this:

CREATE INDEX v 
    ON dbo.Votes(VoteTypeId, PostId, CreationDate)
    WITH (DROP_EXISTING = ON);

It will shave about 6 seconds off the run time, but there’s still a problem when the “big” data doesn’t go parallel:

When the plan goes parallel, it’s about 4x faster than the serial version. Now I know what you’re thinking, here. We could use OPTIMIZE FOR to always get the plan for the big value. And that’s not a horrible idea — the small data parameter runs very quickly re-using the parallel plan here — but there’s another way.

Let’s look at our data.

Don’t Just Stare At It

Let’s draw an arbitrary line. I think a million is a popular number. I wish it was a popular number in my bank account, but you know.

SQL Server Query Results — unwritten law

I know we’re ignoring the date column data, but this is good enough for now. There’s only so much I can squeeze into one blog post.

The point here is that we’re going to say that anything under a million rows is okay with using the small plan, and anything over a million rows needs the big plan.

Sure, we might need to refine that later if there are outliers within those two groups, but this is a blog post.

How do we do that? We go dynamic.

Behike 54

Plan ol’ IF branches plan ol’ don’t work. We need something to get two distinct plans that are re-usable.

Here’s the full procedure:

CREATE OR ALTER PROCEDURE dbo.VoteCount (@VoteTypeId INT, @YearsBack INT)
AS
BEGIN

DECLARE @sql NVARCHAR(MAX) = N'';

SET @sql += N'
SELECT TOP (1000) 
    x.VoteTypeId,
    x.PostId,
    x.TotalPosts,
    x.UniquePosts
/*dbo.VoteCount*/
FROM 
    (
        SELECT v.VoteTypeId,
               v.PostId,
               COUNT_BIG(v.PostId) AS TotalPosts,
               COUNT_BIG(DISTINCT v.PostId) AS UniquePosts
        FROM dbo.Votes AS v 
        WHERE v.CreationDate >= DATEADD(YEAR, (-1 * @YearsBack), ''2014-01-01'')
        AND   v.VoteTypeId = @VoteTypeId '

IF @VoteTypeId IN (2, 1, 3, 5, 10, 6)
BEGIN
    SET @sql += N'
        AND 1 = (SELECT 1)'
END

IF @VoteTypeId IN (16, 15, 11, 8, 9, 7, 12, 4)
BEGIN
    SET @sql += N'
        AND 2 = (SELECT 2)'
END

SET @sql += N'
        GROUP BY v.VoteTypeId,
                 v.PostId
    ) AS x
ORDER BY x.TotalPosts DESC;
';

RAISERROR('%s', 0, 1, @sql) WITH NOWAIT;

EXEC sys.sp_executesql @sql, 
                       N'@VoteTypeId INT, @YearsBack INT', 
                       @VoteTypeId, @YearsBack;

END;

There’s a bit going on in there, but the important part is in the middle. This is what will give use different execution plans.

IF @VoteTypeId IN (2, 1, 3, 5, 10, 6)
BEGIN
    SET @sql += N'
        AND 1 = (SELECT 1)'
END

IF @VoteTypeId IN (16, 15, 11, 8, 9, 7, 12, 4)
BEGIN
    SET @sql += N'
        AND 2 = (SELECT 2)'
END

Sure, there are other ways to do this. You could even selectively recompile if you wanted to. But some people complain when you recompile. It’s cheating.

Because the SQL Server Query Optimizer typically selects the best execution plan for a query, we recommend only using hints as a last resort for experienced developers and database administrators.

See? It’s even documented.

Now that we’ve got that all worked out, we can run the proc and get the right plan depending on the amount of data we need to shuffle around.

Little Star

Now I know what you’re thinking. You wanna know more about that dynamic SQL. You want to solve performance problems and have happy endings.

We’ll do that next week, where I’ll talk about common issues, best practices, and more tricks you can use to get queries to perform better with it.

Thanks for reading!