Things SQL Server vNext Should Address: How Did I Do?

Mom I Did It


A while back, I wrote a bunch of posts about things I’d like to see vNext take care of. In this post, since it’s Friday and I don’t wanna do anything, will round those up and cover whether or they made it in or not.

Well, maybe I’ll need to update the list for future releases of SQL Server 2022.

Hmpf.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

SQL Server 2022 DOP Feedback: Related Extended Events

Short Round


Sort of on the heels of yesterday’s post, here are some Extended Events related to the DOP Feedback feature new to SQL Server 2022.

Here’s the complete text of a session to collect all of the related events that I’ve noticed so far:

CREATE EVENT SESSION 
    dop_feedback 
ON SERVER 
ADD EVENT 
    sqlserver.dop_feedback_eligible_query
    (
        ACTION(sqlserver.sql_text)
    ),
ADD EVENT 
    sqlserver.dop_feedback_provided
    (
        ACTION(sqlserver.sql_text)
    ),
ADD EVENT 
    sqlserver.dop_feedback_validation
    (
        ACTION(sqlserver.sql_text)
    ),
ADD EVENT 
    sqlserver.dop_feedback_reverted
    (
        ACTION(sqlserver.sql_text)
    )
ADD TARGET package0.event_file(SET filename=N'dop_feedback')
WITH 
(
    EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS,
    MAX_DISPATCH_LATENCY = 5 SECONDS,
    STARTUP_STATE = ON
);

There is one additional event in the debug channel called maxdop_feedback_received, but the contents of it don’t appear immediately actionable.

Defining Moments


Here are the definitions for each of the events above:

  • dop_feedback_eligible_query: Reports when a query plan becomes eligible for dop feedback
  • dop_feedback_provided: Reports DOP feedback provided data for a query
  • dop_feedback_validation: Reports when the validation occurs for the query runtime stats against baseline or previous feedback stats
  • dop_feedback_reverted: This reports when a DOP feedback is reverted

Fairly straightforward, here. Also seems like a decent set of events that you’d wanna have in place.

Thanks, Microsoft.

MAPDOP


The map values for each of these events is also available:

+-----------------------+---------+--------------------------------------+
|         name          | map_key |              map_value               |
+-----------------------+---------+--------------------------------------+
| dop_calculation_stage |       0 | SetMaxDOP                            |
| dop_calculation_stage |       1 | SetTraceflag                         |
| dop_calculation_stage |       2 | CalculateBasedOnAvailableThreads     |
| dop_calculation_stage |       3 | PostCalculate                        |
| dop_feedback_state    |       0 | NotAnalyzed                          |
| dop_feedback_state    |       1 | NotEligible                          |
| dop_feedback_state    |       2 | InAnalysis                           |
| dop_feedback_state    |       3 | NoRecommendation                     |
| dop_feedback_state    |       4 | AnalysisStoppedDueToThrottling       |
| dop_feedback_state    |       5 | AnalysisStoppedDueToMaxResetsReached |
| dop_feedback_state    |       6 | AnalysisStoppedMinimumDOP            |
| dop_feedback_state    |       7 | PendingValidationTest                |
| dop_feedback_state    |       8 | VerificationRegressed                |
| dop_feedback_state    |       9 | RegressionDueToAbort                 |
| dop_feedback_state    |      10 | Stable                               |
| dop_statement_type    |       1 | Select                               |
| dop_statement_type    |       2 | Insert                               |
| dop_statement_type    |       3 | Update                               |
| dop_statement_type    |       4 | Delete                               |
| dop_statement_type    |       5 | Merge                                |
+-----------------------+---------+--------------------------------------+

Why two are zero-based and one is not is beyond what I can explain to you, here.

Perhaps that will be addressed in a future release.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

PASS Data Community Summit 2022: Come See My Session, The SQL Server Performance Tasting Menu

Workin’


On top of my precon, The Professional Performance Tuning Blueprint, I’ve also got a regular session selected!

The SQL Server Performance Tasting Menu:

You’re a DBA or Developer, and you’ve been using SQL Server for a few years.

You know there are different ways to make queries faster, but you’re not sure when to use them.

I’m Erik Darling, and I’ll be your sommelier for the evening.

Over several courses of delicious demos, I’ll show you the types of performance problems different tuning techniques pair well with, and which ones to avoid.

When we’re done, you’ll understand exactly what patterns to look for when you’re troubleshooting slow queries, and how to approach them.

You’ll have the secret recipe for gourmet queries.

Dates And Times


The PASS Data Community Summit is taking place in Seattle November 15-18, 2022 and online.

You can register here, to attend online or in-person. I’ll be there in all my fleshy goodness, and I hope to see you there too!

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Fixing Ordered Column Store Sorting In SQL Server 2022

Groove Is In The Heart


When Brent posted about the availability of, and disappointment with creating ordered column store indexes in SQL Server 2022, I got to work.

I can’t have my dear friend Brent being all distraught with all those fast cars around. That’s how accidents happen, and I fear he might leave the Blitz scripts to me in his will or something.

Anyway, I decided to dig in and see what was going on behind the scenes. Which of course, means query plans, and bothering people who are really good at debuggers.

Most of the problems that you’ll run into in SQL Server will come from sorting data.

Whenever I have to think about Sorts, I head to this post about all the different Sorts you might see in a query plan.

More on that later, though.

Cod Piece


In Paul’s post, he talks about using undocumented trace flag 8666 to get additional details about Sort operators.

Let’s do that. Paul is smart, though he is always completely wrong about which season it is.

DROP TABLE IF EXISTS
    dbo.Votes_CCI;

SELECT
    v.*
INTO dbo.Votes_CCI
FROM dbo.Votes AS v;

I’m using the Votes table because it’s nice and narrow and I don’t have to tinker with any string columns.

Strings in databases were a mistake, after all.

DBCC TRACEON(8666);
CREATE CLUSTERED COLUMNSTORE INDEX
    vcci
ON dbo.Votes_CCI
ORDER (Postid);
DBCC TRACEOFF(8666);

Here’s what we get back in the query plan:

SQL Server Query Plan
Tainted Sort

We’ve got a Soft Sort! What does our seasonally maladjusted friend say about those?

A “soft sort” uses only its primary memory grant and never spills. It doesn’t guarantee fully-sorted output. Each sort run using the available memory grant will be sorted. A “sort sort” represents a best effort given the resource available. This property can be used to infer that a Sort is implemented with CQScanPartitionSortNew without attaching a debugger. The meaning of the InMemory property flag shown above will be covered in part 2. It does not indicate whether a regular sort was performed in memory or not.

Well, with that attitude, it’s not surprising that there are so many overlapping buckets in the column store index. If it’s not good enough, what can you do?

Building the index with the Soft Sort here also leads to things being as bad as they were in Brent’s post.

Insert Debugging Here


Alas, there’s (almost) always a way. Microsoft keeps making these trace flag things.

There are a bunch of different ways to track them down, but figuring out the behavior of random trace flags that you may find just by enabling them isn’t easy.

One way to tie a trace flag to a behavior is to use WinDbg to step through different behaviors in action, and see if SQL Server checks to see if a trace flag is enabled when that behavior is performed.

If you catch that, you can be reasonably sure that the trace flag will have some impact on the behavior. Not all trace flags can be enabled at runtime. Some need to be enabled as startup options.

Sometimes it’s hours and hours of work to track this stuff down, and other times Paul White (b|t) already has notes on helpful ones.

The trace flag below, 2417, is present going back to SQL Server 2014, and can help with the Soft Sort issues we’re seeing when building ordered clustered column store indexes today.

Here’s another one:

DBCC TRACEON(8666, 2417);
CREATE CLUSTERED COLUMNSTORE INDEX
    vcci
ON dbo.Votes_CCI
ORDER (Postid)
WITH(MAXDOP = 1);
DBCC TRACEOFF(8666, 2417);

The MAXDOP 1 hint isn’t strictly necessary. With a parallel plan, you may see up to DOP overlapping row groups.

SQL Server Query Plan
community service

That’s why it was a popular maneuver to emulate this behavior by creating a clustered row store index, and then create a clustered column store index over it with drop existing and a MAXDOP 1 hint.

At DOP 1, you don’t see that overlap. It takes a lot longer of course — 3 minutes instead of 30 or so seconds — which is a real bummer. But without it, you could see DOP over lapping rowgroups.

If you want All The Pretty Little Rowgroups, this is what you have to do.

Anyway, the result using sp_BlitzIndex looks a lot better now:

EXEC sp_BlitzIndex
    @TableName = 'Votes_CCI';
SQL Server Query Results
capture the flag

How nice.

You can also use undocumented and unsupported trace flag 11621, which is

[A] feature flag for the ‘partition sort on column store order’ so the end result is similar, but via a different mechanism to 2417.
A partition sort is useful in general to prevent unnecessary switching between partitions. If you sort the stream by partition, you process all the rows for one before moving on to the next. A soft sort is ok there because it’s just a performance optimization. Worst case, you end up switching between partitions quite often because the sort ran out of memory, but correct results will still occur.

Chain Gang


A “reasonable” alternative to trace flags maybe to adjust the index create memory configuration option. If we set it down to the minimum value, we get a “helpful” error message:

EXEC sys.sp_configure 
    'index create memory', 
    704;

RECONFIGURE;

As promised:

Msg 8606, Level 17, State 1, Line 31

This index operation requires 123208 KB of memory per DOP.

The total requirement of 985800 KB for DOP of 8 is greater than the sp_configure value of 704 KB set for the advanced server configuration option “index create memory (KB)”.

Increase this setting or reduce DOP and rerun the query.

If you get the actual execution plan for the clustered column store index create or rebuild with the Soft Sort disabled and look at the memory grant, you get a reasonable estimate for what to set index create memory to.

Changing it does two things:

  • Avoids the very low memory grant that Soft Sorts receive, and causes the uneven row groups
  • The Soft Sort keeps the index create from going above that index create memory number

Setting index create memory for this particular index creation/rebuild to 5,561,824 gets you the nice, even row groups (at MAXDOP 1) that we saw when disabling the Soft Sort entirely.

Bottom line, here is that uneven row groups happen with column store indexes when there’s a:

  • Parallel create/rebuild
  • Low memory grant create/rebuild

If this sort of thing is particularly important to you, you could adjust index create memory to a value that allows the Soft Sort adequate memory.

But that’s a hell of a lot of work, and I hope Microsoft just fixes this in a later build.

Reality Bites


The cute thing here is that, while this syntactical functionality has been available in Azure Cloud Nonsense© for some time, no one uses that, so no one cares.

The bits for this were technically available in SQL Server 2019 as well, but I’m not telling you how to do that. It’s not supported, and bad things might happen if you use it.

I mean, bad things happen in SQL Server 2022 where it’s supported unless you use an undocumented trace flag, but… Uh. I dunno.

This trace flag seems to set things back to how things worked in the Before Times, though, which is probably how they should have stayed.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Ordered Columnstore Indexes on SQL Server 2022 CTP 2.0

Brent recently blogged about ordered columnstore indexes in SQL Server 2022 and had some trouble with them, so I decided to take a look into the mechanics of the feature. I’m testing on SQL Server 2022 CTP 2.0.

What does the ordered columnstore feature do?

  1. A sort operator may be added to query plans that insert into the table. The sort operator is a bit unusual in that the data may not be fully sorted.
  2. A sort operator is added when initially creating an ordered columnstore index. The level of rowgroup elimination fragmentation will depend on memory, DOP, and other factors.
  3. A sort operator is added when rebuilding an ordered columnstore index. The level of rowgroup elimination fragmentation will depend on memory, DOP, and other factors.

Technical details for insert

The sort for inserting into an ordered columnstore is a DML request sort. It appears to use the same internal mechanism as the sort that’s added for inserting into partitioned columnstore tables. The difference is that the data is sorted by the specified columns instead of a calculated partition id. In my testing, the sort appears to be a best effort sort that does not spill to tempdb. This means that if SQL Server thinks there won’t be enough memory then the data will not be fully sorted. Parallel inserts have an additional complication. Consider the following query plan image:

That is a row mode sort. It is a row mode sort because a batch mode parallel sort would put all resulting rows on a single thread which would make the parallel insert pointless. However, there’s no repartition streams operator as a child of the sort. Data is sorted on each thread in a best effort fashion. Even if there is enough memory to fully sort the data, you will end up with DOP threads of sorted data instead. The data will not be sorted globally. The split into threads will increase rowgroup elimination fragmentation.

As mentioned earlier, the sort operator does not always appear. It is not present when the cardinality estimate is very low (around 250 rows). I suspect that the same logic is used for adding the sort as adding the memory grant for compression. For very low cardinality estimates, the data will be inserted into delta rowgroups, even if there’s more than 102399 rows. By that same reasoning, I expect that there is no sort operator if the INSERT query hits a memory grant timeout.

For more information on this sort, see CQScanPartitionSortNew in Paul White’s blog post about different sort types in SQL Server.

Technical details for CREATE/REBUILD index

I spent less time looking into the sort that’s added as part of CREATE or REBUILD index. In my testing, the sort again does not spill to tempdb. The sort is also performed on a per thread basis for parallel index operations. The minimum fragmentation level will be achieved for a MAXDOP 1 operation with sufficient memory. Reducing memory or running the create index in parallel will increase fragmentation. This is unfortunate because ordered columnstore indexes do not support online index creation or rebuilds.

An unpopular opinion

I think that the community worries too much with columnstore with respect to achieving perfect segment ordering and keeping rowgroups at exactly 1048576 rows. If you perform basic maintenance and partition tables appropriately then that should be good enough for most data warehouse workloads. Most query performance issues are going to be caused by getting no elimination at all, scanning through too many soft-deleted rows, or the usual query performance problems. Scanning 11 rowgroups instead of 5 probably isn’t why your queries on columnstore indexes are slow today.

Final thoughts

As is, this feature can be described as a poor man’s partitioning. The sweet spot for this functionality feels very limited to me in its current state, but we’re still on CTP 2.0. Maybe there will be changes before RTM. Thanks for reading!

Considerations For Paging Queries In SQL Server With Batch Mode (Don’t Use OFFSET/FETCH)

First Things First


The first SQL Server blog posts that I ever read while trying to solve a specific problem here these two:

They sort of changed my life a little, despite the author’s aversion to the letter Z. So that’s cool. Can’t have everything.

To this day, though, I see people screw up paging queries in numerous ways.

  • Selecting all the columns in one go
  • Adding in joins when exists will do
  • Sticking a DISTINCT on there just because
  • Thinking a view will solve some problem
  • Piles and piles of UDFs
  • Local variables for TOP or OFFSET/FETCH
  • Not paying attention to indexing

It’s sort of like every other query I see, except with additional complications.

Especially cute for a query slathered in NOLOCK hints is the oft-accompanying concern that “data might change and people might see something weird when they query for the next page”.

Okay, pal. Now you’re concerned.

Modern Love


A while back I recorded a video about using nonclustered column store indexes to improve the performance of paging queries:

While a lot of the details in there are still true, I want to talk about something slightly different today. While nonclustered column store indexes make great data sources for queries with unpredictable search predicates, they’re not strictly necessary to get batch mode anymore.

With SQL Server 2019, you can get batch mode on row store indexes, as long as you’re on Enterprise Edition, and in compatibility level 150.

Deal with it.

The thing is, how you structure your paging queries can definitely hurt your chances of getting that optimization.

Saddened Face


The bummer here is that the paging technique that I learned from Paul’s articles (linked above) doesn’t seem to qualify for batch mode on row store without a column store index in place, so they don’t make the demo cut here.

The good news is that if you’re going to approach this with any degree of hope for performance, you’re gonna be using a column store index anyway.

The two methods we’re going to look at are OFFSET/FETCH and a more traditional ROW_NUMBER query.

As you may have picked up from the title, one will turn out better, and it’s not the OFFSET/FETCH variety. Especially as you get larger, or go deeper into results, it becomes a real boat anchor.

Anyway, let’s examine, as they say in France.

Barfset Wretch


This is the best way of writing this query that I can come up with.

DECLARE 
    @page_number int = 1,
    @page_size int = 1000;

WITH 
    paging AS
(
    SELECT 
        p.Id
    FROM dbo.Posts AS p
    ORDER BY 
        p.LastActivityDate, 
        p.Id 
    OFFSET ((@page_number - 1) * @page_size) 
    ROW FETCH NEXT (@page_size) ROWS ONLY
)
SELECT 
    p.*
FROM paging AS pg
JOIN dbo.Posts AS p
    ON pg.id = p.Id
ORDER BY 
    p.LastActivityDate,
    p.Id
OPTION (RECOMPILE);

Note that the local variables don’t come into play so much here because of the recompile hint.

Still, just to grab 1000 rows, this query takes just about 4 seconds.

SQL Server Query Plan
what took you so long?

This is not so good.

Examine!

Hero Number


The better-performing query here with the batch mode on row store enhancement(!) is using a single filtered ROW_NUMBER to grab the rows we care about.

DECLARE 
    @page_number int = 1,
    @page_size int = 1000;

WITH 
    fetching AS
(
    SELECT 
        p.Id, 
        n = 
            ROW_NUMBER() OVER 
            ( 
                ORDER BY
                    p.LastActivityDate, 
                    p.Id 
            )
    FROM dbo.Posts AS p
)
SELECT 
    p.*
FROM fetching AS f
JOIN dbo.Posts AS p
    ON f.Id = p.Id
WHERE f.n > ((@page_number - 1) * @page_size)
AND   f.n < ((@page_number * @page_size) + 1)
ORDER BY 
    p.LastActivityDate,
    p.Id
OPTION (RECOMPILE);

Again, this is about the best I can write the query. Maybe you have a better way. Maybe you don’t.

Mine takes a shade under 2 seconds. Twice as fast. Examine!

SQL Server Query Plan
cell tv

I’ll take twice as fast any day of the week.

Compare/Contrast


The OFFSET/FETCH query plan is all in row mode, while the ROW_NUMBER query has batch mode elements.

You can see this by eyeballing the plan: it has a window aggregate operator, and an adaptive join. There are other batch mode operators here, but none have visual cues in the graphical elements of the plan.

This is part of what makes things faster, of course. The differences can be even more profound when you add in the “real life” stuff that paging queries usually require. Filtering, joining, other sorting elements, etc.

Anyway, the point here is that how you write your paging queries from the start can make a big difference in how they end up, performance-wise.

Newer versions of SQL Server where certain behaviors are locked behind heuristics (absent column store indexes being present in some manner) can be especially fickle.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

SQL Server 2022 Parameter Sensitive Plan Optimization: Does Not Care To Fix Your Local Variable Problems

–To Fix Parameter Sniffing


There are some code comments you see that really set the stage for how tuning a query is going to go.

Usually one misgiving about how SQL Server works gives way to a whole levee-breaking bevy of other ones and three days later you can’t feel your legs but dammit it’s done.

Okay, maybe it was three hours, but it felt like three days. Something about the gravitation pull of these black hole queries.

One fix I’ve been wishing for, or wish I’ve been fixing for, is a cure for local variables. I’d even be cool if Forced Parameterization was that cure, but you know…

Time will tell.

Husk


Let’s say we’ve got this stored procedure, which does something similar to the “I’m gonna fix parameter sniffing with a local variable hey why is everything around me turning to brimstone before my very eyes?” idea, but with… less of an end-of-times vibe.

CREATE OR ALTER PROCEDURE 
    dbo.IndexTuningMaster
( 
    @OwnerUserId int,
    @ParentId int, 
    @PostTypeId int 
)
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;

    /*Someone passed in bad data and we got a bad query plan,
      and we have to make sure that doesn't happen again*/
    
    DECLARE 
        @ParentIdFix int = 
            CASE 
                WHEN @ParentId < 0 
                THEN 0 
                ELSE @ParentId 
            END;
    
    SELECT TOP (1) 
        p.*
    FROM dbo.Posts AS p
    WHERE p.ParentId = @ParentIdFix
    AND   p.PostTypeId = @PostTypeId
    AND   p.OwnerUserId = @OwnerUserId
    ORDER BY 
        p.Score DESC, 
        p.Id DESC;

END;

How bad could a top 1 query be, anyway?

Fortune Teller


When we run this query like so and so:

EXEC dbo.IndexTuningMaster 
    @OwnerUserId = 22656, 
    @ParentId = 0, 
    @PostTypeId = 1;

EXEC dbo.IndexTuningMaster 
    @OwnerUserId = 22656, 
    @ParentId = 184618, 
    @PostTypeId = 2;

We come up with zip zero zilch none nada:

SQL Server Query Plan
still playing

We get a super low guess for both. obviously that guess hurts a large set of matched data far worse than a small one, but the important thing here is that both queries receive the same bad guess.

This is a direct side effect of the local variable’s poor estimate, which PSP isn’t quite yet ready to go up against.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

SQL Server 2022 Parameter Sensitive Plan Optimization: How PSP Can Help Some Queries With IF Branches

Time Served


I’ve spent a bit of time talking about how IF branches can break query performance really badly in SQL Server.

While the Parameter Sensitive Plan (PSP) optimization won’t fix every problem with this lazy coding habit, it can fix some of them in very specific circumstances, assuming:

  • The parameter is eligible for PSP
  • The parameter is present across IF branches

We’re going to use a simple one parameter example to illustrate the potential utility here.

After all, if I make these things too complicated, someone might leave a comment question.

The horror

IFTTT


Here’s the procedure we’re using. The point is to execute one branch if @Reputation parameter is equal to one, and another branch if it equals something else.

In the bad old days, both queries would get a plan optimized at compile time, and neither one would get the performance boost that you hoped for.

In the good news days that you’ll probably get to experience around 2025, things are different!

CREATE OR ALTER PROCEDURE 
    dbo.IFTTT 
(
    @Reputation int
)
AS 
BEGIN
SET NOCOUNT, XACT_ABORT ON;

SET STATISTICS XML ON;  

    IF @Reputation = 1
    BEGIN
        SELECT 
            u.Id, 
            u.DisplayName, 
            u.Reputation, 
            u.CreationDate
        FROM dbo.Users AS u
        WHERE u.Reputation = @Reputation;
    END;

    IF @Reputation > 1
    BEGIN
        SELECT 
            u.Id, 
            u.DisplayName, 
            u.Reputation, 
            u.CreationDate
        FROM dbo.Users AS u
        WHERE u.Reputation = @Reputation;
    END;

SET STATISTICS XML OFF; 

END;
GO 

Johnson & Johnson


If we execute these queries back to back, each one gets a new plan:

EXEC dbo.IFTTT 
    @Reputation = 1;
GO 

EXEC dbo.IFTTT 
    @Reputation = 2;
GO
SQL Server Query Plan
psychic driving

Optimize For You


The reason why is in the resulting queries, as usual. The Reputation column has enough skew present to trigger the PSP optimization, so executions with differently-bucketed parameter values end up with different plans.

option (PLAN PER VALUE(QueryVariantID = 3, predicate_range([StackOverflow2013].[dbo].[Users].[Reputation] = @Reputation, 100.0, 1000000.0)))

option (PLAN PER VALUE(QueryVariantID = 2, predicate_range([StackOverflow2013].[dbo].[Users].[Reputation] = @Reputation, 100.0, 1000000.0)))

And of course, each plan has different compile and runtime values:

SQL Server Query Plan
care

If I were to run this demo in a compatibility level under 160, this would all look totally different.

This is one change I’m sort of interested to see the play-out on.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

SQL Server 2022 Parameter Sensitive Plan Optimization: Sometimes There’s Nothing To Fix

Best Intentions


After seeing places where the Parameter Sensitive Plan (PSP) optimization quite stubbornly refuses to kick in, it’s somewhat amusing to see it kick in where it can’t possibly have any positive impact.

Even though some parameters are responsible for filtering on columns with highly skewed data, certain other factors may be present that don’t allow for the type of plan quality issues you might run into under normal parameter sensitivity scenarios:

  • Adequate indexing
  • Row goals
  • Other filtering elements

This isn’t to say that they can always prevent problems, but they certainly tend to reduce risks much of the time.

If only everything were always ever perfect, you know?

Setup


Let’s start by examining some data in the Posts table.

First, PostTypeIds:

SQL Server Query Results
resultant

Questions and answers are the main types of Posts. The data is clearly skewed, here, and in my testing this does qualify for PSP on its own.

The thing is, there are several attributes that Questions can have that Answers can’t. One of those is a ParentId. Looking through how the top 15 or so of those counts breaks down:

SQL Server Query Results
hitherto

Okay, so! Wikis don’t have ParentIds, neither do Moderator Nominations. More importantly, Questions don’t.

The Question with the Most answers is Id 184618, with 518. A far cry from the next-nearest Post Types, and light years from the number of Questions with a ParentId of zero.

More important than loving your data is knowing your data.

To Query A Butterfly


Let’s say we have this query:

SELECT TOP (5000)
    p.Id,
    p.OwnerUserId,
    p.Score
FROM dbo.Posts AS p
WHERE p.PostTypeId = @po
AND   p.ParentId = @pa
ORDER BY 
    p.Score DESC;

The three things we care about getting done are:

  • Filtering to PostTypeId
  • Filtering to ParentId
  • Ordering by Score

Either of these indexes would be suitable for that:

CREATE INDEX 
    popa
ON dbo.Posts
(
    PostTypeId,
    ParentId,
    Score DESC
)
WITH
(
    SORT_IN_TEMPDB = ON,
    DATA_COMPRESSION = PAGE
);

CREATE INDEX 
    papo
ON dbo.Posts
(
    ParentId,
    PostTypeId,
    Score DESC
)
WITH
(
    SORT_IN_TEMPDB = ON,
    DATA_COMPRESSION = PAGE
);

With No PSP At All


Under compatibility level 150, we can run the query in a variety of ways and get nearly identical performance results:

SQL Server Query Plan
PostTypeId = 1, ParentId = 0
SQL Server Query Plan
PostTypeId = 2, ParentId = 184618

There’s a 27 millisecond difference between the two to find the first 5000 rows that match both predicates. You would have to run these in a very long loop to accumulate a meaningful overall difference.

In this case, both queries use and reuse the same execution plan. You can see that in the estimates.

With All The PSP


Switching to compat level 160, the queries are injected with the PLAN PER VALUE hint.

SELECT TOP (5000)
    p.Id,
    p.OwnerUserId,
    p.Score
FROM dbo.Posts AS p
WHERE p.PostTypeId = @po
AND   p.ParentId = @pa
ORDER BY 
    p.Score DESC 
OPTION 
(
    PLAN PER VALUE
    (
        QueryVariantID = 2, 
        predicate_range
        (
            [StackOverflow2013].[dbo].[Posts].[PostTypeId] = @po, 
            100.0, 
            10000000.0
        )
    )
)

SELECT TOP (5000)
    p.Id,
    p.OwnerUserId,
    p.Score
FROM dbo.Posts AS p
WHERE p.PostTypeId = @po
AND   p.ParentId = @pa
ORDER BY 
    p.Score DESC 
OPTION 
(
    PLAN PER VALUE
    (
        QueryVariantID = 3, 
        predicate_range
        (
            [StackOverflow2013].[dbo].[Posts].[PostTypeId] = @po, 
            100.0, 
            10000000.0
        )
    )
)

The thing is, both queries end up with identical execution times to when there was no PSP involved at all.

In other words, there is no parameter sensitivity in this scenario, despite there being skew in the column data.

Even searching for the “big” result — Questions with a ParentId of zero, finishes in <30 milliseconds.

Ah well. Gotta train the models somehow.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

SQL Server 2022 Parameter Sensitive Plan Optimization: Indexing Matters

Starter Pistol


I started writing this post to answer a different question about if the Parameter Sensitive Plan (PSP) optimization worked with computed columns.

Of course it does! Even non-persisted computed columns have statistics generated on them. The only gatekeeping activity from there is whether or not your statistics qualify for the PSP optimization to kick in.

My excitement for the PSP optimization reminds me of the first time I drank Johnnie Walker Blue. I was all riled up on the hype around a $200 bottle of scotch.

Then it tasted like dessert syrup.

That’s a lot like what seeing this pop up in related Extended Events feels like, even for columns with massive amounts of skew in them.

SkewnessThresholdNotMet
how i didn’t meet your mother

Microsoft really need to introduce a hint that tells the optimizer to treat a parameter as being sensitive, and/or a way to tweak the threshold for Skewness not being met.

The heuristics are really falling short, here.

Hash And Rehash


Anyway, let’s use the example that I had started with here, to illustrate that the PSP optimization does work with a computed column, but… like any other column, indexes make all the difference.

I’m using the same example query over and over again, because a lot of the other great examples of parameter sensitivity that I have demo queries written for don’t seem to trigger it.

Here’s the computed column I started with:

ALTER TABLE 
    dbo.Posts 
ADD 
    WhatIsIt 
AS ISNULL
   (
       ParentId,
       0
   );

I’m not actually sure that there are NULL values in there, but the column is marked as NULLable so the ISNULL function doesn’t short-circuit.

Here’s the stored procedure we’ll be using:

CREATE OR ALTER PROCEDURE 
    dbo.FindMeSomePostsPlease 
(
    @ParentId int
)
AS 
SET NOCOUNT, XACT_ABORT ON;
BEGIN

    SELECT
        c = COUNT_BIG(*)
    FROM dbo.Posts AS p
    WHERE ISNULL(p.ParentId, 0) = @ParentId

END;

Bruise the lard that our optimizer can match expressions on its own.

No Index, No Cry


Using the calls that have worked to trigger the PSP optimization in the past, we do not get two important things:

  • Different cardinality estimates between executions
  • The option(plan per value... text added to the end of the plan
EXEC dbo.FindMeSomePostsPlease 
    @ParentId = 0;

EXEC dbo.FindMeSomePostsPlease 
    @ParentId = 184618;

The plans are expectedly disappointing:

SQL Server Query Plan
today, satan

Indexable


With the help of our trusty steed index, we get the PSP optimization:

CREATE INDEX 
    p
ON dbo.Posts
(
    WhatIsIt
)
WITH
(
    SORT_IN_TEMPDB = ON,
    DATA_COMPRESSION = PAGE
);

Granted, this is a toy example, but…

SQL Server Query Plan
go time

Bottom line: If you want cool new features to (sometimes) work, you’re gonna need to pay attention to indexes.

What Else About Indexes?


Well, two things. In the example above, PSP didn’t kick in until we had one, I think owing to the computed column not being materialized, but…

Sometimes you’ll get PSP without an index, but the plans won’t be much different aside from cardinality estimates

Here’s a quick example.

DECLARE
    @sql nvarchar(MAX) = N'',
    @po int = 2,
    @pa int = 184618
--184618
SELECT 
    @sql += N'  
SELECT TOP (5000)
    p.Id,
    p.OwnerUserId,
    p.Score
FROM dbo.Posts AS p
WHERE p.PostTypeId = @po
AND   p.ParentId = @pa
ORDER BY 
    p.Score DESC;
';

EXEC sys.sp_executesql
    @sql,
  N'@po int,
    @pa int',
    @po,
    @pa;

First, with no indexes:

SQL Server Query Plan
moving tops

Even though both queries have no choice but to scan the clustered index, we get slightly different plans for vastly different estimates. Sort of (ha ha ha) interestingly, if the Sort hadn’t spilled in the top plan, they would have around the same execution time.

Anyway, I’ve been experimenting with other things, but the results have been surprising. For instance, I thought key column order would push the optimizer towards marking one parameter or the other as sensitive, but that doesn’t happen.

Wild stuff.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.