Common Query Plan Patterns For Windowing Functions: Partition By Parallelism

Ramble On


Prior to SQL Server 2012, this blog post would be wrong. If you’re still on a version prior to 2012, ignore this blog post.

In fact, ignore every blog post and go upgrade those servers. Go on. Get out of here. Show’s over.

After SQL Server 2012, AKA the only versions that currently matter, this blog post is largely correct, and leans more towards correctness as you get to closer to SQL Server 2019.

Sliding scale correctness. Get on board.

Say It Slow


When windowing functions don’t have a Partition By, the parallel zone ends much earlier on than it does with one.

That doesn’t mean it’s always slower, though. My general experience is the opposite, unless you have a good supporting index.

But “good supporting index” is for tomorrow. You’re just going to have to deal with that.

WITH Comments AS 
(
    SELECT
        ROW_NUMBER() OVER
        (
            ORDER BY 
                c.CreationDate
        ) AS n
    FROM dbo.Comments AS c
)
SELECT 
    c.*
FROM Comments AS c
WHERE c.n = 0
OPTION(USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140'));

With just an Order By in the windowing function, our query plan looks about like so:

SQL Server Query Plan
ripper

Immediately after sorting data, our parallel streams are gathered. This is the end of our parallel zone, and it will occur regardless of if you’re filtering on the windowing function or now. I’m filtering on it here because I don’t want to spend any time retuning rows to SSMS.

Here’s an example of when a parallel zone is started again later:

WITH Comments AS 
(
    SELECT
        ROW_NUMBER() OVER
        (
            ORDER BY 
                c.CreationDate
        ) AS n
    FROM dbo.Comments AS c
)
SELECT 
    c.*
FROM Comments AS c
WHERE c.n > 100000000
OPTION(USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140'));
SQL Server Query Plan
innuendo

Note that there’s no Top in this plan prior to the Filter.

You’ll see a Top generally when you filter on the windowing function with an equality or less-than predicate. Greater than seems to most often not end up with a Top in the plan.

Margaret Batcher


If we let Batch Mode run free, things turn out a little bit different.

WITH Comments AS 
(
    SELECT
        ROW_NUMBER() OVER
        (
            ORDER BY 
                c.CreationDate
        ) AS n
    FROM dbo.Comments AS c
)
SELECT 
    c.*
FROM Comments AS c
WHERE c.n = 0;
SQL Server Query Plan
branded!

The Window Aggregate operator is within the parallel zone, unlike the Segment and Sequence Project operators in the Row Mode plan.

If we reuse the greater-than query from above while allowing Batch Mode to be used, we get a fully parallel plan.

SQL Server Query Plan
science can’t explain it

Paddington


Of course, Partition By adds work, especially in Row Mode, and especially without a supporting index.

SQL Server Query Plan
boulevard

The nearly 11 second Row Mode plan compared to the 1.6 second Batch Mode plan doesn’t leave a lot of room for arguing.

It’s also worth noting here that Batch Mode Sorts (at least currently) will always sort on a single thread, unless it’s the child of a Window Aggregate operator, like in the above plan.

Tomorrow, we’ll look at how indexing can improve things, but not just row store indexes!

Everyone knows about those P(artition By) O(rder By) C(overing) indexes, but does that attention to ordering matter as much with column store indexes?

Tune in to find out!

Thanks for reading.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Common Query Plan Patterns For Windowing Functions: Row vs Batch Mode

Kickball


To start things off, we’re going to talk about query plan patterns related to windowing functions.

There are several things to consider with windowing function query plans:

  • Row vs Batch mode
  • With and Without Partition By
  • Index Support for Partition and Order By
  • Column SELECTion
  • Rows vs Range/Global aggregates

We’ll get to them in separate posts, because there are particulars about them that would make covering them all in a single post unwieldy.

Anyway, the first one is pretty simple, and starting simple is about my speed.

Row Mode


I’m doing all of this work in SQL Server 2019, with the database in compatibility level 150. It makes my life easier.

First, here’s the query we’ll be using. The only difference will be removing the hint to allow for Batch Mode later on.

WITH Comments AS 
(
    SELECT
        ROW_NUMBER() OVER
        (
            PARTITION BY
                c.UserId
            ORDER BY 
                c.CreationDate
        ) AS n
    FROM dbo.Comments AS c
)
SELECT 
    c.*
FROM Comments AS c
WHERE c.n = 0
OPTION(USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140'));

Part of the row mode query plan will look like this:

The two operators that generate the numbering are the Segment and Sequence Project.

SQL Server Query Plan
shotted

We’ll talk about the Sort later on. For now, we can see the segment “grouping” by UserId, and the Sequence Project description notes that it works over an ordered set.

You can probably guess why we need the Sort here.

SQL Server Query Plan
consider surgery

Group is in quotes up there, because technically there’s no grouping. The “Segment1002” column in the Output List of the Segment is a computed column that marks the beginning and ending of each set of values. Likewise, the Sequence Project outputs “Expr1001”, which in this case is the calculated row number.

Batch Mode


In Batch Mode, there are three operators associated with windowing functions that get replaced with a single operator: the Window Aggregate.

The operators that get replaced are two we’ve already seen — Segment and Sequence Project, along with one we’ll see in a future post, the Window Spool.

SQL Server Query Plan
richie rich

We still need to Sort data for it without a supporting index. Gosh, those indexes sure are magickal.

The details of the Window Aggregate do still show a sequence generated, but we no longer see the “grouping”.

SQL Server Query Plan
a society

Baby Steps


Batch Mode kicks the pantalones off of Row Mode when it comes to window functions, but that’s not really the point of the post.

If you’re using a relatively modern version of SQL Server and also windowing functions, you should look at various ways to get Batch Mode processing alongside them.

Unless you’re on Standard Edition, probably.

In the next entry in this series, we’ll look at how the absence and presence of Partition By changes parallelism.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Common Query Plan Patterns: Introduction

Muy Bueno


I’m writing this series because as interesting as single operators can be, you rarely run into interesting query plans that are a single operator. I don’t know exactly how many posts this will end up being. I have a list of about 15 things that I’d like to write about.

Anyway, I’d been thinking about something like this for a while, because being able to understand  which part of a query generates which part of a query plan can help you focus in on what you need to work on. It’s also helpful to understand how different aspects of your database design and written queries might manifest in query plans.

You know, for performance.

Anyway, I hope you enjoy it. This post will be remarkably empty on first read, but will populate over time.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Temporary Object Concurrency With In-Memory tempdb Metadata In SQL Server 2019

Moisty



 

Thanks for watching!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Updates To sp_PressureDetector

These Colors Don’t Gum


If you head to my GitHub repo, there’s an update to sp_PressureDetector, a free script I publish to help you figure out when your server is under CPU or memory pressure.

While working with a client recently, I found it useful to illustrate how much of their memory was used by different memory consumers.

  • Buffer pool
  • Total other
  • Top 5 other

It helps put in perspective just how many different things are asking for a piece of your precious RAM.

blue

This is a bad example from my personal server laptop, but you get the idea.

Anyway, that’s that. clink over to my GitHub repo to download it.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Creating An Agent Job To Update Statistics Using Ola Hallengren’s Scripts

Hubba Hubba


Over in my GitHub repo, I’ve added a file that will create an Agent Job to update statistics using Ola Hallengren’s IndexOptimize script.

It’s something I hand out enough that I figured people might be interested in it. Currently, it’s not a default offering from Ola, it’s uh… custom code.

There are lots of people who should be using this, too.

  • Everyone

Because index maintenance scripts don’t measure a generally useful KPI, and one of the main benefits of index rebuilds is the statistics update.

Mindful


Some thing to keep in mind here:

  • You need to be using a relatively new version of Ola’s scripts
  • This script utilizes the @StatisticsModificationLevel parameter, added 2018-06-16
  • That parameter is currently set to 5, and you may need to change that depending on your environement

There are some things you’ll need to change in the script, if you’re doing anything really custom:

  • It targets the master database
  • It’s owned by the sa account
  • It’s set to run at midnight
  • It has no failure emails or alerts set up

This is a very vanilla starting place. It’s up to you to make it yours.

To report any issues with Ola’s scripts, head over to this GitHub repo.

To get the Agent Job script, head over to my GitHub repo

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Partitioning and Parallelism: Query Performance Oddities In SQL Server

Facing Up


Yesterday we looked at how parallel scans grab rows from partitioned tables. All of those queries were performing full scans of the Votes table, and in some cases threads were given no work to do. That could look troubling in a parallel query, because we’ve given worker threads to this query and they appear to be doing nothing.

Despite my oft frustration with Repartition Streams, it can come in quite handy, especially here.

In today’s experiment, we’re going to look at how plans running at different DOPs can be faster when seeking to partitions.

I’ll let you decide if these queries are being run on Standard Edition, or Developer/Enterprise Edition.

The setup


Because I want to test seeking into partitions, and a potentially more realistic query scenario with a couple joins, the query form is changing a little bit.

SELECT
    DATEPART(YEAR, vp.CreationDate) AS VoteYear,
    DATEPART(MONTH, vp.CreationDate) AS VoteMonth,
    COUNT_BIG(DISTINCT vp.PostId) AS UniquePostVotes,
    SUM(vp.BountyAmount) AS TotalBounties
FROM dbo.Votes16_p AS vp
JOIN dbo.Comments AS c
    ON c.PostId = vp.PostId
JOIN dbo.Users AS u
    ON u.Id = c.UserId
WHERE vp.CreationDate BETWEEN '2013-01-01 00:00:00.000'
                          AND '2013-12-31 00:00:00.000'
GROUP BY
    DATEPART(YEAR, vp.CreationDate),
    DATEPART(MONTH, vp.CreationDate)

Very realistically bad, that. You people.

The way this is written, the query will access two partitions that contain data for the year 2013.

CREATE PARTITION FUNCTION VoteYear16_pf(DATETIME)
    AS RANGE RIGHT FOR VALUES
    (
        '20080101',
        '20080601',
        '20090101',
        '20090601',	    
        '20100101',
        '20100601',
        '20110101',
        '20110601',
        '20120101',
        '20120601',
        '20130101',
        '20130601',
        '20140101',
        '20140601'
    );
GO

At DOP 4


All four threads seeking into the Votes table get rows to work on, and the entire thing runs for close enough to 14 seconds for me not to care.

buggy
splitted

If one were to find themselves in the midst of having nothing to do, one might find that two threads each got groovy with two partitions.

At DOP 8


Things get a little awkward.

?

We end up with three threads that don’t do anything, sort of like in yesterday’s post when we had to touch empty partitions.

Despite the seek into the Votes table here taking about 350ms longer, the query overall runs about 5 seconds faster.

Fortunately, there’s a helpful repartition streams after the index seek that rebalances rows on threads.

thanks, robot

I’m not adding all those up for you.

At DOP 16


Cha-ching, etc.

dopophobe

Eight threads get work, and eight don’t. The seek now takes a touch longer, but the query itself now finishes in 7.6 seconds.

Just like above, a repartition streams after the seek evens out row distributions.

Thinkin’ Tree


Even though higher DOPs are technically less helpful seeking into the Votes table, they obviously have some benefit to other operations in the query plan.

Whether or not it’s always worth the sacrifice takes some testing, and it might change based on how many partitions you’re touching.

Don’t be too concerned about lopsided parallelism at the seek or scan, as long as you have a repartition streams that adequately rebalances things afterwards.

But as people not from Australia are fond of saying, you should never run queries above DOP 8 anyway ?

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Table Partitioning And Parallelism In SQL Server Standard Edition

Jammed, Not Jellied


UPDATE 2021-04-14: Microsoft has updated the documentation for all 2016+ versions of SQL Server to indicate that parallelism is available for partitioned tables in non-Enterprise versions.

There’s been a note in the documentation since SQL Server 2016 SP1 brought certain Programmability Features™ to Standard Edition with regard to parallelism differences between Standard and Enterprise Edition.

howdy

For the sake of completeness, I did all my testing across both Standard and Developer Editions of SQL Server and couldn’t detect a meaningful difference.

Additionally, documentation about Parallel Query Execution Strategy for Partitioned Objects doesn’t note any differences in strategy between the two Editions.

There may be scenarios outside of the ones I tested that do show a difference, but, uh. I didn’t test those.

Obviously.

Every table is going to test this query at different DOPs.

SELECT
    DATEPART(YEAR, vp.CreationDate) AS VoteYear,
    DATEPART(MONTH, vp.CreationDate) AS VoteMonth,
    COUNT_BIG(DISTINCT vp.PostId) AS UniquePostVotes,
    SUM(vp.BountyAmount) AS TotalBounties
FROM dbo.Votes_p AS vp
GROUP BY
    DATEPART(YEAR, vp.CreationDate),
    DATEPART(MONTH, vp.CreationDate);

Two Partitions


Here’s the setup:

CREATE PARTITION FUNCTION VoteYear2013_pf(DATETIME)
    AS RANGE RIGHT FOR VALUES
    (
        '20130101'
    );
GO


CREATE PARTITION SCHEME VoteYear2013_ps
    AS PARTITION VoteYear2013_pf
    ALL TO ([PRIMARY]);

DROP TABLE IF EXISTS dbo.Votes2013_p;

CREATE TABLE dbo.Votes2013_p
(
    Id int NOT NULL,
    PostId int NOT NULL,
    UserId int NULL,
    BountyAmount int NULL,
    VoteTypeId int NOT NULL,
    CreationDate datetime NOT NULL,
    CONSTRAINT PK_Votes2013_p_Id
        PRIMARY KEY CLUSTERED (CreationDate, Id)
) ON VoteYear2013_ps(CreationDate);


INSERT dbo.Votes2013_p WITH(TABLOCK)
    (Id, PostId, UserId, BountyAmount, VoteTypeId, CreationDate)
SELECT v.Id,
       v.PostId,
       v.UserId,
       v.BountyAmount,
       v.VoteTypeId,
       v.CreationDate
FROM dbo.Votes AS v;

The data split looks like this:

SQL Server Query Results
not a good use of partitioning

Running our test query at DOP 4, there are slight differences in counts across threads, but slight timing differences can explain that.

SQL Server Query Plan
bonker

Standard Edition is on top, Developer Edition is at the bottom. There is a ~200ms difference here, but averaged out over multiple runs things end up pretty dead even.

Even looking at the row counts per thread, the distribution is close across both versions. I think it’s decently clear that the four threads work cooperatively across both partitions. A similar pattern continues at higher DOPs, too. I tested 8 and 16, and while there were slight differences in row counts per thread, there was a similar distribution pattern as at DOP 4.

Eight Partitions


Using a different partitioning function:

CREATE PARTITION FUNCTION VoteYear_pf(DATETIME)
    AS RANGE RIGHT FOR VALUES
    (
        '20080101',
        '20090101',
        '20100101',
        '20110101',
        '20120101',
        '20130101',
        '20140101'
    );
GO

We’re going to jump right to testing the query at DOP 8.

SQL Server Query Plan
dartford

Again, different threads end up getting assigned the work, but row counts match exactly across threads that did get work, and those numbers line up exactly to the number of rows in each partition.

SQL Server Query Results
pattern forming

In both queries, two threads scanned a partition with no rows and did no work. Each thread that did scan a partition scanned only one partition.

At DOP 16, the skew gets a bit worse, because now four threads do no work.

SQL Server Query Plan
crap

The remaining threads all seem to split the populated partitions evenly, though again there are slight timing differences that result in different row counts per thread, but it’s pretty clear that there is cooperation here.

At DOP 4, things get a bit more interesting.

SQL Server Query Plan
bedhead

In both queries, two threads scan exactly one partition.

The rows with arrows pointing at them represent numbers that exactly match the number of rows in a single partition.

The remaining threads have exactly the same row counts across versions.

Fifteen Partitions


The results here show mostly the same pattern as before, so I’m keeping it short.

CREATE PARTITION FUNCTION VoteYear16_pf(DATETIME)
    AS RANGE RIGHT FOR VALUES
    (
        '20080101',
        '20080601',
        '20090101',
        '20090601',	    
        '20100101',
        '20100601',
        '20110101',
        '20110601',
        '20120101',
        '20120601',
        '20130101',
        '20130601',
        '20140101',
        '20140601'
    );
GO

At DOP 4 and 8, threads work cooperatively across partitions. Where things get interesting (sort of) is at DOP 16.

SQL Server Query Results
craptastic

The four empty partitions here result in 4 threads doing no work in Developer/Enterprise Edition, and 5 threads doing no work in Standard Edition.

SQL Server Query Plan
donkey

At first, I thought this might be a crack in the case, so I did things a little bit differently. In a dozen or so runs, the 5 empty threads only seemed to occur in the Standard Edition query. Sometimes it did, sometimes it didn’t. But it was at least something.

Fifteen Partitions, Mostly Empty


I used the same setup as above, but this time I didn’t fully load data from Votes in:

INSERT dbo.Votes16e_p WITH(TABLOCK)
    (Id, PostId, UserId, BountyAmount, VoteTypeId, CreationDate)
SELECT v.Id,
       v.PostId,
       v.UserId,
       v.BountyAmount,
       v.VoteTypeId,
       v.CreationDate
FROM dbo.Votes AS v
WHERE v.CreationDate >= '20130101';

And… Scene!

SQL Server Query Plan
flop

That’s Just Great


Aside from one case where an extra thread got zero rows in Standard Edition, the behavior across the board looks the same.

Most of the behavior is sensible, but cases where multiple threads get no rows and don’t move on to other partitions is a little troubling.

Not that anyone has partitioning set up right anyway.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

A Parameterization Performance Puzzle With TOP PERCENT in SQL Server

Lawdy


There was a three-part series of posts where I talked about a weird performance issue you can hit with parameterized top. While doing some query tuning for a client recently, I ran across a funny scenario where they were using TOP PERCENT to control the number of rows coming back from queries.

With a parameter.

So uh. Let’s talk about that.

Setup Time


Let’s start with a great index. Possibly the greatest index ever created.

CREATE INDEX whatever 
ON dbo.Votes
    (VoteTypeId, CreationDate DESC)
WITH
(
    MAXDOP = 8,
    SORT_IN_TEMPDB = ON
);
GO

Now let me show you this stored procedure. Hold on tight!

CREATE OR ALTER PROCEDURE dbo.top_percent_sniffer 
(
    @top bigint, 
    @vtid int
)
AS
SET NOCOUNT, XACT_ABORT ON;
BEGIN
    SELECT TOP (@top) PERCENT
        v.*
    FROM dbo.Votes AS v
    WHERE v.VoteTypeId = @vtid
    ORDER BY v.CreationDate DESC;

END;

Cool. Great.

Spool Hardy


When we execute the query, the plan is stupid.

EXEC dbo.top_percent_sniffer
    @top = 1,
    @vtid = 6;
GO
SQL Server Query Plan
the louis vuitton of awful

We don’t use our excellent index, and the optimizer uses an eager table spool to hold rows and pass the count to the TOP operator until we hit the correct percentage.

This is the least ideal situation we could possibly imagine.

Boot and Rally


A while back I posted some strange looking code on Twitter, and this is what it ended up being used for (among other things).

The final version of the query looks like this:

CREATE OR ALTER PROCEDURE dbo.top_percent_sniffer 
(
    @top bigint, 
    @vtid int
)
AS
SET NOCOUNT, XACT_ABORT ON;
BEGIN;
    
    WITH pct AS
    (
        SELECT
            records = 
                CONVERT(bigint, 
                    CEILING(((@top * COUNT_BIG(*)) / 100.)))
        FROM dbo.Votes AS v
        WHERE v.VoteTypeId = @vtid
    )
    SELECT
        v.*
    FROM pct
    CROSS APPLY
    (
        SELECT TOP (pct.records)
            v.*
        FROM dbo.Votes AS v
        WHERE v.VoteTypeId = @vtid
        ORDER BY v.CreationDate DESC
    ) AS v;

END;
GO
SQL Sever Query Plan
better butter

Soul Bowl


This definitely has drawbacks, since the expression in the TOP always gives a 100 row estimate. For large numbers of rows, this plan could be a bad choice and we might need to do some additional tuning to get rid of that lookup.

There might also be occasions when using a column store index to generate the count would be benefit, and the nice thing here is that since we’re accessing the table in two different ways, we could use two different indexes.

But for reliably small numbers of rows, this is a pretty good solution.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.