How Useful Is Column Store Indexing In SQL Server Standard Edition?

Speed Limit


When I’m blogging about performance tuning, most of it is from the perspective of Enterprise Edition. That’s where you need to be if you’re serious about getting SQL Server to go as fast as possible. Between the unrealistic memory limits and other feature restrictions, Standard Edition just doesn’t hold up.

Sure, you can probably get by with it for a while, but once performance becomes a primary concern it’s time to fork over an additional 5k a core for the big boat.

They don’t call it Standard Edition because it’s The Standard, like the hotel. Standard is a funny word like that. It can denote either high or low standing through clever placement of “the”.  Let’s try an experiment:

  • Erik’s blogging is standard for technical writing
  • Erik’s blogging is the standard for technical writing

Now you see where you stand with standard edition. Not with “the”, that’s for sure. “The” has left the building.

Nerd Juice


A lot of the restrictions for column store in Standard Edition are documented, but:

  • DOP limit of two for queries
  • No parallelism for creating or rebuilding indexes
  • No aggregate pushdown
  • No string predicate pushdown
  • No SIMD support

Here’s a comparison for creating a nonclustered column store index in Standard and Enterprise/Developer Editions:

SQL Server Query Plan
your fly is down

The top plan is from Standard Edition, and runs for a minute in a full serial plan. There is a non-parallel plan reason in the operator properties: MaxDOPSetToOne.

I do not have DOP set to one anywhere, that’s just the restriction kicking in. You can try it out for yourself if you have Standard Edition sitting around somewhere. I’m doing all my testing on SQL Server 2019 CU9. This is not ancient technology at the time of writing.

The bottom plan is from Enterprise/Developer Edition, where the the plan is able to run partially in parallel, and takes 28 seconds (about half the time as the serial plan).

Query Matters


One of my favorite query tuning tricks is getting batch mode to happen on queries that process a lot of rows. It doesn’t always help, but it’s almost always worth trying.

The problem is that on Standard Edition, if you’re processing a lot of rows, being limited to a DOP of 2 can be a real hobbler. In many practical cases, a batch mode query at DOP 2 will end up around the same as a row mode query at DOP 8. It’s pretty unfortunate.

In some cases, it can end up being much worse.

SELECT 
    MIN(p.Id) AS TinyId,
    COUNT_BIG(*) AS records
FROM dbo.Posts AS p WITH(INDEX = ncp)
JOIN dbo.Votes AS v
    ON p.Id = v.PostId
WHERE p. OwnerUserId = 22656;

SELECT 
    MIN(p.Id) AS TinyId,
    COUNT_BIG(*) AS records
FROM dbo.Posts AS p WITH(INDEX = 1)
JOIN dbo.Votes AS v
    ON p.Id = v.PostId
WHERE p. OwnerUserId = 22656;

Here’s the query plan for the first one, which uses the nonclustered column store index on Posts. There is no hint or setting that’s keeping DOP at 2, this really is just a feature restriction.

SQL Server Query Plan
drop it like it’s dop

Higher Ground


The second query, which is limited by the MAXDOP setting to 8, turns out much faster. The batch mode query takes 3.8 seconds, and the row mode query takes 1.4 seconds.

SQL Server Query Plan
it’s a new craze

In Enterprise Edition, there are other considerations for getting batch mode going, like memory grant feedback or adaptive joins, but those aren’t available in Standard Edition.

In a word, that sucks.

Dumb Limit


The restrictions on creating and rebuilding column store indexes to DOP 1 (both clustered and nonclustered), and queries to DOP 2 all seems even more odd when we consider that there is no restriction on inserting data into a table with a column store index on it.

As an example:

SELECT 
    p.*
INTO dbo.PostsTestLoad
FROM dbo.Posts AS p
WHERE 1 = 0;

CREATE CLUSTERED COLUMNSTORE INDEX pc ON dbo.PostsTestLoad;

SET IDENTITY_INSERT dbo.PostsTestLoad ON;

INSERT dbo.PostsTestLoad WITH(TABLOCK)
(
    Id, AcceptedAnswerId, AnswerCount, Body, ClosedDate, 
    CommentCount, CommunityOwnedDate, CreationDate, 
    FavoriteCount, LastActivityDate, LastEditDate, 
    LastEditorDisplayName, LastEditorUserId, OwnerUserId, 
    ParentId, PostTypeId, Score, Tags, Title, ViewCount 
)
SELECT TOP (1024 * 1024)
    p.Id, p.AcceptedAnswerId, p.AnswerCount, p.Body, p.ClosedDate, p.
    CommentCount, p.CommunityOwnedDate, p.CreationDate, p.
    FavoriteCount, p.LastActivityDate, p.LastEditDate, p.
    LastEditorDisplayName, p.LastEditorUserId, p.OwnerUserId, p.
    ParentId, p.PostTypeId, p.Score, p.Tags, p.Title, p.ViewCount 
FROM dbo.Posts AS p;

SET IDENTITY_INSERT dbo.PostsTestLoad OFF;
SQL Server Query Plan
smells like dop spirit

Unsupportive Parents


These limits are asinine, plain and simple, and I hope at some point they’re reconsidered. While I don’t expect everything from Standard Edition, because it is Basic Cable Edition, I do think that some of the restrictions go way too far.

Perhaps an edition somewhere between Standard and Enterprise would make sense. When you line the two up, the available features and pricing are incredibly stark choices.

There are often mixed needs as well, where some people need Standard Edition with fewer HA restrictions, and some people need it with fewer performance restrictions.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Another Reason Why I Love Dynamic SQL IN SQL Server: OUTPUT Parameters Can Be Input Parameters

Well-Treaded


A lot has been written about dynamic SQL over the years, but I ran into a situation recently where I needed to rewrite some code that needed it with minimal disruption to other parts of a stored procedure.

The goal was to set a bunch of variables equal to column values in a table, but the catch was that some of the values that needed to be set also needed to be passed in as search arguments. Here’s a really simplified example:

DECLARE 
    @i int = 4,
    @s nvarchar(MAX) = N'';

SET 
    @s += N'
SELECT TOP (1) 
    @i = d.database_id
FROM sys.databases AS d
WHERE d.database_id > @i
ORDER BY d.database_id;
'
EXEC sys.sp_executesql
    @s,
  N'@i INT OUTPUT',
    @i OUTPUT;
    
SELECT 
    @i AS input_output;

The result is this:

2021 03 12 16 59 01
sinko

All Points In Between


Since we declare @i outside the dynamic SQL and set it to 4, it’s known to the outer scope.

When we execute the dynamic SQL, we tell it to expect the @i parameter, so we don’t need to declare a separate holder variable inside.

We also tell the dynamic SQL block that we expect to output a new value for @i.

While we’re also passing in @i as a parameter.

Mindblowing.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Trace Flag 8048 May Still Be Necessary After SQL Server 2016

Big Box


Way back when, SQL Servers with lots of cores could hit some weird contention on CMEMTHREAD. To fix that and a lot of other issues associated with running on a large server, people would resort to all sorts of trace flags.

My dear friend L_____ (b|t) has a list. Maybe not the most up to date list, but there are a lot of good references in the post.

Something we’ve both observed working with large servers is that Trace Flag 8048 might still be necessary under some circumstances.

Two Reasons


Starting with SQL Server 2016, it partitions memory dynamically. That’s cool.

Beats what it used to do by a country mile. Maybe even a continent mile.

But there are two problems you can run into:

  • You don’t meet the dynamic threshold, but still generate enough activity to run into contention
  • You run into issues faster than the dynamic threshold gets hit and starts taking effect

Many tools in the First Responder Kit will warn you about high CMEMTHREAD waits. You may even see them from heavy Query Store use.

If you’re running a large server, this trace flag may still be of value even after SQL Server 2016.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

A Useful Rewrite Using APPLY To Improve Query Performance In SQL Server

More To The Matter


In the year 950 B.C., Craig Freedman write a post about subqueries in CASE expressions. It’s amazing how relevant so much of this stuff stays.

In today’s post, we’re going to look at a slightly different example than the one given, and how you can avoid performance problems with them by using APPLY.

Like most query tuning tricks, this isn’t something you always need to employ, and it’s not a best practice. It’s just something you can use when a scalar subquery doesn’t perform as you’d like it to.

How Much Wood


Our starting query looks like this. The point of it is to determine the percentage of answered questions per month.

SELECT 
    x.YearPeriod,
    MonthPeriod = 
        RIGHT('00' + RTRIM(x.MonthPeriod), 2),
    PercentAnswered = 
        CONVERT(DECIMAL(18, 2), 
           (SUM(x.AnsweredQuestion * 1.) /
           (COUNT_BIG(*) * 1.)) * 100.)
FROM
(
    SELECT 
        YearPeriod = YEAR(p.CreationDate),
        MonthPeriod = MONTH(p.CreationDate),
        CASE 
            WHEN EXISTS
                 ( 
                     SELECT 
                         1/0
                     FROM dbo.Votes AS v
                     WHERE v.PostId = p.AcceptedAnswerId
                     AND   v.VoteTypeId = 1 
                 ) 
            THEN 1
            ELSE 0
        END AS AnsweredQuestion
    FROM dbo.Posts AS p
    WHERE p.PostTypeId = 1
) AS x
GROUP BY 
    x.YearPeriod, 
    x.MonthPeriod
ORDER BY 
    x.YearPeriod ASC, 
    x.MonthPeriod ASC;

Smack in the middle of it, we have a case expression that goes looking for rows in the Votes table where a question has an answer that’s been voted as the answer.

Amazing.

To start with, we’re going to give it this index.

CREATE INDEX p 
    ON dbo.Posts(PostTypeId, AcceptedAnswerId) 
    INCLUDE(CreationDate) 
WITH(MAXDOP = 8, SORT_IN_TEMPDB = ON);

Planpains


In all, this query will run for about 18 seconds. The majority of it is spent in a bad neighborhood.

SQL Server Query Plan
but first

Why does this suck? Boy oh boy. Where do we start?

  • Sorting the Votes table to support a Merge Join?
  • Choosing Parallel Merge Joins ever?
  • Choosing a Many To Many Merge Join ever?
  • All of the above?

Bout It


If we change the way the query is structured to use OUTER APPLY instead, we can get much better performance in this case.

SELECT 
    x.YearPeriod,
    MonthPeriod = 
        RIGHT('00' + RTRIM(x.MonthPeriod), 2),
    PercentAnswered = 
        CONVERT(DECIMAL(18, 2), 
           (SUM(x.AnsweredQuestion * 1.) /
           (COUNT_BIG(*) * 1.)) * 100.)
FROM
(
    SELECT 
        YearPeriod = YEAR(p.CreationDate),
        MonthPeriod = MONTH(p.CreationDate),
        oa.AnsweredQuestion
    FROM dbo.Posts AS p
    OUTER APPLY 
    (
        SELECT 
            AnsweredQuestion = 
                CASE 
                    WHEN v.Id IS NOT NULL 
                    THEN 1 
                    ELSE 0 
                END
        FROM dbo.Votes AS v
        WHERE v.PostId = p.AcceptedAnswerId
        AND   v.VoteTypeId = 1
    ) oa
    WHERE p.PostTypeId = 1
) AS x
GROUP BY 
    x.YearPeriod, 
    x.MonthPeriod
ORDER BY 
    x.YearPeriod ASC, 
    x.MonthPeriod ASC;

This changes the type of join chosen, and runs for about 3 seconds total.

SQL Server Query Plan
buttercup

We avoid all of the problems that the parallel many-to-many Merge Join brought us.

Thanks, Hash Join.

It’s also worth noting that the OUTER APPLY plan asks for an index that would help us a bit, though like most missing index requests it’s a bit half-baked.

USE [StackOverflow2013]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[Votes] ([VoteTypeId])
INCLUDE ([PostId])
GO

Index Plus


Any human could look at this query and realize that having the PostId in the key of the index would be helpful, since we’d have it in secondary order to the VoteTypeId column

CREATE INDEX v 
    ON dbo.Votes(VoteTypeId, PostId) 
WITH(MAXDOP = 8, SORT_IN_TEMPDB = ON);

If we add that index, we can make the subquery fairly competitive, at about 4.5 seconds total.

SQL Server Query Plan
bloop join

But the issue here is now rather than poorly choosing a Sort > Merge Join, we go into a Nested Loops join for ~6 million rows. That’s probably not a good idea.

This index doesn’t leave as profound a mark on the APPLY version of the query. It does improve overall runtime by about half a second, but I don’t think I’d create an index just to get a half second better.

SQL Server Query Plan
astro weiner

But hey, who knows? Maybe it’d help some other queries, too.

Indexes are cool like that.

Back On The Map


If you’ve got subqueries in your select list that lead to poor plan choices, you do have options. Making sure you have the right indexes in place can go a long way.

You may be able to get competitive performance gains by rewriting them as OUTER APPLY. You really do need to use OUTER here though, because it won’t restrict rows and matches the logic of the subquery. CROSS APPLY would act like an inner join and remove any rows that don’t have a match. That would break the results.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SQL Server Spool Operators Are Just Crappy Temp Tables

But My Tempdb


Using the scenario from yesterday’s post as an example of why you might want to think about rewriting queries with Table Spools in them to use temp tables instead, look how the optimizer chooses a plan with an Eager Table Spool.

The “Eager” part means the entire set of rows is loaded into a temporary object at once.

SQL Server Query Plan
drugas

That’s a lot of rows, innit? Stick some commas in there, and you might just find yourself staring down the barrel of a nine digit number.

Worse, we spend a long time loading data into the spool, and doing so in a serial zone. There’s no good way to know exactly how long the load is because of odd operator times.

If you recall yesterday’s post, the plan never goes back to parallel after that, either. It runs for nearly 30 minutes in total.

Yes Your Tempdb


If you’re gonna be using that hunka chunka tempdb anyway, you might as well use it efficiently. Unless batch mode is an option for you, either as Batch Mode On Rowstore, or tricking the optimizer, this might be your best bet.

Keep in mind that Standard Edition users have an additional limitation where Batch Mode queries are limited to a DOP of 2, and don’t have access to Batch Mode On Rowstore as of this writing. The DOP limitation especially might make the trick unproductive compared to alternatives that allow for MOREDOP.

For example, if we dump that initial join into a temp table, it only takes about a minute to get loaded at a DOP of 8. That is faster than loading data into the spool (I mean, probably. Just look at that thing.).

SQL Server Query Plan
sweet valley high

The final query to do the distinct aggregations takes about 34 seconds.

SQL Server Query
lellarap

Another benefit is that each branch that does a distinct aggregation is largely in a parallel zone until the global aggregate.

SQL Server Query
muggers

In total, both queries finish in about a 1:45. A big improvement from nearly 30 minutes relying on the Eager Table Spool and processing all of the distinct aggregates in a serial zone. The temp table here doesn’t have that particular shortcoming.

In the past, I’ve talked a lot about Eager Index Spools. They have a lot of problems too, many of which are worse. Of course, we need indexes to fix those, not temp tables.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Multiple Distinct Aggregates: Still Harm Performance Without Batch Mode In SQL Server

Growler


Well over 500 years ago, Paul White wrote an article about distinct aggregates. Considering how often I see it while working with clients, and that Microsoft created column store indexes and batch mode rather than allow for hash join hints on CLR UDFs, the topic feels largely ignored.

But speaking of all that stuff, let’s look at how Batch Mode fixes multiple distinct aggregates.

Jumbo Size


A first consideration is around parallelism, since you don’t pay attention or click links, here’s a quote you won’t read from Paul’s article above:

Another limitation is that this spool does not support parallel scan for reading, so the optimizer is very unlikely to restart parallelism after the spool (or any of its replay streams).

In queries that operate on large data sets, the parallelism implications of the spool plan can be the most important cause of poor performance.

What does that mean for us? Let’s go look. For this demo, I’m using SQL Server 2019 with the compatibility level set to 140.

SELECT
   COUNT_BIG(DISTINCT v.PostId) AS PostId,
   COUNT_BIG(DISTINCT v.UserId) AS UserId,
   COUNT_BIG(DISTINCT v.BountyAmount) AS BountyAmount,
   COUNT_BIG(DISTINCT v.VoteTypeId) AS VoteTypeId,
   COUNT_BIG(DISTINCT v.CreationDate) AS CreationDate
FROM dbo.Votes AS v;

In the plan for this query, we scan the clustered index of the Votes table five times, or once per distinct aggregate.

SQL Server Query Plan
skim scan

In case you’re wondering, this results in one intent shared object lock on the Votes table.

<Object name="Votes" schema_name="dbo">
  <Locks>
    <Lock resource_type="OBJECT" request_mode="IS" request_status="GRANT" request_count="9" />
    <Lock resource_type="PAGE" page_type="*" index_name="PK_Votes__Id" request_mode="S" request_status="GRANT" request_count="14" />
  </Locks>
</Object>

This query runs for 38.5 seconds, as the crow flies.

SQL Server Query Plan
push the thing

A Join Appears


Let’s join Votes to Posts for no apparent reason.

SELECT
   COUNT_BIG(DISTINCT v.PostId) AS PostId,
   COUNT_BIG(DISTINCT v.UserId) AS UserId,
   COUNT_BIG(DISTINCT v.BountyAmount) AS BountyAmount,
   COUNT_BIG(DISTINCT v.VoteTypeId) AS VoteTypeId,
   COUNT_BIG(DISTINCT v.CreationDate) AS CreationDate
FROM dbo.Votes AS v
JOIN dbo.Posts AS p
    ON p.Id = v.PostId;

The query plan now has two very distinct (ho ho ho) parts.

SQL Server Query Plan
problemium

This is part 1. Part 1 is a spoiler. Ignoring that Repartition Streams is bizarre and Spools are indefensible blights, as we meander across the execution plan we find ourselves at a stream aggregate whose child operators have executed for 8 minutes, and then a nested loops join whose child operators have run for 20 minutes and 39 seconds. Let’s go look at that part of the plan.

SQL Server Query Plan
downstream

Each branch here represents reading from the same spool. We can tell this because the Spool operators do not have any child operators. They are starting points for the flow of data. One thing to note here is that there are four spools instead of five, and that’s because one of the five aggregates was processed in the first part of the query plan we looked at.

The highlighted branch is the one that accounts for the majority of the execution time, at 19 minutes, 8 seconds. This branch is responsible for aggregating the PostId column. Apparently a lack of distinct values is hard to process.

But why is this so much slower? The answer is parallelism, or a lack thereof. So, serialism. Remember the 500 year old quote from above?

Another limitation is that this spool does not support parallel scan for reading, so the optimizer is very unlikely to restart parallelism after the spool (or any of its replay streams).

In queries that operate on large data sets, the parallelism implications of the spool plan can be the most important cause of poor performance.

Processing that many rows on a single thread is painful across all of the operators.

Flounder Edition


With SQL Server 2019, we get Batch Mode On Row store when compatibility level gets bumped up to 150.

The result is just swell.

 

SQL Server Query Plan
yes you can

The second query with the join still runs for nearly a minute, but 42 seconds of the process is scanning that big ol’ Posts table.

Grumpy face.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Performance Tuning Memory Queries For SQL Server

I am a video



Thanks for watching!

Video Summary

In this video, I dive into a real-world scenario where I used my SSMS tuning setup to optimize a query for one of my help reviews. The query in question was particularly slow and caused significant delays during live demos due to its execution time. After analyzing the execution plan, which took over 2 minutes to run, I decided to tackle the issue by rewriting the query to eliminate unnecessary join conditions and reduce redundant operations. This approach not only sped up the query significantly but also provided a clearer, more efficient execution path for SQL Server to follow. By sharing this experience, I hope to offer some practical insights into optimizing complex queries for smoother performance in real-world scenarios.

Full Transcript

Oh boy, oh my goodness. This is, this is quite a day, quite a Friday. Uh, Erik Darling here with Erik Darling Data. It would be Darling Data, but someone else, someone else got there first. I mean, for the domain. I mean, I still have the company. Whatever. I mean, no one else got here first. That would be awkward. Like, hostile takeover of my own company. Anyway. Uh, today we are drinking something red that my wife walked in and handed me, so I have no details. Uh, it smells nice though. And, uh, this video is, uh, going to build, oh geez, we’re building things, we’re building on things. There’s a pattern forming here. Uh, where we’re going to, uh, build a little bit on a video that I recorded, I think, last week. I don’t remember.

The weeks are still, time is still very strange to me. Uh, talking about, you know, how I set up SSMS to tune queries. And here’s an example of actually when I used it to do that same, to do that very thing. Same very thing. Uh, recently. To, uh, one of my help reviews. Uh, what’s up memory? And, uh, I don’t know why I named it that. I was, I guess because who is memory sounded stupid. Uh, so, yeah. Uh, so this, this, this help review will tell you, uh, what’s in, what’s in your buffer pool currently. And that’s helpful, I guess, because everyone wants to know what’s in their buffer pool constantly all the time.

Uh, I use it for demos that I do for things that, uh, I find helpful. So, I don’t know. Maybe you will too. I don’t know. If you don’t, go do something else with your life. I don’t care that much. Um, but, what I found is every time I wanted to use this thing on a, uh, on an instance with a good chunk of memory, it would be, it was slow. Terribly, dreadfully, painfully slow. If we look at the execution plan, uh, we see that it took about 2 minutes and 44 seconds to run.

And, uh, who could wait that long for anything? Uh, and it made, uh, certain demos very difficult to do live or even to, like, sort of do off-the-cuff recording. I would have to, like, pre-stage everything so that it would be set up here. So, we were just, like, like, sitting around waiting for this and awkwardly staring at the camera.

Or, like, doing a cut scene or a montage, lifting weights in my short shorts, running on a beach, karate-kicking seagulls, whatever. Uh, but, I mean, just, you know, kind of, like, digging through the execution plan. Again, very helpfully looking at the operator times under these things, right?

So, we no longer have to care about percentages because, I mean, not that they meant anything anyway. Shut up, car. But, it was helpful, it’s much more helpful to sort of, like, follow the yellow brick road of operator times to when things sort of drop off and pick up.

So, kind of framing this a little bit, we have this nested loops join that hits 2.15. And we have this concatenation that is at 0.001. And down here, we have this table spool, which takes a minute and 36 seconds.

Prior to that table spool, we do some nonsense where we hit this view, sysalloc units. And, uh, I’m mispronouncing that would have been deadly. Uh, and this one down here, sys.buffer descriptors.

And we sort of join those together. And then we spend a bunch of time in the spool and a bunch of time in this nested loops join. And this, and the spool, of course, is trying to save us some trouble with the nested loops join.

It’s like, oh, I don’t know how repetitive this stuff is going to be. I don’t know if I want to do everything down here over and over again. I’m going to use a spool to cache some information and reuse it if I can.

Uh, you know, I don’t know how helpful this spool actually was. We look at the properties of the spool. And we look at the, uh, the rebinds and the rewinds.

Well, you know, I guess, I guess it’s okay. Deal with it. But, uh, I don’t like this spool. And if you look at the nested loops join, and kind of zoom in on the tool tip, it’s all happening because of this very, very difficult predicate.

Uh, there’s a lot of and and or logic in this join condition. Let’s go zoom in on what the join condition looked like beforehand. Uh, and this is, you know, sort of many times looking at, uh, queries and looking at, uh, you know, things that people wrote up in the real world.

This is like the enemy of performance. This is not a good thing to do if you want queries to run quickly. Uh, especially, you know, bigger queries, more rows, things you have to worry about, performance. Ah, terrifying.

So, uh, you know, my initial reaction. And I guess we can just come over here. My initial reaction was just to try a query hint on here for no performance spools. Zero performance spools.

Which would get rid of, uh, actually gets rid of a few different kinds of spools in query plans. And get rid of row count spools. Uh, lazy table spools. And, uh, lazy index spools. It does not get rid of eager index spools.

Uh, the only thing that does that is a, uh, uh, turning off certain query rules via other things that we’re not talking about here. But, um, no, with the no performance spool hit, if we look at the query plan, we get a faster one. Still faster.

56 seconds. So we did better. It still takes a full minute. It’s still a lot of time to kill. We don’t feel like talking, really. It’s, uh, you know, kind of dragging along the query plan. You know, it’s kind of the same, same set of yuckiness.

Like, why does it take five seconds to seek into an index? Like, what, what are you doing? You know, who designs these things? Add an index.

Add a good index. Why is this? This is silly. Uh, and then this one, or three seconds. And, you know, like, like focusing in on the operator times. There’s a thing that’s like, we have big jumps, right? Not like, we don’t want like incremental ones. Incremental ones are boring.

What if that big jump to where we got up there? And, you know, I guess like a six second jump is pretty big, right? 5.7 seconds up to there. Okay. Well, you know, something to think about. Uh, but, you know, kind of, again, following this road, right?

So now we have that same join, I think. Uh, but we have, uh, like a bunch of stuff down here that just sort of takes a lot of, without the, without the spool, we have to do a lot of, the optimizer chooses a much different plan.

Whereas like a bunch of different stuff happens. Right. And, uh, I don’t know. I’m kind of got lost with what I was going to say about this one, but you know, it’s Friday and you’re not paying attention anyway.

So it doesn’t matter all that much. Does this do? No. Outer references. So this time we get a part, we get an apply nested loops where, uh, rather than the regular nested loops where everything happened at the join, where it was with the spool. Now we have, we get an apply nested loops where it takes, uh, the type and the, uh, owner ID and it pushes things down.

If I’m, if I’m looking at the right join, even at this point, I’m, I forget. Anyway, uh, that’s how I tune queries. I forget things. Uh, but yeah. So, uh, what I, what I, what I thought looking at it was, geez, this is a performance problem that I’ve solved for other people.

A lot of them, a whole bunch of times. Why don’t I solve it for myself for once? And so I rewrote the query and I’ll show you what I did in a minute. But now the execution plan, I, this finishes in about seven seconds.

So now everything that’s slow in here is purely Microsoft’s fault. There’s no like really, really big jumps in the query plan. Uh, you know, we still get, have no buffer descriptors was kind of annoying, but not the biggest deal in the world.

But anyway, let’s get onto the rewrite. Uh, all the stuff up here is superfluous. I mean, that’s just like display level stuff. Uh, what really ended up making a difference was, so the first time I was thinking about rewriting this, I thought that I would just do this thing in here and separate out each of the join conditions.

So in this one, I have one explicit join condition with no, uh, this space there. Ugh, I stink. Uh, and then another one with this explicit join condition. So no like and or in crap in here.

And, you know, looking there could cool. And at first I thought that I would do this and I would, uh, just preserve the original join logic, which up here is, uh, from buffer descriptors joined over to allocation units.

But, uh, I found that hitting that buffer descriptors view three times was painful. So I ended up pulling the buffer descriptors part out and just doing one join to it at the end once I had everything else, uh, done out here.

So now the query is just grabbing all this stuff, right? Having fun doing things, pulling out information. And then, uh, when we get outside of that sort of inner query where we, after we just, I just called that X because, uh, I don’t know, Mr. X.

And then, uh, we do our grouping and our ordering out here. And we do all the summing in whatnot up here with our case expressions here where they’re far less dangerous.

And of course, uh, repeating this query with three explicit joins is a lot faster. Now it’s a lot faster for reasons that I’ve talked about in other posts where, you know, if we’re writing a single purpose queries, it’s a lot easier for the optimizer to go, uh, to deal with that and to make an efficient query plan.

Um, I’ll, I’ll probably blog more about that in the future, but for now, that’s what you get because we’re about at the 10 minute mark and I’m about to get my Friday evening started. And, uh, I hope, well, I mean, by the time you see this, it’ll be Tuesday, but that’s your fault.

Anyway, thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Updates to sp_PressureDetector and WhatsUpMemory

GIRTFT


Just a couple minor updates:

  • sp_PressureDetector now includes statement start and end offsets, in case you want to build plan guides for queries
  • WhatsUpMemory got a huge performance tuning, which I’ll talk about in tomorrow’s post!

Nothing else here. Happy downloading and analyzing.

Or analysing, if you’re from a refined country.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Compressed Indexes And The Buffer Pool In SQL Server

Mail Drag


After my smash hit double diamond post about index tuning, I got a question questioning my assertion that compressed indexes are also compressed in the buffer pool.

Well, this should be quick. A quick question. Eighty hours later.

First, two indexes with no compression:

CREATE INDEX o
ON dbo.Posts
    (OwnerUserId);

CREATE INDEX l
ON dbo.Posts
    (LastEditorDisplayName);

Looking at what’s in memory:

2021 03 04 19 56 09
jot’em

Now let’s create a couple indexes with compression:

CREATE INDEX o
ON dbo.Posts
    (OwnerUserId)
WITH(DATA_COMPRESSION = ROW);

CREATE INDEX l
ON dbo.Posts
    (LastEditorDisplayName)
WITH(DATA_COMPRESSION = PAGE);

I’m choosing compression based on what I think would be sensible for the datatypes involved.

For the integer column, I’m using row compression, and for the string column I’m using page compression.

2021 03 04 19 59 46
got’em

Now in memory: way less stuff.

So there you go.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Sort Thread Numbers Correctly in SSMS Operator Properties

Dyslexical



Sort Thread Numbers Correctly in Operator Properties

Thanks for watching!

Video Summary

In this video, I delve into a minor but irritating graphical quirk in SQL Server Management Studio (SSMS) that has been bugging me for some time now. It’s not just an SSMS issue; it seems to manifest in other places within the SQL Server ecosystem as well. The problem arises when looking at execution plans, specifically in parallel operators, where the sorting of properties pane entries becomes erratic and frustratingly out of order—especially when dealing with more than nine threads. I walk through a workaround for this issue, explaining how to navigate around it by opening tooltips and manually selecting rows, but ultimately, I encourage viewers to voice their frustration through Microsoft’s User Voice platform. By voting on the relevant issue, we can collectively push for improvements in SSMS, making it a more pleasant environment for query tuning and analysis.

Full Transcript

Erik Darling. Bum, bum, bum. He still does not own ErikDarling.com because I’m a cheapskate and I don’t want to pay $1,500 for my own name on the internet. That seems a little outlandish to me. Maybe when it gets down under $1,000, I’ll do it. Anyway, it’s now Sunday, apparently, and today’s broadcast is brought to you by Dom Perignon 2009. And this was brought home from the lovely restaurant, per se. This is their wine stopper thing that they gave us for the bottle. I don’t know that they normally give these out, but it’s the coolest thing that I own now. So, if they want to sponsor me in other ways, I would happily take their sponsorship. But anyway, today’s video is about a minor graphical annoyance that I have with SQL Server Management Studio. Now, this is not the only place that you’ll see an oddity like this. There are many other places within SQL Server. Gosh, this smells good.

Oh, this is not the only place that you can do. Within SQL Server Management Studio where you’ll see this. Maybe it happens in Azure Data Studio, too. I don’t know. I refuse to install that. Anyway, here’s what it is. So, we have this query that does dumb things, admittedly dumb things. I know it’s dumb. I know it’s stupid looking. I know it serves absolutely no purpose in what it does. But when we look at the execution plan, more importantly, when we dive into the parallel operators of the execution plan, something aggravating happens. So, let’s go do what we should do as professional query tuners, always.

Open up the properties pane and let’s open this up. Now, completely ignoring the fact that this is the best parallel skew demo ever written. Open up the properties pane. What the hell is going on here? And then down here. Why? Why? Now, the other place where this used to come up is in the wait stats. So, I think it was… Maybe it was… I can’t remember if it was an SSMS version that fixed it or a SQL Server version that fixed it. It would make sense that it was an SSMS version that fixed it.

But who knows? If you looked at the wait stats, the wait stats would be sorted incorrectly, too. So, that actually is apparently fixed now. But, you know, good. But we still have this problem. And this problem persists across… Ah, come on. So, here’s another funny thing. Sometimes when you go to get the properties, the tooltip shows up. And then you can’t get them because the tooltip shows up over the properties.

And who knows where that is? So, what you have to do is be sneaky, a sneaky little rabbit. And you have to get the tooltip and then hit properties and then crack that open and then highlight that row so that we preserve it. We preserve the positioning across all these operators and go click. And look, they all sort wonky.

The sorting is a muck. Well, except on this one where there’s nothing happening because that’s a gather stream. So, if we crack this one open… Well, this one… I mean, at least the skew is gone here. Alright? At least we don’t have the skew here, but we still have this quite unfortunate sorting.

Anyway, if you’re the type of person who doesn’t like that, doesn’t like the way that looks, think that looks foolish, I have a user voice item that you can vote on. You can express your contempt and disdain for the current state of sorting parallel threads above 9, I guess, in SQL Server Management Studio.

And the link will be attached to the YouTube video and probably in the blog post as well. I just haven’t written it yet, but it’s there. So, watch out for that. Vote for it. Help me make SQL Server Management Studio a nicer place to live and do business.

Sorry, I can’t stop staring at this wine stopper. It’s like the neatest contraption. I bet that like… I could just order these from like a restaurant wholesale place. I bet it’s not even like some extra fancy thing.

I’m probably… It’s probably overblown just because it has a fancy restaurant name etched in the top. Anyway, I’m gonna go get back to enjoying the smell of this champagne on yet another rainy day. Thank you for watching. Thank you for hopefully voting on this issue.

And hopefully, democracy will win and we’ll get this whole sorting thing straightened out. Have a nice day.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.