Understand Your Plan: Reading Operator Times In SQL Server Execution Plans

Wiseacre


When I was sitting down to figure out what to write about this, one of the topics that I thought would be interesting was the width of lines/arrows in query plans.

I sat here for half a day pecking here and there getting the post started, writing some demos, and… realized I couldn’t remember the last time I really looked at them. I could explain it, but… I’d be lying to you if I said I really cared.

Unless you have an actual execution plan, line thickness is based on estimates. They could be totally wrong, and chances are that if you’ve got a query plan in front of you it’s not because it’s running well.

If you’re going to run it and look at the actual execution plan anyway, you’d have to be seven levels obtuse to look at line widths instead of operator times.

They’re your best friends these days, and help you focus on the parts of the plan that need help.

Looney Tunes


At this point in 2022, most people are going to be looking at plans running only in row mode. As folks out there start to embrace:

  • Column store
  • SQL Server 2019 Enterprise Edition
  • Compatibility level 150+

We’ll start to see more batch mode operators, especially from batch mode on row store. I only mention this because row mode and batch mode operators track time differently, and you need to be really careful when analyzing operator times in actual execution plans.

In plans that contain a mix of row mode and batch mode operators, timing might look really funny in some places.

Let’s chat about that!

Row Mode


As of this writing, all row mode query plan operators accumulate time, going from right to left. That means each operator tracks its own time, along with the time of all the child operators under it.

This is useful for following time accumulation through a query plan to the point where it spikes. I sort of like this because it makes tracking things down a bit easier than the per-operator times in batch mode plans.

Let’s use this plan as an example:

SQL Server Query Plan
oh i forgot it’s summertime

Yes, it’s intentionally bad. I made it that way. You’re welcome. There are two things I’d like to point out here:

  • Just about every line looks equally thick
  • Costing is a bit weird, aside from the Eager Index Spool

But you know what? If you look at operator times, you can get a pretty good idea about where things went wrong.

  • Maybe that Key Lookup wasn’t the greatest use of time
  • Boy howdy, that eager index spool took about 90 seconds

I’m not gonna go into how to fix this, I just want you to understand where time is taken, and how it adds up across operators.

Batch Mode


In batch mode plans, the story is a whole lot different.

SQL Server Query Plan
BMOR than you can be

Times for operators are all over the place. They’re not wrong, they’re just different. Each operator that executes in batch mode tracks its own time, without the child operator times.

Nothing adds up until the very end, when you hit the Gather Streams operator. Right now, none of the parallel exchange operators can run in batch mode, so they’ll add up all of the child operator times.

If you’ve got a plan with a lot of operators mixed between row and batch mode, things will look even weirder.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

No, Really: Don’t Optimize For Ad Hoc Workloads As A Best Practice

Better Late


A few weeks back, my friend Randolph (B|T) and I collaborated a bit on a blog post about Optimize For Ad Hoc Workloads.

Now that I’m a Microsoft Approved Blogger™️ and you all have to take me Very Seriously™️, I’d like to make a few of the points here in my own words, though I highly suggest reading Randolph’s post because it’s much better.

Randolph writes for a living. I’m just a consultant with some free time.

The result you want from Optimize For Ad Hoc Workloads is probably closer to what Forced Parameterization does


When people complain about lots of “single use” plans, they probably want there to be fewer of them. Turning on Forced Parameterization will promote plan reuse. Turning on Optimize For Ad Hoc Workloads won’t do that.

  • With Forced Parameterization, literal values are replaced (when possible) with parameters, which promotes plan re-use
  • Turning on Optimize For Ad Hoc Workloads compiles a new plan for queries with literal values, and then just caches a stub

Parameterization, and by extension Parameter Sniffing, has an ominous meaning for many of you.

I see it all the time, folks jumping through every hoop in the world to not have a parameter get sniffed. Recompiling, Unknown-ing, Local Variable-ing. Sad. Very sad.

Bottom line: If you want fewer single use plans, parameterize your damn queries.
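For reference, here’s how each setting gets turned on. This is just a sketch; the database name is a stand-in for whichever database you’re working in.

```sql
/* Forced Parameterization is a database-level setting */
ALTER DATABASE StackOverflow SET PARAMETERIZATION FORCED;

/* Optimize For Ad Hoc Workloads is a server-level setting */
EXEC sys.sp_configure 'show advanced options', 1;
RECONFIGURE;

EXEC sys.sp_configure 'optimize for ad hoc workloads', 1;
RECONFIGURE;
```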

Turning Optimize For Ad Hoc Workloads on can make workload analysis difficult


You know how sometimes you have a performance issue, and then you go look in the plan cache, and queries don’t have execution plans? This setting can make that happen.

Why? Because stubs aren’t query plans. Never have been. Never will be.

If we run these queries, they’ll both get a stub, because each one has a different literal value. These are “single use” plans, even though SQL Server compiled and used the “same” plan for each one.

SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE v.VoteTypeId = 2
AND 1 = (SELECT 1); --Skip a Trivial Plan/Simple Parameterization
GO

SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE v.VoteTypeId = 4
AND 1 = (SELECT 1); --Skip a Trivial Plan/Simple Parameterization
GO

Here are the plans:

SQL Server Query Plan
tell me why

But here’s why they’re “different” and each one gets a stub:

2022 02 12 10 43 15
hashtastic

Even though the Query Hash and Query Plan Hash are identical, each one has a different SQL Handle.

I’m using a simple demo to show you the end result, but you can probably see why this would be more difficult to deal with in real life.
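If you want to see the stubs for yourself, a plan cache query along these lines will show them. The cacheobjtype column is what gives stubs away.

```sql
/* Stubs show up with a cacheobjtype of 'Compiled Plan Stub',
   and won't have a query plan attached */
SELECT
    decp.usecounts,
    decp.cacheobjtype,
    decp.size_in_bytes,
    dest.text
FROM sys.dm_exec_cached_plans AS decp
CROSS APPLY sys.dm_exec_sql_text(decp.plan_handle) AS dest
WHERE dest.text LIKE N'%dbo.Votes%'
AND   dest.text NOT LIKE N'%dm_exec_cached_plans%';
```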

The “same” query might run more than once, but if a literal in the where clause is different, you’ll have two stubs


Thinking about the situation above:

  • Did we really want to compile the same execution plan twice?
  • What’s the point of having two stubs for what amounts to the exact same query and plan?

If we had turned on Forced Parameterization, neither of those things would have happened: the literal 4 and 2 values in the where clause would have been replaced with a parameter, and we would have gotten a single cached, re-used plan.

The problem we hit is that we got two different SQL Handles because the 4 and 2 hash out differently.

That’s a problem that gets solved with parameterization, forced or fixed in code.

You’ll still compile full plans for queries to execute with


This is another misconception I run into a lot. For every time a query runs that SQL Server can’t find a matching SQL Handle for, you’ll get a new plan generated, even if only a stub is cached.

If you have a lot of big, complicated queries that take a long time to compile, that can be unpleasant.

And look, I’m not afraid of recompiling plans. In the right place, that can be awesome.

This can also be annoying if you have queries constantly coming in and compiling new plans.

It’s been a while since I’ve seen this scenario cause CPU to hit 100% (or even close to it), but in today’s Unfortunate Cloud Age© any CPU reduction you can make to end up on smaller, less expensive instances can make you look like a Bottom Line Hero™️.

Constant query compilation, and long query compilation, can contribute to that.

The stubs still count towards the total number of plans you can have cached


By default, you have a limit of 160,036 total plans allowed in the cache, and other limits in total size based on available memory.

Plan stubs still contribute towards the total count. Lots of single use stubs can still cause “bloat”.

The questions you have to ask here are:

  • How much larger are my query plans than the stubs?
  • Is that savings worth it to not have plans in cache available to analyze?
  • Do you capture query plans in a different way? (Query Store, monitoring tool, etc.)

Lots of stubs aren’t necessarily more useful than lots of single use plans.

When you end up with “the same” unparameterized query running over and over again, you might miss out on aggregate resource usage for it


One way that it can be useful to look at your server’s workload is seeing which queries:

  • Execute the most
  • Use the most total CPU
  • Are attached to missing index requests

A lot of plan cache analysis scripts (even ones I’ve written and worked on) will group by or correlate on SQL Handle for different things.

If you have a different SQL Handle for every execution of “the same” query with different literals, you’ll miss out on them.

You’ll have to look by something else, like Query Hash, Query Plan Hash, or Plan Handle instead.
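A sketch of what grouping by Query Hash can look like, using sys.dm_exec_query_stats:

```sql
/* Grouping by query_hash rolls up "the same" query
   across different literals and SQL Handles */
SELECT TOP (10)
    deqs.query_hash,
    executions = SUM(deqs.execution_count),
    total_cpu_ms = SUM(deqs.total_worker_time) / 1000.
FROM sys.dm_exec_query_stats AS deqs
GROUP BY deqs.query_hash
ORDER BY total_cpu_ms DESC;
```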

You can run into this regardless of Optimize For Ad Hoc Workloads being enabled, but a lot of folks out there tend to ignore rows returned by analysis scripts that don’t also have a query plan.

Stubs Ahoy.

The Setting Is Poorly Named


Last but not least: this is a poorly named setting.

I still run into folks who think that enabling this gives the optimizer some special powers for ad hoc queries.

It doesn’t. The only thing it does is cache a plan stub instead of the entire plan on “first execution”.

This isn’t a documentation problem either, this is a not-reading-the-documentation problem.

Thanks for reading!


Getting The Top Value Per Group With Multiple Conditions In SQL Server: Row Number vs. Cross Apply With MAX

Throat Music


In yesterday’s post, we compared a simple situation: trying to find the top scoring post for each user.

In today’s post, we’re going to add another condition: we want the highest scoring post for each type of post someone has made.

2022 02 14 14 27 58
typos

Now look, most people don’t get involved with any of these things, but whatever. It just poses an interesting and slightly more complicated problem.

Slightly Different Index


Since we’re going to be using PostTypeId in the window function, we need it in the key of our index:

CREATE INDEX p ON dbo.Posts(OwnerUserId, PostTypeId, Score DESC)
WITH(SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

Now our query looks like this:

SELECT
    u.DisplayName,
    u.Reputation,
    p.PostTypeId,
    p.Score
FROM dbo.Users AS u
JOIN
(
    SELECT
        p.*,
        n = 
            ROW_NUMBER() OVER
            (
                PARTITION BY
                    p.OwnerUserId,
                    p.PostTypeId --This is new!
                ORDER BY
                    p.Score DESC
            )
    FROM dbo.Posts AS p
) AS p
    ON  p.OwnerUserId = u.Id
    AND p.n = 1
WHERE u.Reputation > 50000
ORDER BY 
    u.Reputation DESC,
    p.Score DESC;

S’good? S’good. Let’s go.

Row Number Query Plan


Similar to yesterday’s plan, this one is rather slow, rather serial, and generally not how we want to be spending our precious time.

SQL Server Query Plan
scam

Let’s look at the apply method, because we have to change our query a little bit to accomplish the same thing.

Cross Apply With MAX


Rather than go with TOP (1), we’re going to GROUP BY OwnerUserId and PostTypeId, and get the MAX(Score).

SELECT
    u.DisplayName,
    u.Reputation,
    p.PostTypeId,
    p.Score
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT 
        p.OwnerUserId,
        p.PostTypeId,
        Score = MAX(p.Score)
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = u.Id
    GROUP BY 
        p.OwnerUserId, 
        p.PostTypeId
) AS p
WHERE u.Reputation > 50000
ORDER BY 
    u.Reputation DESC,
    p.Score DESC;

This will give us the same results, but a lot faster. Again.

Cross Apply Query Plan


Like I was saying…

SQL Server Query Plan
time is extremely valuable

Down to ~400ms now. Not bad, right?

Lower Selectivity


If we take those same queries and lower the reputation filter from 50,000 to 1, some interesting changes to the query plans happen.

SQL Server Query Plan
years ago

Repartition streams and I have had some problems in the past. It’s not necessarily “to blame”, it just has a tough time with some data distributions, especially, it seems, when it’s order preserving.

The cross apply with aggregation works really well. It’s kinda neat that both queries get slower by the same amount of time, but the ROW_NUMBER query is still much, much slower.

All of this is interesting and all, but you know what? We haven’t looked at batch mode. Batch mode fixes everything.

Sort of. Don’t quote me on that. It’s just really helpful in the kind of BIG QUERY tuning that I end up doing.

Batch Mode


This is the only thing that makes the ROW_NUMBER query competitive in this scenario, owing to the fact that batch mode often removes Repartition Streams, and we’re eligible for the Window Aggregate operator.

SQL Server Query Plan
further reductions

I’m dumping these results to #temp tables because I don’t want to wait for SSMS to render the large result set, but you can still see the positive overall effect.

The poorly performing ROW_NUMBER query is now very competitive with the CROSS APPLY query.

Thanks for reading!


Getting The Top Value Per Group In SQL Server: Row Number vs. Cross Apply Performance

Introductions


There are many ways to express queries in SQL. How different rewrites perform will largely be a function of:

  • You not doing anything ridiculous
  • Queries having good indexes in place
  • The optimizer not seeing through your tricks and giving you the same query plan

The first rule of rewrites is that they have to produce the same results, of course. Logical equivalency is tough.

In today and tomorrow’s posts I’m going to compare a couple different scenarios to get the top value.

There are additional ways to rewrite queries like this, of course, but I’m going to show you the most common anti-pattern I see, and the most common solution that tends to work better.

Right And Proper Indexing


For today’s post, we’re going to use this index:

CREATE INDEX p ON dbo.Posts(OwnerUserId, Score DESC) INCLUDE(PostTypeId)
WITH(SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

Which is going to give this query proper support. Sure, we could also add an index to the Users table, but the one scan is trivially fast, and probably not worth it here.

SELECT
    u.DisplayName,
    u.Reputation,
    p.PostTypeId,
    p.Score
FROM dbo.Users AS u
JOIN
(
    SELECT
        p.*,
        n = 
            ROW_NUMBER() OVER
            (
                PARTITION BY
                    p.OwnerUserId
                ORDER BY
                    p.Score DESC
            )
    FROM dbo.Posts AS p
) AS p
    ON  p.OwnerUserId = u.Id
    AND p.n = 1
WHERE u.Reputation > 50000
ORDER BY 
    u.Reputation DESC,
    p.Score DESC;

The general idea is to find all users with a reputation over 50,000, along with their highest scoring post.

I know, you’re looking at this and thinking “jeez Erik, why are you selecting * here? don’t you know how bad and dumb you are for that?”

Well, SQL Server is smart enough to ignore that and only deal with the columns in the outer select.

The Query Plan


If you create the index and run this, the query plan looks something like this:

SQL Server Query Plan
pity

The majority of the time spent in this plan is the ~11 seconds between the scan of the Posts table and the Filter operator.

The filter is there to remove rows where the result of the ROW_NUMBER function is greater than 1.

I guess you could say less than 1, too, but ROW_NUMBER won’t produce rows with 0 or negative numbers naturally. You have to make that happen by subtracting.

A Better Query Pattern?


Since only ~2400 rows are leaving the Users table, and we have a good index on the Posts table, we want to take advantage of it.

Rather than scan the entire Posts table, generate the ROW_NUMBER, apply the filter, then do the join, we can use CROSS APPLY to push things down to where we touch the Posts table.

SELECT
    u.DisplayName,
    u.Reputation,
    p.PostTypeId,
    p.Score
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT TOP (1)
        p.*
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = u.Id
    ORDER BY p.Score DESC
) AS p
WHERE u.Reputation > 50000
ORDER BY 
    u.Reputation DESC,
    p.Score DESC;

This is logically equivalent, but it blows the other query out of the water, performance-wise.

A Better Query Plan?


Here’s the query plan for the cross apply query:

SQL Server Query Plan
nice nice

Why Is This Better?


In this case, having a good index to use, and a small outer result from the Users table, the cross apply query is way better.

This is also due to the Id column of Users being the Primary Key of the table. For this sort of one to many join, it works beautifully. If it were a many to many scenario, it could be a toss up, or ROW_NUMBER could blow it out of the water.

The way this type of Nested Loops Join works (Apply Nested Loops), is to take each row from the outer input (Users table) and seek to it in the Posts table.

Without that good index up here, this would likely be a disaster with an Eager Index Spool in the plan. We definitely don’t want that here.

But you know, there are many different types of posts. We might want to know someone’s high score for questions, answers, and more.

In tomorrow’s post, we’ll look at how to do that, and performance tune the query.

Thanks for reading!


MAXDOP: The T-Shirt

Buttery Biscuits


In yesterday’s post, I showed you the promo video and abstract for my SQLBits precon.

In today’s post, I’m gonna offer you a little bribery to show up.

First, all attendees will get one of these lovely T-Shirts. I have options for MAXDOP 8 and 0 available. Because those are the only right answers. Ha ha ha.

I’m also throwing in free access to my SQL Server training library, which includes a video recap of the precon material.

See you there!


Going To SQLBits? Come To My Precon: The Professional Performance Tuning Blueprint!

Learn-A-Lot


Check out my Events page for the full details, and of course head over to SQLBits.com to register.

 

Searching the internet for every problem isn’t cutting it. You need to be more proactive and efficient when it comes to finding and solving database performance fires.

I work with consulting customers around the world to put out SQL Server performance fires. In this day of learning, I will teach you how to find and fix your worst SQL Server problems using the same modern tools and techniques which I use every week.

You’ll learn tons of new and effective approaches to common performance problems, how to figure out what’s going on in your query plans, and how indexes really work to make your queries faster. Together, we’ll tackle query rewrites, batch mode, how to design indexes, and how to gather all the information you need to analyze performance.

This day of learning will teach you cutting edge techniques which you can’t find in training by folks who don’t spend time in the real world tuning performance. Performance tuning mysteries can easily leave you stumbling through your work week, unsure if you’re focusing on the right things. You’ll walk out of this class confident in your abilities to fix performance issues once and for all.

If you want to put out SQL Server performance fires, this is the precon you need to attend. Anyone can have a plan, it takes a professional to have a blueprint.


Software Vendor Mistakes With SQL Server: Wrap Up – For Now!

Finger On It


All good things must come to an end. When I started this series, I wasn’t entirely sure where that would be.

Turns out, seven weeks about covers all the major stuff I see regularly when working with clients.

And this is just the code and indexes; it doesn’t even start on the SQL Server settings, maintenance practices, table design, and hardware configurations I see getting in the way.

Perhaps there is a part 2, but I’m going to let this stand for now.

Happy clicking! Thanks for reading!


Software Vendor Mistakes With SQL Server: Not Enforcing Code Formatting

On Top Of Spaghetti


I’m going to start sending dead fish to anyone who tells me that Common Table Expressions make queries more readable. If your code is formatted like crap, they don’t do anything.

Good formatting is the basis for code consistency. If things are a sloppy mess, no one will ever want to fix them.

In this post, I’m going to cover two things:

  • What you should always do to make code easy to understand
  • What you should consider doing to format your code so it’s easy to understand

The first is fact, the second is opinion. Laziness is not an argument in opposition to either one.

Let’s boogie.

SQL Formatting Facts


Here’s a list of things that make your code look good:

  • Capitalize things consistently: SELECT, select, SeLeCt — pick one and stick to it
  • Schema prefix all of your objects, even if everything is in dbo
  • Alias columns consistently, and…
  • Give your tables meaningful aliases, not just a, b, c, d
  • Embrace new lines for things like FROM, JOIN, ON, WHERE, etc.
  • Indent things appropriately so logically connected blocks are easier to spot
  • End your queries with a semicolon (;) to make it easier to figure out where a query actually ends

Sure, there’s probably more for specific things like stored procedures, where I’d say you should handle all your variable declarations and table creations in the same blocks of code, but whatever.
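To put those points together, here’s a small query written that way. The table and column names come from the Stack Overflow database used elsewhere in these posts.

```sql
/* Schema prefixes, meaningful aliases, new lines for clauses,
   consistent casing, and a closing semicolon */
SELECT
    u.DisplayName,
    badges = COUNT_BIG(b.Id)
FROM dbo.Users AS u
JOIN dbo.Badges AS b
    ON b.UserId = u.Id
WHERE u.Reputation > 100000
GROUP BY u.DisplayName
ORDER BY badges DESC;
```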

SQL Formatting Preferences


Here’s a list of things that make your code look good to me:

  • Putting commas at the end of column names
  • Indenting columns four spaces in from select
  • Using alias = expression rather than expression AS alias
  • Consistently upper casing key words
  • Not upper casing data types
  • Using TOP (n) rather than TOP n
  • Putting the ON clause of joins on a new line

There’s more, and some formatting choices are situational. For example, I like to really use new lines and indenting for complicated expressions with nested functions to make the inputs clear, but for short ones I usually won’t spread things out.

All this has developed over years of writing and reading code to learn what works best for me, and what I think looks right. I don’t expect everyone to agree on every point, of course, but things like old-style joins and ordering by ordinal position just look bad.

I have a long-standing disagreement with a dear friend about table aliases being capitalized. I don’t think they should be; he thinks they should.

Despite that, I can still read his code just fine.

Not A Nit Pick


I have these opinions because I write and review a lot of T-SQL. The harder code is to read, the harder it is to tune, rewrite, or spot nonsense.

These days, there’s almost no excuse for it, either. There are a half-dozen free and paid T-SQL formatters that work with a variety of IDEs.

I don’t have a specific recommendation here, because I haven’t been able to find a single tool that gets things right. Most get you 80% of the way there, and the rest is up to you.

What I’d really like is a tool that could also format and make suggestions in dynamic SQL, but I can’t imagine there’s enough money in that to warrant the effort.

Thanks for reading!


Software Vendor Mistakes With SQL Server: Not Enforcing Code Consistency

Practice Makes Perfect


Often when working with clients, the coding practices read like geological strata. You can tell when and who wrote something based on the practices they used.

This is bad news, though, because it speaks to lone-wolf style development. That might be okay if all your SQL developers are well-trained, etc. But that’s mostly not the case.

It’s especially important for junior developers to have code reviewed by others so they don’t introduce bad/worst practices into production code, and for legacy code to get reviewed to make sure it’s up to current best practices.

Otherwise, you’re always going to have problems. Here’s where I see folks making mistakes, mostly because they don’t have any internal “style guide” for queries.

I don’t mean formatting — we’ll talk about that tomorrow — I mean the choices you make while you’re writing queries.

The Style Council


Consider just some of the options you have when writing queries:

  • Temp Table or Table Variable
  • Common Table Expression or Derived Table
  • SELECT INTO or INSERT SELECT
  • Inline Table Valued Function or View
  • ISNULL or COALESCE
  • Cursor or Set-Based Code

For all of these options, there are good use cases, and bad ones.

  • Most of the time you want to use #temp tables, but code executed at high frequency could benefit from @table variables
  • Common Table Expressions can be re-used, but the entire query inside them is executed every time they’re referenced
  • SELECT INTO is convenient and may get a fully parallel insert without a tablock hint, but may also incorrectly interpret certain attributes like data type or length
  • Views may be more intuitive to write, but inline table valued functions can be more useful for parameterized queries
  • ISNULL is easier to spell, but it only takes two arguments and there are differences in how data types are interpreted between the two
  • Often, set-based code will be more efficient, but there are many valid uses for cursors
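The ISNULL and COALESCE difference is easy to demo. This sketch shows the data type behavior: ISNULL takes the data type of its first argument, while COALESCE uses data type precedence across all of its arguments.

```sql
/* ISNULL returns varchar(5), silently truncating the replacement value;
   COALESCE returns the full string */
SELECT
    i = ISNULL(CONVERT(varchar(5), NULL), 'truncated string'),
    c = COALESCE(CONVERT(varchar(5), NULL), 'truncated string');
```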

Futuristic


If your application is using newer versions of SQL Server, are you taking advantage of functions like STRING_SPLIT and STRING_AGG? Or are you still using older methods to split strings or aggregate strings?
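As a sketch, the newer string functions look like this (STRING_SPLIT shipped in SQL Server 2016, STRING_AGG in 2017):

```sql
/* Split a delimited list into rows */
SELECT ss.value
FROM STRING_SPLIT(N'1,2,3', N',') AS ss;

/* Aggregate rows back into a delimited list */
SELECT names = STRING_AGG(u.DisplayName, N', ')
FROM dbo.Users AS u
WHERE u.Reputation > 1000000;
```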

You could extend this to a few other things, too, like the additional programmability features added to SQL Server 2016 starting with SP1.

There’s a lot to consider and keep up with in SQL Server. Having internal documentation about this stuff is the key to making sure your code:

  • Aligns with current best practices
  • Comments thoroughly on deviations
  • Takes advantage of new paradigms and patterns in SQL Server

If that’s the kind of thing you need help with, hit the link below to set up a free sales call to discuss your needs.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Software Vendor Mistakes With SQL Server: Not Using Batch Mode

Just Add Batch Mode


There are a few ways to get Batch Mode involved in a query:

  • Be on SQL Server 2019+ Enterprise Edition and hope Batch Mode On Rowstore heuristics think your query is eligible
  • Be on any supported version of SQL Server Enterprise Edition and…
    • Left join to an empty table with a column store index on it on 1 = 0
    • Create an empty filtered nonclustered columnstore index on a table in your query
    • Create an actual columnstore index on your table
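As a sketch of the empty-table trick from the list above (the table and index names here are made up):

```sql
/* An empty table with a clustered columnstore index */
CREATE TABLE dbo.batch_mode_hack
(
    i int NULL,
    INDEX c_bmh CLUSTERED COLUMNSTORE
);

/* The 1 = 0 join can never match any rows, but makes the
   query eligible for batch mode operators */
SELECT
    c = COUNT_BIG(*)
FROM dbo.Posts AS p
LEFT JOIN dbo.batch_mode_hack AS bmh
    ON 1 = 0
WHERE p.Score > 100;
```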

Since I’ve used the prior tricks in many posts many times, I’m going to actually create an index this time:

CREATE NONCLUSTERED COLUMNSTORE INDEX magick 
    ON dbo.Posts
(
    OwnerUserId, PostTypeId, Score
);

And now our query plan looks like this:

SQL Server Query Plan
smush

When Should I Use Batch Mode?


Batch Mode can be really powerful under the right conditions, especially when you have a column store index as a data source for your query.

Here’s some generic guidelines for when you should try things out:

  • Column store indexes
    • Big tables, over a couple million rows or so
    • Tables in data warehouses
    • Tables that support custom user searches
    • That wouldn’t otherwise fit in the buffer pool uncompressed
  • Queries
    • That process millions of rows
    • With multiple DISTINCT aggregates
    • That aggregate large numbers of rows
    • That are generated from custom user searches
  • Query plans
    • That have a tough time figuring out the right memory grant and would benefit from Batch Mode Memory Grant Feedback
    • That may be parameter-sensitive and benefit from Adaptive Joins
    • That use windowing functions and may benefit from Window Aggregate operators

Even if the end query result is not millions of rows, Batch Mode can be useful to get you to your result faster.

Thanks for reading!
