SQL Server Archives | Page 126 of 164

Experiment With Table Variable Deferred Compilation Without SQL Server 2019

Posted on June 30, 2020May 16, 2022 by Erik Darling

I Have A Secret To Tell You

If you haven’t heard by now, Table Variables have some ✌features✌ that can cause performance issues pretty generally in SQL Server.

One row estimates unless you recompile (or that darn trace flag)
No column-level statistics (even with indexes)
Modifications to them can’t go parallel (without sorcery)

But in SQL Server 2019, Microsoft fixed one of those things, kind of, with Table Variable Deferred Compilation.

Rather than just give you that one row estimate, it’ll wait until you’ve loaded data in, and then it will use table cardinality for things like joins to the table variable. Just be careful when you use them in stored procedures.

That can be a lot more helpful than what you currently get, but the guesses aren’t quite as helpful when you start using a where clause, because there still aren’t column-level statistics. You get the unknown guess for those.

How Can You Test It Out Before SQL Server 2019?

You can use #temp tables.

That’s right, regular old #temp tables.

They’ll give you nearly the same results as Table Variable Deferred Compilation in most cases, and you don’t need trace flags, hints, or or SQL Server 2019.

Heck, you might even fall in love with’em and live happily ever after.

The Fine Print

I know, some of you are out there getting all antsy-in-the-pantsy about all the SQL Jeopardy differences between temp tables and table variables.

I also realize that this may seem overly snarky, but hear me out:

Sure, there are some valid reasons to use table variables at times. But to most people, the choice about which one to use is either a coin flip or a copy/paste of what they saw someone else do in other parts of the code.

In other words, there’s not a lot of thought, and probably no adequate testing behind the choice. Sort of like me with tattoos.

Engine enhancements like this that benefit people who can’t change the code (or who refuse to change the code) are pretty good indicators of just how irresponsible developers have been with certain ✌features✌. I say this because I see it week after week after week. The numbers in Azure had to have been profound enough to get this worked on, finally.

I can’t imagine how many Microsoft support tickets have been RCA’d as someone jamming many-many rows in a table variable, with the only reasonable solution being to use a temp table instead.

I wish I could say that people learned the differences a lot faster when they experienced some pain, but time keeps proving that’s not the case. And really, it’s hardly ever the software developers who feel it with these choices: it’s the end users.

Thanks for reading!

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

More Dates For Online Training: July 10th and 24th

Posted on June 29, 2020May 14, 2022 by Erik Darling

Missing Persons

If you missed out on my online class, I’m offering two more dates for it in July.

You can catch all the goodness of Premium Performance Tuning on July 10th and 24th.

In this class, I’ll teach you about my strategy for tuning horrible queries, and show you practical examples for dealing with a variety of performance issues. It’s a full day of learning with no fluff.

I’ll be covering topics that you’ll face day to day, like dealing with parameter sniffing, functions, temporary objects, complex queries, and more. You’ll learn how to get to the bottom of query performance issues like a pro by getting the right information and learning how to interpret it.

Enrolling in my full day course also gets you access to my training videos, which is a $1000 value with over 24 hours of content.

Showing Up

On the day of the training, you’ll be able to watch the video stream on my site here.

To interact with me and other students, go to SQLslack.com to get an invite, then go to https://SQLcommunity.Slack.com, or download the Slack app to join. We’ll be in the #erikdarling-tuning channel. That’s the only way to ask questions right now. Unfortunately with video streaming, there’s no built-in chat. You have to bring your own.

If you wanna follow along, you’ll need to download the StackOverflow 2013 database, along with the latest SSMS, and Developer Edition of SQL Server: 2017 | 2019. While you’re being a good DBA, don’t forget to make sure you’re up to date on patching.

Get your tickets here!

See you in class!

Going Further

Adopting An Anti-Harassment Policy For Online Training

Posted on June 26, 2020May 16, 2022 by Erik Darling

Don’t Be A

This isn’t a technical post, but give it a read anyway.

With a lot of events going online, including yours truly (I’m teaching a live class today!), those events need to stay committed to making sure people who show up have a good experience. In my case, that means they’re entertained, and they learn something about performance tuning SQL Server.

Ambitious, I know.

But beyond that, I want people to be comfortable. After all, the things that you’re gonna be learning about SQL Server are uncomfortable enough.

You won’t believe the things it reports back to Microsoft about you. smh, as the kids say.

To start things off on the right foot, I’ve adopted a general anti-harassment policy.

Rather than go on and on about things here, click that link and give it a read. If you’ve got any feedback, drop me a line.

Thanks for reading!

SQL Server CTEs Don’t Always Execute In The Order You Write Them

Posted on June 25, 2020May 14, 2022 by Erik Darling

Mythology

I’ve heard many times incorrectly over the years that CTEs somehow materialize data.

But a new one to me was that CTEs execute procedurally, and you could use that to influence plan shapes by always doing certain things first.

Unfortunately, that’s not true of them either, even when you use TOP.

Meaningful Life

Here’s the first example. Take some note of the order the CTEs are written and joined in, and the tables they touch.

Outside of the CTEs, there’s a join to a table not even in a CTE here.

WITH cte_1 AS 
(
SELECT u.Id
FROM dbo.Users AS u
WHERE u.Reputation = 1
),
     cte_2 AS
(
SELECT p.OwnerUserId, p.Id
FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656
),   cte_3 AS 
(
SELECT v.PostId
FROM dbo.Votes AS v --WITH(INDEX = three)
WHERE v.VoteTypeId = 4
)
SELECT COUNT(*)
FROM cte_1
JOIN cte_2
    ON cte_2.OwnerUserId = cte_1.Id
JOIN cte_3 
    ON cte_3.PostId = cte_2.Id
JOIN dbo.Comments AS c
    ON c.UserId = cte_1.Id;

The plan for it looks like this:

Not even close to happening in the order we wrote things in.

Darn that optimizer.

Machanically Speaking

If we use a TOP in each CTE, that doesn’t help us either.

WITH cte_1 AS 
(
SELECT TOP (2147483647) u.Id
FROM dbo.Users AS u
WHERE u.Reputation = 1
),
     cte_2 AS
(
SELECT TOP (2147483647) p.OwnerUserId, p.Id
FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656
),   cte_3 AS 
(
SELECT TOP (2147483647) v.PostId
FROM dbo.Votes AS v
WHERE v.VoteTypeId = 4
)
SELECT COUNT(*)
FROM cte_1
JOIN cte_2
    ON cte_2.OwnerUserId = cte_1.Id
JOIN cte_3 
    ON cte_3.PostId = cte_2.Id
JOIN dbo.Comments AS c
    ON c.UserId = cte_1.Id;

Tables get touched in the same order, but the plan takes an ugly turn:

Dis-spells

CTEs have no magic powers. They don’t boss the optimizer around, they don’t materialize, and they don’t fence optimization.

If you’re gonna start stacking these things together, make sure you’re doing it for a good reason.

And if you tell me it’s to make code more readable, I know you’re messing with me.

Thanks for reading!

Going Further

How SQL Server Treats NULLs In Case Expressions

Posted on June 24, 2020May 14, 2022 by Erik Darling

I Forgot

Honest, I did. It happens to the best of us.

Why, just last week I forgot everything for 3 hours.

It was lovely.

These two queries look close enough , but they aren’t.

--Numero Uno
SELECT   CASE v.UserId WHEN NULL 
                       THEN 'NULL!'
                       ELSE 'NOT NULL!'
         END AS Scrubbed,
         v.VoteTypeId,
         COUNT_BIG(*) AS records
FROM     dbo.Votes AS v
GROUP BY CASE v.UserId WHEN NULL 
                       THEN 'NULL!'
                       ELSE 'NOT NULL!'
         END,
         v.VoteTypeId
ORDER BY v.VoteTypeId;

--Numero Dos
SELECT   CASE WHEN v.UserId IS NULL THEN 'NULL!'
              WHEN v.UserId IS NOT NULL THEN 'NOT NULL!'
         END AS Scrubbed,
         v.VoteTypeId,
         COUNT_BIG(*) AS records
FROM     dbo.Votes AS v
GROUP BY CASE WHEN v.UserId IS NULL THEN 'NULL!'
              WHEN v.UserId IS NOT NULL THEN 'NOT NULL!'
         END,
         v.VoteTypeId
ORDER BY v.VoteTypeId;

Results To Gridiot

The first query returns what appears to be rather odd results.

There are many NULLs in this column, and they’re exposed when we run the second query.

See them? All the NULLs? They’re here now.

1-900-MUPPET

The first query is the same thing as saying UserId = NULL.

That doesn’t get you anywhere at all in a database.

What is this, EXCEL?

To judge the NULL and the NOT NULL, you have to use IS NULL and IS NOT NULL.

But I forgot about that in CASE expressions, so I decided to write it down.

Thanks for reading!

Going Further

How Parameter Sniffing Impacts Locking In SQL Server

Posted on June 23, 2020May 14, 2022 by Erik Darling

Make It Out

I’m going to show you something bizarre. I’m going to show it to you because I care for your well-being and long term mental health.

Someday you’ll run into this and be thoroughly baffled, and I want to be here for you. Waiting, watching, lubricating.

I have a stored procedure. It’s a wonderful stored procedure.

But something funny happens when a parameter gets sniffed.

Wrote A Little Song About It

It’s not the usual parameter sniffing thing, where you get different plans and blah blah blah. That’s dull.

This is even more infuriating. Here’s the part where care about, where we read data to insert into the #temp table.

This is the “small” version of the plan. It only moves about 8200 rows.

Now here’s the “big” version of the plan.

We move way more rows out of the seek (9.8 million), but doesn’t it seem weird that a seek would take 5.6 seconds?

I think so.

Pay special attention here, because both queries aggregate the result down to one row, and the insert to the #temp table is instant both times.

Wanna Hear It?

Let’s do what most good parameter sniffing problem solvers do, and re-run the procedure after recompiling for the “big” value.

It’s the exact same darn plan.

Normally, when you’re dealing with parameter sniffing, and you recompile a procedure, you get a different plan for different values.

Not here though. Yes, it’s faster, but it’s the same operators. Seek, Compute, Stream, Stream, Compute, Insert 1 row.

Important to note here is that the two stream aggregates take around the same about of time as before too.

The real speed up was in the Seek.

How do you make a Seek faster?

YOU NEEK UP ON IT.

Three Days Later

I just woke up from beating myself with a hammer. Sorry about what I wrote before. That wasn’t funny.

But okay, really, what happened? Why is one Seek 4 seconds faster than another seek?

Locking.

All queries do it, and we can prove that’s what’s going on here by adding a locking hint to our select query.

Now, I understand why NOLOCK would set your DBA in-crowd friends off, and how TABLOCK would be an affront to all sense and reason for a select.

So how about a PAGLOCK hint? That’s somewhere in the middle.

The Seek that took 5.6 seconds is down to 2.2 seconds.

And all this time people told you hints were bad and evil, eh?

YTHO?

It’s pretty simple, once you talk it out.

All queries take locks (even NOLOCK/READ UNCOMMITTED queries).

Lock escalation doesn’t usually happen with them though, because locks don’t accumulate with read queries the way they do with modification queries. They grab on real quick and then let go (except when…).

For the “small” plan, we start taking row locks, and keep taking row locks. The optimizer has informed the storage engine that ain’t much ado about whatnot here, because the estimate (which is correct) is only for 8,190 rows.

That estimate is preserved for the “big” plan that has to go and get a lot more rows. Taking all those additional row locks really slows things down.

No Accumulation, No Escalation.

We stay on taking 9.8 million row locks instead of escalating up to page or object locks.

When we request page locks from the get-go, we incur less overhead.

For the record:

PAGLOCK: 2.4 seconds
TABLOCK: 2.4 seconds
NOLOCK: 2.4 seconds

Nothing seems to go quite as fast as when we start with the “big” parameter, but there’s another reason for that.

When we use the “big” parameter, we get batch mode on the Seek.

SQL Server Query Plan Tool Tip — A FULL SECOND

Welcome to 2019, pal.

Thanks for reading!

Going Further

What’s The Point Of Estimated Plans In SQL Server, Anyway?

Posted on June 22, 2020May 16, 2022 by Erik Darling

Further From Truth

Estimated plans can be a bit confounding, huh?

Whether you see one by hitting CTRL+L, looking in the plan cache, or looking in Query Store, the song remains the same.

The query is slow: sometimes? always? usually? rarely?

But where? But why?

Estimated plans can hide a lot of work from you. They’re sort of like government contracts, where the optimizer is the government: it picks the cheapest plan, but then…

(Okay, maybe it picks a plan because a friend or relative or big donor was behind it. Sometimes it seems that way, with the nonsense you end up with.)

Of course once execution is under way, you run into all sorts of unexpected costs: Everything is much more expensive, and takes a whole lot longer than projected.

Glad Ways

From a practical perspective, estimated plans can be useful to get a general idea of the plan shape that the optimizer is coming up with.

Despite all the lying estimated plans can do, you may be able to spot some obvious anti-patterns.

Of more utility is when the query is particularly long running, or when the query will modify underlying data.

When I’m tuning modification queries, there’s usually some pain involved when you need to reset things.

For example:

Most people don’t want to wait 5-10 minutes just to see if a plan changed after making changes
Deleting data that doesn’t exist
Updates that don’t change anything
Not being able to insert duplicate data a second time

…Are all hindrances to getting things done.

And so you need to restore or roll things back another way (and boy howdy has SQL Server 2019 spoiled me for rollbacks).

Thanks for reading!

Going Further

Query Tuning SQL Server 2019 Part 5: I’m Not Going Back

Posted on June 19, 2020May 14, 2022 by Erik Darling

Butt Out Bag

There was one thing that I didn’t talk about earlier in the week.

You see, there’s a mystery plan.

It only shows up once in a while, like Planet X. And when it does, we get bombarded by asteroids.

Just like when Planet X shows up.

I wouldn’t call it a good all-around plan, but it does something that we would want to happen when we run this procedure for VoteTypeId 5.

Let’s go look!

The Optimizer Discovers Aggregates, Sort Of

This isn’t a good “general” plan. In fact, for any of the previously fast values, it sucks.

It sucks because just like the “optimize for unknown” plan, it has a bunch of startup costs, does a lot of scanning, and is generally a bad choice for VoteTypeIds that produce a small number of values.

Johnny Four

If you look carefully, you can see what the problem is.

For VoteTypeIds that filter out a lot of rows (which is most of them), that predicate doesn’t get applied until after Posts and Badges have been joined.

In other words, you fully join those tables, and then the result of that join is joined to the predicate-filtered result of Votes.

For this execution, the plan was compiled initially for VoteTypeId 2. It has 130 million entries in Votes. It’s the only VoteTypeId that produces this plan naturally.

The plan you’re looking at above was re-executed with VoteTypeId 4, which has… 8,190 rows in Votes.

I can’t stress enough how difficult it would be to figure out why this is bad just looking at estimated plans.

Though one clue would be the clustered index scan + predicate, if we knew that we had a suitable index.

This kind of detail with row discrepancies only surfaces with actual plans.

But there is one thing here that wasn’t showing up in other plans, when we wanted it to: The optimizer decides to aggregate OwnerUserId coming from the Posts table prior to joining to Votes.

Johnny Five

If you recall the previously used plan, one complaint was that the result of joining Posts and Badges then joined to Votes had to probe 932 million rows.

You can sort of see that here, where the Adaptive Join prior to the highlighted Hash Match Aggregate produces >100 million rows. It’s more here because we don’t have Bitmaps against both Posts and Badges, but… We’re going off track a bit with that.

That could have been avoided if the optimizer had decided to aggregate OwnerUserId, like it does in this plan.

To compare:

The top plan has a handy green square to show you a helpful pre-join aggregation.

The bottom plan has no handy green squares because there is no helpful pre-join aggregation.

The product of the aggregation is 3.2 million rows, which is exactly what we got as a distinct count when we began experimenting with temp tables:

SELECT COUNT_BIG(DISTINCT p.OwnerUserId) AS records --3,236,013 
FROM dbo.Posts AS p 
JOIN dbo.Badges AS b 
    ON b.UserId = p.OwnerUserId 
WHERE p.PostTypeId = 1;

Outhouse

If the optimizer had chosen to aggregate OwnerUserId prior to the join to Votes, we all could have gone home early on Friday and enjoyed the weekend

Funny, that.

Speaking of which, it’s Friday. Go enjoy the weekend.

Thanks for reading!

Going Further

We’re Taking This Show Online

Posted on June 19, 2020May 14, 2022 by Erik Darling

Maybe Next Year

With SQL Saturday Chicago being canceled this year, I still wanted the nice folks who signed up to spend a Friday with me learning about SQL Server Performance Tuning to do just that.

The event will be taking place Friday, June 26th from 10AM EST – You’re Done Asking Questions.

While you won’t get to see me eat lunch at the speed of light or wonder what kind of scotch I smell like, you will get to learn all the horrible things I know about SQL Server.

Because a lot of emails from Eventbrite seem to get sent to the Great Spam Heaven, I’m writing this post for more visibility. If you’ve already bought a ticket, please check your email. If you can’t find one, get in touch with me here. I’ve been sending them, I swear.

Showing Up

On the day of the training, you’ll be able to watch the video stream on my site here.

Late Registration

Since the event is online, I’ve decided to open ticket sales back up. If you wanna join in on the fun, you have until next Thursday to sign up.

All attendees get free access to my video training, for life. That’s a $1000 value.

Thanks, and see you there!

Going Further

Query Tuning SQL Server 2019 Part 4: Long Live The Query Tuner

Posted on June 18, 2020May 14, 2022 by Erik Darling

Rumors Of My Demise

Let’s talk about some common hints that people use to fix parameter sniffing:

RECOMPILE: Won’t work here to get us a better plan for VoteTypeId 5, because it sucks when the optimizer knows what’s coming
OPTIMIZE FOR UNKNOWN: Works like once every 5 years, but people still bring it up, and really sucks here (picture below)
OPTIMIZE FOR (VALUE): Plan sharing doesn’t work great generally — if we were gonna do this, it’d have to be dynamic

This is what happens when we optimize for unknown. The density vector guess is 13,049,400.

That guess for Vote Types with very few rows ends up with a plan that has very high startup costs.

This version of the query will run for 13-17 seconds for any given parameter. That sucks in zero gravity.

Pictured above is the plan for VoteTypeId 4, which previously finished sub-second using Plan 1 and Plan 2.

With those out of the way, how can we fix this thing?

The Mint

In some circumstances, a #temp table would help if we pre-staged rows from Votes.

The problem is that for many calls, we’d be putting between 7 and 130 MILLION rows into a temp table.

Not my idea of a good time.

SQL Server Query Results — RAMDISKS NINETY NINE CENTS

But what about the other part of the query?

If count up distinct OwnerUserIds, we get about 3.2 million.

Better, we can reduce the rows further in the procedure with an EXISTS to Votes (I’ll show you that in a minute).

SELECT COUNT_BIG(DISTINCT p.OwnerUserId) AS records --3,236,013
FROM dbo.Posts AS p
JOIN dbo.Badges AS b 
    ON b.UserId = p.OwnerUserId 
WHERE  p.PostTypeId = 1

That’s not too bad, depending on:

How frequently it runs
How concurrently it runs
How overwhelmed tempdb is
Your Mom

The Product

That gives us:

CREATE OR ALTER PROCEDURE dbo.VoteSniffing ( @VoteTypeId INT )
AS
BEGIN
SET XACT_ABORT, NOCOUNT ON;

SELECT DISTINCT p.OwnerUserId
INTO #p
FROM dbo.Posts AS p
JOIN dbo.Badges AS b
    ON b.UserId = p.OwnerUserId
WHERE p.PostTypeId = 1
AND EXISTS
(
    SELECT 1/0
    FROM dbo.Votes AS v
    WHERE v.UserId = p.OwnerUserId
    AND   v.VoteTypeId = @VoteTypeId
);

SELECT   ISNULL(v.UserId, 0) AS UserId,
         SUM(CASE WHEN v.CreationDate >= '20190101'
                  AND  v.CreationDate < '20200101'
                  THEN 1
                  ELSE 0
             END) AS Votes2019,
         SUM(CASE WHEN v.BountyAmount IS NULL
                  THEN 0
                  ELSE 1
             END) AS TotalBounty,
         COUNT(DISTINCT v.PostId) AS PostCount,
         @VoteTypeId AS VoteTypeId
FROM     dbo.Votes AS v WITH(FORCESEEK)
WHERE    v.VoteTypeId = @VoteTypeId
AND      NOT EXISTS
        (   
            SELECT 1/0
            FROM #p AS p
            WHERE  p.OwnerUserId = v.UserId
        )
GROUP BY v.UserId;

END;
GO

Which works pretty well across all calls, and avoids the primary issue with VoteTypeId 5.

SQL Server Query Execution Times — Navy Blue

I’m generally happy with this, with the slight exception of VoteTypeId 8. Yeah, it beats the pants off of when we sniff Plan 2, but it’s about 7 seconds slower than when we get Plan 1.

I pulled the 17 minute execution from this graph for Plan 2/VoteTypeId 5, too, because it’s so distracting. Not having to worry about that thing is a trade off I’m willing to make for Plan 3 being about a second slower than Plan 1.

Not bad for a lazy Sunday afternoon of blogging, though.

Save One For Friday

Query tuning in SQL Server 2019 isn’t always a whole lot different from performance tuning other versions of SQL Server.

You have some more help from optimizer features (especially if you’re on Enterprise Edition), but they don’t solve every problem, and you can run into some very common problems that you’re already used to solving.

You may even be able to use some very familiar techniques to fix things.

In tomorrow’s post, I want to look at a quirk that would have thrown us way off course to explore on our way here.

Thanks for reading!