I know, it’s tough to think about the tropical days of March in Chicago while you’re chattering your way through January. But for some of you, this is about how long it takes to get an expense approved and a day off. You’ll wanna be first in line once AP’s ink pads thaw out.
If your job is performance tuning SQL Server, and you’ve had it with struggling from one CPU fire to another, then you need to come to my precon. It’s taking place Friday, March 20, in the lead-up to SQL Saturday Chicago, which is a fantastic event.
I’m going to teach you how to solve the tough problems you have today with a mix of query and index tuning. You’ll walk out the door with new ways to think about performance tuning, and creative approaches to those queries that just seem totally untuneable.
For a limited time, use the coupon code “newyorkpizza” for $50 off the price of admission. That expires in February, so use it while you can.
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.
If you have selective predicates earlier in the index that filter a lot of rows, the SARGability of trailing predicates matters less.
CREATE INDEX shorty ON dbo.a_table(selective_column, non_selective_column);
SELECT COUNT(*) AS records
FROM dbo.a_table AS a
WHERE selective_column = 1
AND ISNULL(non_selective_column, 'whatever') = 'whatever';
Am I saying you should do this? Am I saying that it’s a good example to set?
No. I’m just saying you can get away with it in this situation.
Longer Answer
The less selective other predicates are, the less you can get away with it.
Take these two queries:
SELECT COUNT(*) AS records
FROM dbo.Users AS u
WHERE u.Id = 8
AND ISNULL(u.Location, N'') = N'';
SELECT COUNT(*) AS records
FROM dbo.Users AS u
WHERE u.Id BETWEEN 8 AND 9693617
AND ISNULL(u.Location, N'') = N'';
The first one has an equality predicate on the Id, the primary key of the table. It’s going to touch one row, and then evaluate the residual predicate on Location.
The second query has a very non-selective range predicate on Id — still a selective column, just not a selective predicate anymore — so, we do a lot more work (relatively).
If we have this index, and we look at how four logically equivalent queries perform:
CREATE UNIQUE INDEX fast_lane ON dbo.Users(Id, Location);
SELECT COUNT(*) AS records
FROM dbo.Users AS u
WHERE u.Id = 8
AND ISNULL(u.Location, N'') = N'';
SELECT COUNT(*) AS records
FROM dbo.Users AS u
WHERE u.Id BETWEEN 8 AND 9693617
AND ISNULL(u.Location, N'') = N'';
SELECT COUNT(*) AS records
FROM dbo.Users AS u
WHERE u.Id = 8
AND ( u.Location = N''
OR u.Location IS NULL );
SELECT COUNT(*) AS records
FROM dbo.Users AS u
WHERE u.Id BETWEEN 8 AND 9693617
AND ( u.Location = N''
OR u.Location IS NULL );
The query plans tell us enough:
Toasty
It really doesn’t matter if we obey the laws of SARGability here.
Expect Depression
Many times, after I’ve explained SARGability to someone, they’ve gone back and cleaned up code like this, only to find it didn’t make much of a difference to performance. That’s because SARGability depends on indexes that can support seekable predicates. Without those indexes, it makes no practical difference how you write these queries.
Again, I’m not condoning writing Fast Food Queries when you can avoid it. Like I said earlier, it sets a bad example.
Once this kind of code creeps into your development culture, it’s hard to keep it contained.
There’s no good reason not to avoid it, but sometimes it hurts more than others. For instance, if Location were the first column in the index, we’d have a very different performance profile across all of these queries, and other rewrites might start to make more sense.
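To make that concrete, here’s a hypothetical index (the name is made up, and we don’t create it anywhere in this post) that would put Location in front:

CREATE INDEX side_street ON dbo.Users(Location, Id);

With Location leading, the ISNULL version can’t seek at all, while the OR IS NULL version at least gives the optimizer a shot at seek predicates on the empty string and NULL values.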
Thanks for reading!
I see people using OUTPUT to audit modifications from time to time, often because “triggers are bad” or “triggers are slow”.
Well, sometimes, sure. But using OUTPUT can be a downer, too.
Let’s look at how.
A Process Emerges
Say we’ve got a table that we’re using to track user high scores for their questions.
CREATE TABLE dbo.HighQuestionScores
(
Id INT PRIMARY KEY CLUSTERED,
DisplayName NVARCHAR(40) NOT NULL,
Score BIGINT NOT NULL
);
To test the process, let’s put a single user in the table:
INSERT dbo.HighQuestionScores WITH (TABLOCK)
(Id, DisplayName, Score)
SELECT u.Id, u.DisplayName, p.Score
FROM dbo.Users AS u
JOIN
(
SELECT p.OwnerUserId,
MAX(p.Score) AS Score
FROM dbo.Posts AS p
WHERE p.PostTypeId = 1
GROUP BY p.OwnerUserId
) AS p ON p.OwnerUserId = u.Id
WHERE u.Id = 22656;
To exacerbate the problem, I’m not going to create any helpful indexes here. This is a good virtual reality simulator, because I’ve seen your indexes.
Yes you. Down in front.
The relevant part of the query plan is the scan of the Posts table:
Practice
It’s parallel, and takes 1.8 seconds.
Aw, dit
Now let’s add in an OUTPUT clause.
I’m going to skip over inserting the output into any structure, because I want you to understand that the target doesn’t matter.
INSERT dbo.HighQuestionScores WITH (TABLOCK)
(Id, DisplayName, Score)
OUTPUT Inserted.Id,
Inserted.DisplayName,
Inserted.Score
SELECT u.Id, u.DisplayName, p.Score
FROM dbo.Users AS u
JOIN
(
SELECT p.OwnerUserId, MAX(p.Score) AS Score
FROM dbo.Posts AS p
WHERE p.PostTypeId = 1
GROUP BY p.OwnerUserId
) AS p ON p.OwnerUserId = u.Id
WHERE u.Id = 22656;
The relevant part of the plan now looks like this:
Golf Coach
We’ve lost parallelism, and inspecting the properties of the Insert operator tells us why:
Less Successful
We’ve got a Non Parallel Plan Reason. Why aren’t there any spaces? I don’t know.
Why can’t that go parallel? I also don’t know.
What About Triggers?
If we create a minimal trigger on the table, we can see if it has the same overhead.
CREATE OR ALTER TRIGGER dbo.hqs_insert ON dbo.HighQuestionScores
AFTER INSERT
AS
BEGIN
SET NOCOUNT ON;
SELECT Inserted.Id,
Inserted.DisplayName,
Inserted.Score
FROM Inserted;
END
Let’s go back to the original insert, without the output! We care about two things:
Is the parallel portion of the insert plan still there?
Is there any limitation on parallelism with the Inserted (and by extension, Deleted) virtual tables?
The answers are mostly positive, too. The insert plan can still use parallelism.
I’m not gonna post the same picture here, you can scroll back fondly.
Though the select from the Inserted table within the trigger doesn’t go parallel, it doesn’t appear to limit parallelism for the entire plan. It does appear that reads from the Inserted table can’t use parallelism (sort of like the table variable in an MSTVF).
Let’s modify the trigger slightly:
CREATE OR ALTER TRIGGER dbo.hqs_insert ON dbo.HighQuestionScores
AFTER INSERT
AS
BEGIN
SET NOCOUNT ON;
DECLARE @Id INT;
DECLARE @DisplayName NVARCHAR(40);
DECLARE @Score BIGINT;
SELECT @Id = Inserted.Id,
@DisplayName = Inserted.DisplayName,
@Score = Inserted.Score
FROM Inserted
JOIN dbo.Comments AS c
ON c.UserId = Inserted.Id;
END
And for variety, let’s insert a lot more data into our table:
TRUNCATE TABLE dbo.HighQuestionScores;
INSERT dbo.HighQuestionScores WITH (TABLOCK)
(Id, DisplayName, Score)
SELECT u.Id, u.DisplayName, p.Score
FROM dbo.Users AS u
JOIN
(
SELECT p.OwnerUserId, MAX(p.Score) AS Score
FROM dbo.Posts AS p
WHERE p.PostTypeId = 1
GROUP BY p.OwnerUserId
) AS p ON p.OwnerUserId = u.Id
WHERE u.Id < 500000;
Here’s the query plan:
Wrecking Ball
The read from Inserted is serial, but the remainder of the plan fully embraces parallelism like a long lost donut.
Togetherness
Given a well-tuned workload, you may not notice any particular overhead from using OUTPUT to audit certain actions.
Of course, if you’re using it alongside large inserts, and those large inserts happen to run for longer than you’d like, it might be time to see how long they take sans the OUTPUT clause. It’s entirely possible that using a trigger instead would cause fewer performance issues.
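If you want a quick back-to-back comparison, here’s a rough sketch using our demo tables (times land in the Messages tab; for a fair test you’d want repeated runs and a warm cache):

SET STATISTICS TIME ON;

TRUNCATE TABLE dbo.HighQuestionScores;

/* With the OUTPUT clause (results stream back to the client) */
INSERT dbo.HighQuestionScores WITH (TABLOCK)
(Id, DisplayName, Score)
OUTPUT Inserted.Id,
       Inserted.DisplayName,
       Inserted.Score
SELECT u.Id, u.DisplayName, p.Score
FROM dbo.Users AS u
JOIN
(
    SELECT p.OwnerUserId, MAX(p.Score) AS Score
    FROM dbo.Posts AS p
    WHERE p.PostTypeId = 1
    GROUP BY p.OwnerUserId
) AS p ON p.OwnerUserId = u.Id
WHERE u.Id < 500000;

TRUNCATE TABLE dbo.HighQuestionScores;

/* Without it */
INSERT dbo.HighQuestionScores WITH (TABLOCK)
(Id, DisplayName, Score)
SELECT u.Id, u.DisplayName, p.Score
FROM dbo.Users AS u
JOIN
(
    SELECT p.OwnerUserId, MAX(p.Score) AS Score
    FROM dbo.Posts AS p
    WHERE p.PostTypeId = 1
    GROUP BY p.OwnerUserId
) AS p ON p.OwnerUserId = u.Id
WHERE u.Id < 500000;

SET STATISTICS TIME OFF;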
Thanks for reading!
I speak with a lot of people who have SQL Server performance problems. Sometimes they’re constant, sometimes they’re obvious, and sometimes they’re intermittent.
Constant and obvious problems are easy to observe, though not necessarily easy to resolve without experience. Still, they’re a bit less frustrating, because at least you’ve got some idea what’s going on. It’s those intermittent problems that can be a real pain.
SQL Server is very catch-me-if-you-can when it comes to most issues. I’m sure I’ve talked about this before, but if you’re not hitting F5 in the right place at the right time, you can totally miss what happened when you had a problem, and there’s very little time-sliced data collection (except in Query Store) that can help with any reliability.
Not wait stats, not perfmon, not index usage. The plan cache can be okay, but is often unreliable, and has many blind spots.
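If Query Store is on, you can at least look back at a window of time. Here’s a minimal sketch (SQL Server 2016+) that pulls the slowest queries from the last four hours:

SELECT TOP (25)
    qsqt.query_sql_text,
    qsrs.count_executions,
    qsrs.avg_duration,
    qsrsi.start_time,
    qsrsi.end_time
FROM sys.query_store_runtime_stats AS qsrs
JOIN sys.query_store_runtime_stats_interval AS qsrsi
    ON qsrsi.runtime_stats_interval_id = qsrs.runtime_stats_interval_id
JOIN sys.query_store_plan AS qsp
    ON qsp.plan_id = qsrs.plan_id
JOIN sys.query_store_query AS qsq
    ON qsq.query_id = qsp.query_id
JOIN sys.query_store_query_text AS qsqt
    ON qsqt.query_text_id = qsq.query_text_id
WHERE qsrsi.start_time >= DATEADD(HOUR, -4, SYSDATETIMEOFFSET())
ORDER BY qsrs.avg_duration DESC;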
Whatcha Got?
The people I talk to generally fall into two groups:
We don’t have a monitoring tool at all
We have one, but we can’t figure anything out with it
If you don’t have one, Sentry One is a good place to start. It’s not perfect, but I tend to have good luck with it.
If you do have one, make the most of it. Have a sales tech give you some training. That’s part of what you pay for. Take it a step further: tell them about the kind of problems you have, and ask them to show you how to find them using their monitoring tool. If they can’t do that, it might be time to shop around for one that can.
Old Block
This isn’t just for you, either. This request is sort of selfish.
Let’s say you call someone like me in to look at your server. If you’ve got intermittent issues, it’s a whole lot easier to show me what was happening in the monitoring tool the last time you had an issue than to try to explain to me what happened. It’s a lot less annoying than hoping it happens again while we’re looking at the server together, or trying to recreate it.
Thanks for reading!
There are lots of great options out there for figuring out what’s happening on a server. If you’re the kind of person still running sp_who, or sp_who2, or sp_who8045, it’s time to stop.
You’ve got much better options, like sp_WhoIsActive, or the First Responder Kit. They’re both great, but sometimes I need very specific information very quickly, without all the bells and whistles.
I call it the pressure detector. It’s four simple DMV queries that will give you different levels of detail about memory and CPU usage currently taking place on a server. Like a lot of these scripts, when a server is Baltimoring, you’re better off running them using the DAC. Otherwise, you’re kinda stuck in the same place all your other queries are.
So what does this help you with? Situations where you’re running out of worker threads, or when you’re running out of memory.
When you hit those, you’ll get smacked with a couple nasty waits:
THREADPOOL: Running out of worker threads
RESOURCE_SEMAPHORE: Running out of query memory
The thing is, sometimes it’s hard to see what’s happening, what’s causing problems, and what’s on the receiving end.
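To sketch the kind of DMV queries I mean (this isn’t the actual procedure code, just the flavor of it), these two will show you who’s waiting on a memory grant and whether schedulers have work piling up:

/* Queries waiting on a memory grant: grant_time stays NULL until one is granted */
SELECT mg.session_id,
       mg.requested_memory_kb,
       mg.wait_time_ms
FROM sys.dm_exec_query_memory_grants AS mg
WHERE mg.grant_time IS NULL;

/* Schedulers with queued work hint at running low on worker threads */
SELECT s.scheduler_id,
       s.current_tasks_count,
       s.current_workers_count,
       s.work_queue_count
FROM sys.dm_os_schedulers AS s
WHERE s.status = N'VISIBLE ONLINE';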
Check out this video for a walkthrough of the procedure’s results.
Video Killed
Thanks for reading!
The coupon code “whatever” will get you $75 off. It’s only got 10 uses, so hurry on over and use it before you miss out.
GO GO GO
Thanks for reading!
I had a client recently with, wait for it, a performance problem. Or rather, two problems.
The OLTP part was working fine, but there was a reporting element that was dog slow, and would cause all sorts of problems on the server.
When we got into things, I noticed something rather funny: All of their reporting queries had very high estimated costs, and all the plans were totally serial.
The problem came down to two functions that were used in the OLTP portion, which were reused in the reporting portion.
Uh Ohs
I know what you’re thinking: 2019 would have fixed it.
Buuuuuuuuuuut.
No.
As magnificent and glorious as FROID is, there are a couple limitations that are pretty big gotchas:
The UDF does not invoke any intrinsic function that is either time-dependent (such as GETDATE()) or has side effects (such as NEWSEQUENTIALID()).
And
SELECT with variable accumulation/aggregation (for example, SELECT @val += col1 FROM table1) is not supported for inlining.
Which is what both were doing. One was doing some date math based on GETDATE, the other was assembling a string based on some logic, and not the kind of thing that STRING_AGG would have helped with, unfortunately.
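To illustrate, here are simplified, hypothetical stand-ins for the two functions (not the client’s actual code), each tripping one of those gotchas:

CREATE OR ALTER FUNCTION dbo.days_since (@d DATETIME)
RETURNS INT
AS
BEGIN
    /* GETDATE() is a time-dependent intrinsic, which blocks inlining */
    RETURN DATEDIFF(DAY, @d, GETDATE());
END;
GO

CREATE OR ALTER FUNCTION dbo.badge_list (@UserId INT)
RETURNS NVARCHAR(MAX)
AS
BEGIN
    DECLARE @s NVARCHAR(MAX) = N'';

    /* Variable accumulation is the other inlining blocker */
    SELECT @s += b.Name + N', '
    FROM dbo.Badges AS b
    WHERE b.UserId = @UserId;

    RETURN @s;
END;
GO

On SQL Server 2019, the is_inlineable column in sys.sql_modules will tell you which of your functions are eligible.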
They could both be rewritten with a little bit of work, and once we did that and fixed up the queries using them, things looked a lot different.
Freeee
For these plans, it wasn’t just that they were forced to run on one CPU that was harming performance. In some cases, these functions were in WHERE clauses. They were being used to filter data from tables with many millions of rows.
Yes, there was a WHERE clause that looked like AND dbo.function(somecol) LIKE '%thing%', which was… Brave?
Getting rid of those bottlenecks relieved quite a lot of pain.
If you want to find stuff like this on your own, here’s what you can do:
Looking at the execution plan, get the properties of the select operator and look for a “NonParallelPlanReason” (the sketch after this list shows one way to hunt for that in the plan cache)
Run sp_BlitzCache and look for “Forced Serialization” warnings
Inspect Filter operators in your query plans (I’m almost always suspicious of these things)
Review code for scalar valued function calls
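Here’s a rough sketch of that plan cache search (it needs VIEW SERVER STATE, and it can chug on servers with a large plan cache):

WITH XMLNAMESPACES
(DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (50)
    qs.execution_count,
    qs.total_worker_time,
    qp.query_plan,
    qp.query_plan.value('(//QueryPlan/@NonParallelPlanReason)[1]', 'sysname') AS non_parallel_plan_reason
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE qp.query_plan.exist('//QueryPlan[@NonParallelPlanReason]') = 1
ORDER BY qs.total_worker_time DESC;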
Thanks for reading!
Our job now is to figure out how to even things out. To do that, we’re gonna need to mess with our index a little bit.
Right now, we have this one:
CREATE INDEX whatever
ON dbo.Posts(PostTypeId, LastActivityDate)
INCLUDE(Score, ViewCount);
Which is fine when we need to Sort a small amount of data.
SELECT TOP (5000)
p.LastActivityDate,
p.PostTypeId,
p.Score,
p.ViewCount
FROM dbo.Posts AS p
WHERE p.PostTypeId = 4
AND p.LastActivityDate >= '20120101'
ORDER BY p.Score DESC;
There are only about 25k rows with a PostTypeId of 4. That’s easy to deal with.
The problem is here:
SELECT TOP (5000)
p.LastActivityDate,
p.PostTypeId,
p.Score,
p.ViewCount
FROM dbo.Posts AS p
WHERE p.PostTypeId = 1
AND p.LastActivityDate >= '20110101'
ORDER BY p.Score DESC;
There are 6,000,223 rows with a PostTypeId of 1 — that’s a question.
Don’t get me started on PostTypeId 2 — that’s an answer — which has 11,091,349 rows.
Change Management
What a lot of people try first is an index that leads with Score. Even though it’s not in the WHERE clause to help us find data, the index putting Score in order first seems like a tempting fix to our problem.
CREATE INDEX whatever
ON dbo.Posts(Score DESC, PostTypeId, LastActivityDate)
INCLUDE(ViewCount);
The result is pretty successful. Both plans are likely fast enough, and we could stop here, but we’d miss a key point about B-Tree indexes.
It’s not so bad.
What’s a bit deceptive about the speed is the amount of reads we do to locate our data.
Scan-Some
We only need to read 15k rows to find the top 5000 Questions — remember that these are very common.
We need to read many more rows to find the top 5000… Er… Whatever a 4 means.
Imaginary Readers
Nearly the entire index is read to locate these Post Types.
Meet In The Middle
The point we’d miss if we stopped tuning there is that when we add key columns to a B-Tree index, the index is ordered first by the leading key column. If that column isn’t unique, the second key column is ordered within each value of the first, and so on.
Pale Coogi Wave
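Here’s a tiny made-up illustration of that ordering. Pretend these rows are the index from the new definition below, keyed on (PostTypeId, Score DESC):

SELECT v.PostTypeId, v.Score
FROM
(
    VALUES (1, 100), (1, 90), (1, 10),
           (2, 100), (2, 95),
           (4, 80), (4, 12)
) AS v (PostTypeId, Score)
ORDER BY v.PostTypeId, v.Score DESC;

Each PostTypeId lives in one contiguous chunk, and within each chunk, Score is already sorted descending, which is exactly what our TOP (5000) queries want.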
Putting this together, let’s change our index a little bit:
CREATE INDEX whatever
ON dbo.Posts(PostTypeId, Score DESC, LastActivityDate)
INCLUDE(ViewCount) WITH (DROP_EXISTING = ON);
The understanding here is that seeking to a single PostTypeId value lands us on Score values that are already ordered within that range.
Now our plans look like this:
???
Which allows us to both avoid the Sort and keep reads to a minimum.
reed les
Interior Design
When designing indexes, it’s important to keep the goal of queries in mind. Often, predicates should be the primary consideration.
Other times, we need to take ordering and grouping into account. For example, if we’re using window functions, performance might be unacceptable without indexing the partition by and order by elements, and we may need to move other columns to parts of the index that may not initially seem ideal.
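For example (a hypothetical query, not one from this post), a windowed query like this one usually wants an index keyed on the PARTITION BY column followed by the ORDER BY column, even though neither appears in a WHERE clause:

SELECT p.OwnerUserId,
       p.Score,
       ROW_NUMBER() OVER
       (
           PARTITION BY p.OwnerUserId
           ORDER BY p.Score DESC
       ) AS score_rank
FROM dbo.Posts AS p;

/* A made-up index name; the keys mirror the window so the Sort can vanish */
CREATE INDEX window_dressing
    ON dbo.Posts(OwnerUserId, Score DESC);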
Thanks for reading!
I asked you to design one index to make two queries fast.
If we look at the plans with no supporting indexes, we’ll see why they need some tuning.
Get a job
In both queries, the optimizer will ask for a “missing index”. That’s in quotes because, gosh darnit, I wouldn’t miss this index.
Green Screen
Nauseaseated
If we add it, results are mixed, like cheap scotch.
Keep Walking
Sure, there’s some improvement, but neither query is fast. The second one does a lot of work to sort data.
We have an inkling that if we stopped doing that, it might get quicker.
Let’s stop and think here: What are we ordering by?
Of course, it’s the thing in the order by: Score DESC.
Where Do We Go Now?
It looks like that missing index request was wrong. Score shouldn’t have been an included column.
Columns in the include list aren’t stored in any order of their own; only the key columns order the index.
If we wanna fix that Sort, we need to make it a key column.
But where?
Get to work.
SELECT TOP (5000)
p.LastActivityDate,
p.PostTypeId,
p.Score,
p.ViewCount
FROM dbo.Posts AS p
WHERE p.PostTypeId = 4
AND p.LastActivityDate >= '20120101'
ORDER BY p.Score DESC;
SELECT TOP (5000)
p.LastActivityDate,
p.PostTypeId,
p.Score,
p.ViewCount
FROM dbo.Posts AS p
WHERE p.PostTypeId = 1
AND p.LastActivityDate >= '20110101'
ORDER BY p.Score DESC;
Get to work.