I know, it’s tough to think about the tropical days of March in Chicago while you’re chattering your way through January. But for some of you, this is about how long it takes to get an expense approved and a day off. You’ll wanna be first in line once AP’s ink pads thaw out.
If your job is performance tuning SQL Server, and you’ve had it with struggling from one CPU fire to another, then you need to come to my precon. It’s taking place Friday, March 20, in the lead-up to SQL Saturday Chicago, which is a fantastic event.
I’m going to teach you how to solve the tough problems you have today with a mix of query and index tuning. You’ll walk out the door with new ways to think about performance tuning, and creative approaches to those queries that just seem totally untuneable.
For a limited time, use the coupon code “newyorkpizza” for $50 off the price of admission. That expires in February, so use it while you can.
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.
If you have selective predicates earlier in the index that filter a lot of rows, the SARGability of trailing predicates matters less.
CREATE INDEX shorty ON dbo.a_table(selective_column, non_selective_column);
SELECT COUNT(*) AS records
FROM dbo.a_table AS a
WHERE selective_column = 1
AND ISNULL(non_selective_column, 'whatever') = 'whatever';
Am I saying you should do this? Am I saying that it’s a good example to set?
No. I’m just saying you can get away with it in this situation.
Longer Answer
The less selective other predicates are, the less you can get away with it.
Take these two queries:
SELECT COUNT(*) AS records
FROM dbo.Users AS u
WHERE u.Id = 8
AND ISNULL(u.Location, N'') = N'';
SELECT COUNT(*) AS records
FROM dbo.Users AS u
WHERE u.Id BETWEEN 8 AND 9693617
AND ISNULL(u.Location, N'') = N'';
The first one has an equality predicate on the Id, the primary key of the table. It’s going to touch one row, and then evaluate the residual predicate on Location.
The second query has a very non-selective range predicate on Id — still a selective column, just not a selective predicate anymore — so, we do a lot more work (relatively).
If we have this index, and we look at how four logically equivalent queries perform:
CREATE UNIQUE INDEX fast_lane ON dbo.Users(Id, Location);
SELECT COUNT(*) AS records
FROM dbo.Users AS u
WHERE u.Id = 8
AND ISNULL(u.Location, N'') = N'';
SELECT COUNT(*) AS records
FROM dbo.Users AS u
WHERE u.Id BETWEEN 8 AND 9693617
AND ISNULL(u.Location, N'') = N'';
SELECT COUNT(*) AS records
FROM dbo.Users AS u
WHERE u.Id = 8
AND ( u.Location = N''
OR u.Location IS NULL );
SELECT COUNT(*) AS records
FROM dbo.Users AS u
WHERE u.Id BETWEEN 8 AND 9693617
AND ( u.Location = N''
OR u.Location IS NULL );
The query plans tell us enough:
Toasty
It really doesn’t matter if we obey the laws of SARGability here.
Expect Depression
Many times, after I’ve explained SARGability to someone, they’ve gone back and cleaned up code like this, only to find it didn’t make much of a difference to performance. That’s because SARGability depends on indexes that can support seekable predicates. Without those indexes, it makes no practical difference how you write these queries.
Again, I’m not condoning writing Fast Food Queries when you can avoid it. Like I said earlier, it sets a bad example.
Once this kind of code creeps into your development culture, it’s hard to keep it contained.
There’s no good reason not to avoid it, but sometimes it hurts more than others. For instance, if Location were the first column in the index, we’d have a very different performance profile across all of these queries, and other rewrites might start to make more sense.
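To make that concrete, here’s a hypothetical index (the name is made up, and we don’t create it anywhere in this post) that would put Location in front:

CREATE INDEX side_street ON dbo.Users(Location, Id);

With Location leading, the ISNULL version can’t seek at all, while the OR IS NULL version at least gives the optimizer a shot at seek predicates on the empty string and NULL values.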
Thanks for reading!
I see people using OUTPUT to audit modifications from time to time, often because “triggers are bad” or “triggers are slow”.
Well, sometimes, sure. But using OUTPUT can be a downer, too.
Let’s look at how.
A Process Emerges
Say we’ve got a table that we’re using to track user high scores for their questions.
CREATE TABLE dbo.HighQuestionScores
(
Id INT PRIMARY KEY CLUSTERED,
DisplayName NVARCHAR(40) NOT NULL,
Score BIGINT NOT NULL
);
To test the process, let’s put a single user in the table:
INSERT dbo.HighQuestionScores WITH (TABLOCK)
(Id, DisplayName, Score)
SELECT u.Id, u.DisplayName, p.Score
FROM dbo.Users AS u
JOIN
(
SELECT p.OwnerUserId,
MAX(p.Score) AS Score
FROM dbo.Posts AS p
WHERE p.PostTypeId = 1
GROUP BY p.OwnerUserId
) AS p ON p.OwnerUserId = u.Id
WHERE u.Id = 22656;
To exacerbate the problem, I’m not going to create any helpful indexes here. This is a good virtual reality simulator, because I’ve seen your indexes.
Yes you. Down in front.
The relevant part of the query plan is the scan of the Posts table:
Practice
It’s parallel, and takes 1.8 seconds.
Aw, dit
Now let’s add in an OUTPUT clause.
I’m going to skip over inserting the output into any structure, because I want you to understand that the target doesn’t matter.
INSERT dbo.HighQuestionScores WITH (TABLOCK)
(Id, DisplayName, Score)
OUTPUT Inserted.Id,
Inserted.DisplayName,
Inserted.Score
SELECT u.Id, u.DisplayName, p.Score
FROM dbo.Users AS u
JOIN
(
SELECT p.OwnerUserId, MAX(p.Score) AS Score
FROM dbo.Posts AS p
WHERE p.PostTypeId = 1
GROUP BY p.OwnerUserId
) AS p ON p.OwnerUserId = u.Id
WHERE u.Id = 22656;
The relevant part of the plan now looks like this:
Golf Coach
We’ve lost parallelism, and inspecting the properties of the Insert operator tells us why:
Less Successful
We’ve got a Non Parallel Plan Reason. Why aren’t there any spaces? I don’t know.
Why can’t that go parallel? I also don’t know.
What About Triggers?
If we create a minimal trigger on the table, we can see if it has the same overhead.
CREATE OR ALTER TRIGGER dbo.hqs_insert ON dbo.HighQuestionScores
AFTER INSERT
AS
BEGIN
SET NOCOUNT ON;
SELECT Inserted.Id,
Inserted.DisplayName,
Inserted.Score
FROM Inserted;
END
Let’s go back to the original insert, without the output! We care about two things:
Is the parallel portion of the insert plan still there?
Is there any limitation on parallelism with the Inserted (and by extension, Deleted) virtual tables?
The answers are mostly positive, too. The insert plan can still use parallelism.
I’m not gonna post the same picture here, you can scroll back fondly.
Though the select from the Inserted table within the trigger doesn’t go parallel, it doesn’t appear to limit parallelism for the entire plan. It does appear that reads from the Inserted table can’t use parallelism (sort of like the table variable in an MSTVF).
Let’s modify the trigger slightly:
CREATE OR ALTER TRIGGER dbo.hqs_insert ON dbo.HighQuestionScores
AFTER INSERT
AS
BEGIN
SET NOCOUNT ON;
DECLARE @Id INT;
DECLARE @DisplayName NVARCHAR(40);
DECLARE @Score BIGINT;
SELECT @Id = Inserted.Id,
@DisplayName = Inserted.DisplayName,
@Score = Inserted.Score
FROM Inserted
JOIN dbo.Comments AS c
ON c.UserId = Inserted.Id;
END
And for variety, let’s insert a lot more data into our table:
TRUNCATE TABLE dbo.HighQuestionScores;
INSERT dbo.HighQuestionScores WITH (TABLOCK)
(Id, DisplayName, Score)
SELECT u.Id, u.DisplayName, p.Score
FROM dbo.Users AS u
JOIN
(
SELECT p.OwnerUserId, MAX(p.Score) AS Score
FROM dbo.Posts AS p
WHERE p.PostTypeId = 1
GROUP BY p.OwnerUserId
) AS p ON p.OwnerUserId = u.Id
WHERE u.Id < 500000;
Here’s the query plan:
Wrecking Ball
The read from Inserted is serial, but the remainder of the plan fully embraces parallelism like a long lost donut.
Togetherness
Given a well-tuned workload, you may not notice any particular overhead from using OUTPUT to audit certain actions.
Of course, if you’re using it alongside large inserts, and those large inserts happen to run for longer than you’d like, it might be time to see how long they take sans the OUTPUT clause. It’s entirely possible that using a trigger instead would cause fewer performance issues.
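If you want a quick back-to-back comparison, here’s a rough sketch using our demo tables (times land in the Messages tab; for a fair test you’d want repeated runs and a warm cache):

SET STATISTICS TIME ON;

TRUNCATE TABLE dbo.HighQuestionScores;

/* With the OUTPUT clause (results stream back to the client) */
INSERT dbo.HighQuestionScores WITH (TABLOCK)
(Id, DisplayName, Score)
OUTPUT Inserted.Id,
       Inserted.DisplayName,
       Inserted.Score
SELECT u.Id, u.DisplayName, p.Score
FROM dbo.Users AS u
JOIN
(
    SELECT p.OwnerUserId, MAX(p.Score) AS Score
    FROM dbo.Posts AS p
    WHERE p.PostTypeId = 1
    GROUP BY p.OwnerUserId
) AS p ON p.OwnerUserId = u.Id
WHERE u.Id < 500000;

TRUNCATE TABLE dbo.HighQuestionScores;

/* Without it */
INSERT dbo.HighQuestionScores WITH (TABLOCK)
(Id, DisplayName, Score)
SELECT u.Id, u.DisplayName, p.Score
FROM dbo.Users AS u
JOIN
(
    SELECT p.OwnerUserId, MAX(p.Score) AS Score
    FROM dbo.Posts AS p
    WHERE p.PostTypeId = 1
    GROUP BY p.OwnerUserId
) AS p ON p.OwnerUserId = u.Id
WHERE u.Id < 500000;

SET STATISTICS TIME OFF;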
Thanks for reading!
I speak with a lot of people who have SQL Server performance problems. Sometimes they’re constant, sometimes they’re obvious, and sometimes they’re intermittent.
Constant and obvious problems are easy to observe, though not necessarily easy to resolve without experience. Still, they’re a bit less frustrating, because at least you’ve got some idea what’s going on. It’s those intermittent problems that can be a real pain.
SQL Server is very catch-me-if-you-can when it comes to most issues. I’m sure I’ve talked about this before, but if you’re not hitting F5 in the right place at the right time, you can totally miss what happened when you had a problem, and there’s very little time-sliced data collection (except in Query Store) that can help with any reliability.
Not wait stats, not perfmon, not index usage. The plan cache can be okay, but is often unreliable, and has many blind spots.
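If Query Store is on, you can at least look back at a window of time. Here’s a minimal sketch (SQL Server 2016+) that pulls the slowest queries from the last four hours:

SELECT TOP (25)
    qsqt.query_sql_text,
    qsrs.count_executions,
    qsrs.avg_duration,
    qsrsi.start_time,
    qsrsi.end_time
FROM sys.query_store_runtime_stats AS qsrs
JOIN sys.query_store_runtime_stats_interval AS qsrsi
    ON qsrsi.runtime_stats_interval_id = qsrs.runtime_stats_interval_id
JOIN sys.query_store_plan AS qsp
    ON qsp.plan_id = qsrs.plan_id
JOIN sys.query_store_query AS qsq
    ON qsq.query_id = qsp.query_id
JOIN sys.query_store_query_text AS qsqt
    ON qsqt.query_text_id = qsq.query_text_id
WHERE qsrsi.start_time >= DATEADD(HOUR, -4, SYSDATETIMEOFFSET())
ORDER BY qsrs.avg_duration DESC;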
Whatcha Got?
The people I talk to generally fall into two groups:
We don’t have a monitoring tool at all
We have one, but we can’t figure anything out with it
If you don’t have one, Sentry One is a good place to start. It’s not perfect, but I tend to have good luck with it.
If you do have one, make the most of it. Have a sales tech give you some training. That’s part of what you pay for. Take it a step further: tell them about the kind of problems you have, and ask them to show you how to find them using their monitoring tool. If they can’t do that, it might be time to shop around for one that can.
Old Block
This isn’t just for you, either. This request is sort of selfish.
Let’s say you call someone like me in to look at your server. If you’ve got intermittent issues, it’s a whole lot easier to show me what was happening in the monitoring tool the last time you had an issue than to try to explain to me what happened. It’s a lot less annoying than hoping it happens again while we’re looking at the server together, or trying to recreate it.
Thanks for reading!
There are lots of great options out there for figuring out what’s happening on a server. If you’re the kind of person still running sp_who, or sp_who2, or sp_who8045, it’s time to stop.
You’ve got much better options, like sp_WhoIsActive, or the First Responder Kit. They’re both great, but sometimes I need very specific information very quickly, without all the bells and whistles.
I call it the pressure detector. It’s four simple DMV queries that will give you different levels of detail about memory and CPU usage currently taking place on a server. Like a lot of these scripts, when a server is Baltimoring, you’re better off running them using the DAC. Otherwise, you’re kinda stuck in the same place all your other queries are.
So what does this help you with? Situations where you’re running out of worker threads, or when you’re running out of memory.
When you hit those, you’ll get smacked with a couple nasty waits:
THREADPOOL: Running out of worker threads
RESOURCE_SEMAPHORE: Running out of query memory
The thing is, sometimes it’s hard to see what’s happening, what’s causing problems, and what’s on the receiving end.
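To sketch the kind of DMV queries I mean (this isn’t the actual procedure code, just the flavor of it), these two will show you who’s waiting on a memory grant and whether schedulers have work piling up:

/* Queries waiting on a memory grant: grant_time stays NULL until one is granted */
SELECT mg.session_id,
       mg.requested_memory_kb,
       mg.wait_time_ms
FROM sys.dm_exec_query_memory_grants AS mg
WHERE mg.grant_time IS NULL;

/* Schedulers with queued work hint at running low on worker threads */
SELECT s.scheduler_id,
       s.current_tasks_count,
       s.current_workers_count,
       s.work_queue_count
FROM sys.dm_os_schedulers AS s
WHERE s.status = N'VISIBLE ONLINE';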
Check out this video for a walkthrough of the procedure’s results.
Video Killed
Thanks for reading!
The coupon code “whatever” will get you $75 off. It’s only got 10 uses, so hurry on over and use it before you miss out.
GO GO GO
Thanks for reading!
I had a client recently with, wait for it, a performance problem. Or rather, two problems.
The OLTP part was working fine, but there was a reporting element that was dog slow, and would cause all sorts of problems on the server.
When we got into things, I noticed something rather funny: All of their reporting queries had very high estimated costs, and all the plans were totally serial.
The problem came down to two functions that were used in the OLTP portion, which were reused in the reporting portion.
Uh Ohs
I know what you’re thinking: 2019 would have fixed it.
Buuuuuuuuuuut.
No.
As magnificent and glorious as FROID is, there are a couple limitations that are pretty big gotchas:
The UDF does not invoke any intrinsic function that is either time-dependent (such as GETDATE()) or has side effects (such as NEWSEQUENTIALID()).
And
SELECT with variable accumulation/aggregation (for example, SELECT @val += col1 FROM table1) is not supported for inlining.
Which is what both were doing. One was doing some date math based on GETDATE, the other was assembling a string based on some logic, and not the kind of thing that STRING_AGG would have helped with, unfortunately.
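To illustrate, here are simplified, hypothetical stand-ins for the two functions (not the client’s actual code), each tripping one of those gotchas:

CREATE OR ALTER FUNCTION dbo.days_since (@d DATETIME)
RETURNS INT
AS
BEGIN
    /* GETDATE() is a time-dependent intrinsic, which blocks inlining */
    RETURN DATEDIFF(DAY, @d, GETDATE());
END;
GO

CREATE OR ALTER FUNCTION dbo.badge_list (@UserId INT)
RETURNS NVARCHAR(MAX)
AS
BEGIN
    DECLARE @s NVARCHAR(MAX) = N'';

    /* Variable accumulation is the other inlining blocker */
    SELECT @s += b.Name + N', '
    FROM dbo.Badges AS b
    WHERE b.UserId = @UserId;

    RETURN @s;
END;
GO

On SQL Server 2019, the is_inlineable column in sys.sql_modules will tell you which of your functions are eligible.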
They could both be rewritten with a little bit of work, and once we did that and fixed up the queries using them, things looked a lot different.
Freeee
For these plans, it wasn’t just that they were forced to run on one CPU that was harming performance. In some cases, these functions were in WHERE clauses. They were being used to filter data from tables with many millions of rows.
Yes, there was a WHERE clause that looked like AND dbo.function(somecol) LIKE '%thing%', which was… Brave?
Getting rid of those bottlenecks relieved quite a lot of pain.
If you want to find stuff like this on your own, here’s what you can do:
Looking at the execution plan, get the properties of the select operator and look for a “NonParallelPlanReason” (the sketch after this list shows one way to hunt for that in the plan cache)
Run sp_BlitzCache and look for “Forced Serialization” warnings
Inspect Filter operators in your query plans (I’m almost always suspicious of these things)
Review code for scalar valued function calls
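Here’s a rough sketch of that plan cache search (it needs VIEW SERVER STATE, and it can chug on servers with a large plan cache):

WITH XMLNAMESPACES
(DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (50)
    qs.execution_count,
    qs.total_worker_time,
    qp.query_plan,
    qp.query_plan.value('(//QueryPlan/@NonParallelPlanReason)[1]', 'sysname') AS non_parallel_plan_reason
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE qp.query_plan.exist('//QueryPlan[@NonParallelPlanReason]') = 1
ORDER BY qs.total_worker_time DESC;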
Thanks for reading!
Our job now is to figure out how to even things out. To do that, we’re gonna need to mess with our index a little bit.
Right now, we have this one:
CREATE INDEX whatever
ON dbo.Posts(PostTypeId, LastActivityDate)
INCLUDE(Score, ViewCount);
Which is fine when we need to Sort a small amount of data.
SELECT TOP (5000)
p.LastActivityDate,
p.PostTypeId,
p.Score,
p.ViewCount
FROM dbo.Posts AS p
WHERE p.PostTypeId = 4
AND p.LastActivityDate >= '20120101'
ORDER BY p.Score DESC;
There are only about 25k rows with a PostTypeId of 4. That’s easy to deal with.
The problem is here:
SELECT TOP (5000)
p.LastActivityDate,
p.PostTypeId,
p.Score,
p.ViewCount
FROM dbo.Posts AS p
WHERE p.PostTypeId = 1
AND p.LastActivityDate >= '20110101'
ORDER BY p.Score DESC;
There are 6,000,223 rows with a PostTypeId of 1 — that’s a question.
Don’t get me started on PostTypeId 2 — that’s an answer — which has 11,091,349 rows.
Change Management
What a lot of people try first is an index that leads with Score. Even though it’s not in the WHERE clause to help us find data, the index putting Score in order first seems like a tempting fix to our problem.
CREATE INDEX whatever
ON dbo.Posts(Score DESC, PostTypeId, LastActivityDate)
INCLUDE(ViewCount);
The result is pretty successful. Both plans are likely fast enough, and we could stop here, but we’d miss a key point about B-Tree indexes.
It’s not so bad.
What’s a bit deceptive about the speed is the amount of reads we do to locate our data.
Scan-Some
We only need to read 15k rows to find the top 5000 Questions — remember that these are very common.
We need to read many more rows to find the top 5000… Er… Whatever a 4 means.
Imaginary Readers
Nearly the entire index is read to locate these Post Types.
Meet In The Middle
The point we’d miss if we stopped tuning there is that when we add key columns to a B-Tree index, the index is ordered first by the leading key column. If that column isn’t unique, the second key column is ordered within each value of the first, and so on.
Pale Coogi Wave
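Here’s a tiny made-up illustration of that ordering. Pretend these rows are the index from the new definition below, keyed on (PostTypeId, Score DESC):

SELECT v.PostTypeId, v.Score
FROM
(
    VALUES (1, 100), (1, 90), (1, 10),
           (2, 100), (2, 95),
           (4, 80), (4, 12)
) AS v (PostTypeId, Score)
ORDER BY v.PostTypeId, v.Score DESC;

Each PostTypeId lives in one contiguous chunk, and within each chunk, Score is already sorted descending, which is exactly what our TOP (5000) queries want.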
Putting this together, let’s change our index a little bit:
CREATE INDEX whatever
ON dbo.Posts(PostTypeId, Score DESC, LastActivityDate)
INCLUDE(ViewCount) WITH (DROP_EXISTING = ON);
The understanding here is that seeking to a single PostTypeId value lands us on Score values that are already ordered within that range.
Now our plans look like this:
???
Which allows us to both avoid the Sort and keep reads to a minimum.
reed les
Interior Design
When designing indexes, it’s important to keep the goal of queries in mind. Often, predicates should be the primary consideration.
Other times, we need to take ordering and grouping into account. For example, if we’re using window functions, performance might be unacceptable without indexing the partition by and order by elements, and we may need to move other columns to parts of the index that may not initially seem ideal.
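For example (a hypothetical query, not one from this post), a windowed query like this one usually wants an index keyed on the PARTITION BY column followed by the ORDER BY column, even though neither appears in a WHERE clause:

SELECT p.OwnerUserId,
       p.Score,
       ROW_NUMBER() OVER
       (
           PARTITION BY p.OwnerUserId
           ORDER BY p.Score DESC
       ) AS score_rank
FROM dbo.Posts AS p;

/* A made-up index name; the keys mirror the window so the Sort can vanish */
CREATE INDEX window_dressing
    ON dbo.Posts(OwnerUserId, Score DESC);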
Thanks for reading!
I asked you to design one index to make two queries fast.
If we look at the plans with no supporting indexes, we’ll see why they need some tuning.
Get a job
In both queries, the optimizer will ask for a “missing index”. That’s in quotes because, gosh darnit, I wouldn’t miss this index.
Green Screen
Nauseaseated
If we add it, results are mixed, like cheap scotch.
Keep Walking
Sure, there’s some improvement, but neither query is fast. The second one does a lot of work to sort data.
We have an inkling that if we stopped doing that, it might get quicker.
Let’s stop and think here: What are we ordering by?
Of course, it’s the thing in the order by: Score DESC.
Where Do We Go Now?
It looks like that missing index request was wrong. Score shouldn’t have been an included column.
Columns in the include list aren’t stored in any order of their own; only the key columns order the index.
If we wanna fix that Sort, we need to make it a key column.
But where?
Get to work.
SELECT TOP (5000)
p.LastActivityDate,
p.PostTypeId,
p.Score,
p.ViewCount
FROM dbo.Posts AS p
WHERE p.PostTypeId = 4
AND p.LastActivityDate >= '20120101'
ORDER BY p.Score DESC;
SELECT TOP (5000)
p.LastActivityDate,
p.PostTypeId,
p.Score,
p.ViewCount
FROM dbo.Posts AS p
WHERE p.PostTypeId = 1
AND p.LastActivityDate >= '20110101'
ORDER BY p.Score DESC;
Get to work.