Why Index Tuning Is An Iterative Process In SQL Server

Here And There


For many people, index tuning means occasionally adding an index when there’s a report about a slow query. Those indexes might come from a query plan, or from the missing index DMVs, where SQL Server stores every complaint the optimizer files when it thinks an index might make a query better.

Sure, there are some people who think index tuning means rebuilding indexes or running DTA and checking all the boxes, but I ban those IP addresses.

Of course, there’s a whole lot more to index tuning. Adding indexes is fine to a point, but you really should spring clean those suckers once in a while, too.

Look for overlapping indexes, unused indexes, and check for any Heaps that may have snuck in there. sp_BlitzIndex is a pretty cool tool for that.

But even for adding indexes, sometimes it takes more than one pass, especially if you’re taking advice from query plans and DMVs.

How The What


Let’s say you’re looking at a server for the first time, or you’re not quite comfortable with designing your own indexes. No judgment, there.

You see a query plan for some piece of code that’s running slowly, and it has a missing index request.

SQL Server Query Plan
Sugar Sugar Sugar

There’s only one missing index request — there’s not a bunch of hidden ones like in some plans — and it looks moderately helpful so you decide to try it.

SQL Server Missing Index Request
Treefiddy

The thing is that as far as “stuff I want to go faster” in the plan, the clustered index scan on Posts is about 3x faster than the clustered index scan on Comments.

SQL Server Query Plan
Deal with it

And the index that’s being asked for is only going to help us find PostTypeId = 1. It’s not going to help with the rest of or join or filtering very much.

CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[Posts] ([PostTypeId])
INCLUDE ([CreationDate],[OwnerUserId])
SQL Server Query
Una Posta

We still need to filter on CreationDate, and join on OwnerUserId later. Sometimes this index will be “good enough” and other times it “won’t”.

If PostTypeId were really selective, or if this query were searching for a particularly selective PostTypeId, then it’d probably be okay-ish.

But we’re not, so we may settle on this index instead.

CREATE NONCLUSTERED INDEX p
ON [dbo].[Posts] ([PostTypeId], [CreationDate], [OwnerUserId]);

With that in place, we only get marginal improvement in the timing of the plan. It’s about 1.5 seconds faster.

Probably not what we’d wanna report to end users.

SQL Server Query Plan
Hella

But we have new green text! This time it’s for the Comments table, which is where our pain point lies time-wise.

CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[Comments] ([PostId],[CreationDate])

We add that, and reduce our query runtime to less than half of what it was originally.

SQL Server Query Plan
Babewood

Is 2.6 seconds good? Or great? All depends on expectations.

Could we keep going and experimenting? Sure.

It all depends what we’re allowed to change, what our skill level is (mine is quite low, ho ho ho), and what our priorities are (these are also quite low).

This Is Just One Query


And since we had the luxury of having it in front of us, running it, adding an index, running it again to test the index, etc., we were able to spot the second index request that ended up helping even more than the first one.

If you don’t have that luxury, or if you just poke around the missing index DMVs every 3-6 months, you could miss stuff like this. Sure, that first request would be there, and it might look tempting enough for you to add, but the second one wouldn’t appear until after that. That’s the one that really helped.

Whenever you’re tuning indexes, or releasing code that’s going to use existing data in new ways, you’d be doing yourself a big favor to check in on this stuff at least weekly.

You might be an index tuning wiz and not need to — if you are, I’d be amazed if you made it this far into my blog post, though — or you may catch “obvious” new indexes during development.

But I’m going to tell you something about end users: they’re devious, mischievous, and they’re out to make you look bad.

As soon as they start using those new features of yours, they’re going to abuse them. They’re going to do all sorts of horrible things that you never would have dreamed of. And I’ll bet some different indexes would help you keep your good name.

Or at least your job.

Thanks for reading!

As a postscript to this: I don’t want you to think that missing index requests are the end-all be-all of indexing wisdom. There are lots of limitations, and suggested column order isn’t perfect. But if you’re just getting started, they’re a great way to start to understand indexing, and see the problems they do and don’t solve. And look, the only way to make them better would be to spend longer during compilation thinking about things. That’s not how the optimizer should be spending its time. We’re lucky to get these for free, and you should view them as a learning tool.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Rewriting Multi Statement Table Valued Functions To Fix SQL Server Performance Problems

Hey You Should Do This


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Tuning Performance Problems With Aggregates In SQL Server, Part 3

3/3


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Tuning Performance Problems With Aggregates In SQL Server, Part 2

2/3


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Tuning Performance Problems With Aggregates In SQL Server, Part 1

1/3


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

But WHY Does ISNULL Have To Scan A SQL Server Index?

No Answers



I wish I had an answer here, because sometimes when I think about these things it feels a bit silly.

Maybe someday.

This is the post I wrote over at Brent’s site that I reference in the video.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

When Should You Index Temp Tables In SQL Server?

What I Mean Is


You already know that your temp table needs an index. Let’s say there’s some query plan ouchie from not adding one. You’ve already realized that you should probably use a clustered index rather than a nonclustered index. Adding a nonclustered index leaves you with a heap and an index, and there are a lot of times when nonclustered indexes won’t be used because they don’t cover the query columns enough.

Good. We’ve fixed you.

But, like, when should you create the index?

Options


You can do one of these things:

  • Inline, when you create the table
  • After you create the table
  • After you load data into the table

This requires a bit of testing to get right.

Inline


In many cases, this is the best option, for reasons outlined by Pam Lahoud.

Do not explicitly drop temp tables at the end of a stored procedure, they will get cleaned up when the session that created them ends.
Do not alter temp tables after they have been created.
Do not truncate temp tables
Move index creation statements on temp tables to the new inline index creation syntax that was introduced in SQL Server 2014.

Where it can be a bad option is:

  • If you can’t get a parallel insert even with a TABLOCK hint
  • Sorting the data to match index order on insert could result in some discomfort

After Creation


This is almost always not ideal, unless you want to avoid caching the temp table, and for the recompilation to occur for whatever reason.

It’s not that I’d ever rule this out as an option, but I’d wanna have a good reason for it.

Probably even several.

After Insert


This can sometimes be a good option if the query plan you get from inserting into the index is deficient in some way.

Like I mentioned up above, maybe you lose parallel insert, or maybe the DML Request Sort is a thorn in your side.

This can be awesome! Except on Standard Edition, where you can’t create indexes in parallel. Which picks off one of the reasons for doing this in the first place, and also potentially causes you headaches with not caching temp tables, and statement level recompiles.

One upside here is that if you insert data into a temp table with an index, and then run a query that causes statistics generation, you’ll almost certainly get the default sampling rate. That could potentially cause other annoyances. Creating the index after loading data means you get the full scan stats.

Hooray, I guess.

This may not ever be the end of the world, but here’s a quick example:

DROP TABLE IF EXISTS #t;
GO 

--Create a table with an index already on it
CREATE TABLE #t(id INT, INDEX c CLUSTERED(id));

--Load data
INSERT #t WITH(TABLOCK)
SELECT p.OwnerUserId
FROM dbo.Posts AS p;

--Run a query to generate statistics
SELECT COUNT(*)
FROM #t AS t
WHERE t.id BETWEEN 1 AND 10000
GO 

--See what's poppin'
SELECT hist.step_number, hist.range_high_key, hist.range_rows, 
    hist.equal_rows, hist.distinct_range_rows, hist.average_range_rows
FROM tempdb.sys.stats AS s
CROSS APPLY tempdb.sys.dm_db_stats_histogram(s.[object_id], s.stats_id) AS hist
WHERE OBJECT_NAME(s.object_id, 2) LIKE '#t%'
GO 
DROP TABLE #t;


--Create a query with no index
CREATE TABLE #t(id INT NOT NULL);

--Load data
INSERT #t WITH(TABLOCK)
SELECT p.OwnerUserId
FROM dbo.Posts AS p;

--Create the index
CREATE CLUSTERED INDEX c ON #t(id);

--Run a query to generate statistics
SELECT COUNT(*)
FROM #t AS t
WHERE t.id BETWEEN 1 AND 10000

--See what's poppin'
SELECT hist.step_number, hist.range_high_key, hist.range_rows, 
    hist.equal_rows, hist.distinct_range_rows, hist.average_range_rows
FROM tempdb.sys.stats AS s
CROSS APPLY tempdb.sys.dm_db_stats_histogram(s.[object_id], s.stats_id) AS hist
WHERE OBJECT_NAME(s.object_id, 2) LIKE '#t%'
GO 
DROP TABLE #t;
Neckin’ Neck

On the left is the first 20 steps from the first histogram, and on the right is the first 20 from the second one.

You can see some big differences — whether or not they end up helping or hurting performance would take a lot of different tests. Quite frankly, it’s probably not where I’d start a performance investigation, but I’d be lying if I told you it never ended up there.

All Things Considerateded


In general, I’d stick to using the inline index creation syntax. If I had to work around issues with that, I’d create the index after loading data, but being on Standard Edition brings some additional considerations around parallel index creation.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

How Functions Can Make Performance Tuning SQL Server Queries Harder

Sensational


I’ve posted quite a bit about how cached plans can be misleading.

I’m gonna switch that up and talk about how an actual plan can be misleading, too.

In plans that include calling a muti-statement table valued function, no operator logs the time spent in the function.

Here’s an example:

SELECT TOP (100)
     p.Id AS [Post Link],
     vs.up,
     vs.down
FROM dbo.VoteStats() AS vs --The function
JOIN dbo.Posts AS p
    ON vs.postid = p.Id
WHERE vs.down > vs.up_multiplier
AND   p.CommunityOwnedDate IS NULL
AND   p.ClosedDate IS NULL
ORDER BY vs.up DESC

When I run the query, it drags on for 30-ish seconds, but the plan says that it only ran for about 2.7 seconds.

SQL Server Query Plan
As we proceed

But there it is in Query Time Stats! 29 seconds. What gives?

SQL Server Query Times From Execution Plan
Hi there!

Estimations


If we look at the estimated plan for the function, we can see quite a thick arrow pointing to the table variable we populate for our results.

SQL Server Query Plan
Meatballs

That process is all part of the query, but it doesn’t show up in any of the operators. It really should.

More specifically, I think it should show up right here.

SQL Server Query Plan

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

My Favorite SQL Server Cursor Options: Local Static

Cursory


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.