Found A Picture Of You
The way that many people will tell you to design indexes is something along the lines of:
- Equality predicates first =
- Inequality predicates next >, >=, <, <=, <>
- This point intentionally left blank
- Put all your select list columns as includes
Well, okay. That probably works with one of those tiny example databases that people keep coming up with creative ways to make bigger.
Real life is often larger than 100MB, though. And besides, real life queries are far more complicated.
Familiar
We don’t have to get too complicated to explose some of the issues with those rules, as queries go a little beyond the where clause.
SELECT TOP (5000) v.Id, v.PostId, v.UserId, v.BountyAmount, v.VoteTypeId, v.CreationDate FROM dbo.Votes AS v WHERE v.VoteTypeId = ? AND v.CreationDate >= ? ORDER BY v.BountyAmount DESC;
If we were to faithfully follow the Rules of Indexing™, we might end up with an index like this:
CREATE INDEX an_attempt_was_made ON dbo.Votes (VoteTypeId, CreationDate) INCLUDE(BountyAmount, PostId, UserId);
Which, aside from Include column order (which we know doesn’t matter), is about what SQL Server’s missing index recommendation would be.
Could it be that conventional index design wisdom is based on a faulty algorithm?
Could be. Could be.
Strange
But what happens when we add the index, according to ancient index law, and run queries with different values that have different selectivities?
SELECT TOP (5000) v.Id, v.PostId, v.UserId, v.BountyAmount, v.VoteTypeId, v.CreationDate FROM dbo.Votes AS v WHERE v.VoteTypeId = 7 --Not a lot of these AND v.CreationDate >= '20080101' ORDER BY v.BountyAmount DESC; SELECT TOP (5000) v.Id, v.PostId, v.UserId, v.BountyAmount, v.VoteTypeId, v.CreationDate FROM dbo.Votes AS v WHERE v.VoteTypeId = 2 --Yes a lot of these AND v.CreationDate >= '20080101' ORDER BY v.BountyAmount DESC;
This isn’t parameter sniffing, because we used literal values, and got different executions plans. They even have correct estimates.
Both queries used our index, too. It did okay for a small amount of data, but for a larger amount of data, we got totally sunk.
Order: Equality vs Inequality Predicates
We’ve talked about how indexes make searching easier by putting data in order, and how each key column in an index has a dependency for that order on the column before it. Obviously having the BountyAmount column as an include isn’t going to be helpful, because those columns aren’t stored in any order.
But there’s an important difference in where we put the BountyAmount column in the key of the index. For example, if we change it to look like this:
CREATE INDEX an_attempt_was_made ON dbo.Votes (VoteTypeId, CreationDate, BountyAmount) INCLUDE(PostId, UserId) WITH(DROP_EXISTING = ON);
We’re not going to do any better. Why? The predicate on CreationDate is an inequality. BountyAmount is only in order within groups of duplicate dates.
Once we cross a date boundary, the order resets, just like we talked about in yesterday’s post. If we want to tune this index to make this query fast for large or small predicates on VoteTypeId, we need to change our index to look like this:
CREATE INDEX an_attempt_was_made ON dbo.Votes (VoteTypeId, BountyAmount, CreationDate) INCLUDE(PostId, UserId) WITH(DROP_EXISTING = ON);
Rules Of The Road
For index key columns, data stored in order can make searching, joining, grouping, and ordering in queries a lot more efficient. That’s probably just the sort of thing you’re after when query tuning. And index tuning is, after all, part of query tuning.
But I bet you’re asking: why focus so much on Sorts? Tune in tomorrow to find out why!
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.
Related Posts
- Starting SQL: SARGability, Or Why Some SQL Server Queries Will Never Seek
- Starting SQL: Fixing Parameter Sensitivity Problems With SQL Server Queries
- Starting SQL: How Parameters Can Change Which Indexes SQL Server Chooses
- Starting SQL: Why Is My SQL Server Query Suddenly Slower Than It Was Yesterday?