Home Star
For all the good indexes do, sometimes you just don’t need them to cover all of the data in a table for various reasons.
There are a number of fundamentals that you need to understand about them (that any good consultant will tell you about), but I wanna cover them here just in case you had an unfortunate run-in with a less-than-stellar consultant.
First, there are some session-level settings that need to be appropriately applied for them to be considered by the optimizer. This is especially important if you’re putting any logic into a SQL Server Agent job, because it uses the wrong settings for some reason.
Here are the correct settings:
- QUOTED_IDENTIFIER ON
- ANSI_NULLS ON
- ANSI_PADDING ON
- ANSI_WARNINGS ON
- ARITHABORT ON
- CONCAT_NULL_YIELDS_NULL ON
- NUMERIC_ROUNDABORT OFF
Second, you need to be careful about parameterized queries. If your filtered index and query looks like this:
CREATE INDEX u ON dbo.Users(Reputation) WHERE Reputation > 100000; SELECT c = COUNT_BIG(*) FROM dbo.Users AS u WHERE u.Reputation > @Reputation;
The optimizer will not use your filtered index, because it has to pick a safe and cache-able plan that works for any input to the @Reputation parameter (this goes for local variables, too).
To get around this, you can use:
- Recompile hint
- Literal value
- Potentially unsafe dynamic SQL
Third, you need to have the columns in your filtering expression be somewhere in your index definition (key or include) to help the optimizer choose your index in some situations.
Let’s say you have a filtered index that looks like this:
CREATE INDEX u ON dbo.Users(DisplayName) WHERE Reputation > 100000;
As thing stand, all the optimizer knows is that the index is filtered to Reputation values over 100k. If you need to search within that range, like 100k-200k, or >= 500k, it has to get those values from somewhere, and it has the same options as it does for other types of non-covering indexes:
- Ignore the index you thoughtfully created and use another index
- Use a key lookup to go back to the clustered index to filter specific values
General Uses
The most common uses that I see are:
- Indexing for soft deletes
- Indexing unique values and avoiding NULLs
- Indexing for hot data
- Indexing skewed data to create a targeted histogram
There are others, but one thing to consider when creating filtered indexes is how much of your data will be excluded by the definition.
If more than half of your data is going to end up in there, you may want to think hard about what you’re trying to accomplish.
Another potential use that I sometimes find for them is using the filter to avoid needing an overly-wide index key.
Let’s say you have a super important query that looks like this:
SELECT u.DisplayName, TotalScore = SUM(p.Score) FROM dbo.Posts AS p JOIN dbo.Users AS u ON p.OwnerUserId = u.Id WHERE p.PostTypeId = 1 AND p.AcceptedAnswerId > 0 AND p.ClosedDate IS NULL AND p.Score > @Score AND p.OwnerUserId = @OwnerUserId ORDER BY TotalScore DESC;
You’re looking at indexing for one join clause predicate and four additional where clause predicates. Do you really want five key columns in your index? No?
How about this?
CREATE INDEX p ON dbo.Posts (OwnerUserId, Score) INCLUDE (PostTypeId, AcceptedAnswerId, ClosedDate) WHERE (PostTypeId = 1 AND AcceptedAnswerId > 0 AND ClosedDate IS NULL) WITH (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
This will allow your query to pre-seek to the the literal value predicates, and then evaluate the parameterized predicates in the key of the index.
Crappy Limitations
There are lots of things you might want to filter an index on, like an expression. But you can’t.
I think one of the best possible use cases for a filtered index that is currently not possible is to isolate recent data. For example, I’d love to be able to create a filtered index like this:
CREATE INDEX u ON dbo.Users(DisplayName) WHERE CreationDate > DATEADD(DAY, -30, CONVERT(date, SYSDATETIME()));
So I could just just isolate data in the table that was added in the last 30 days. This would have a ton of applications!
But then the index would have to be self-updating, churning data in and out on its own.
For something like that, you’ll need a computed column. But even indexing those can be tricky, so we’ll talk about those tomorrow.
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.
Instead of a truly sliding window filtered index that needs to update itself as rows expire from the 30 days, you could do it manually with a scheduled job. Imagine a daily job that ran this script:
“`
DROP INDEX IF EXISTS AT_LEAST_30_DAYS_TEMP ON dbo.USERS;
declare @SQL NVARCHAR(1000) = N’
CREATE INDEX AT_LEAST_30_DAYS_TEMP ON dbo.Users(DisplayName)
WHERE CreationDate > ”’ + CAST(DATEADD(DAY, -30, GETUTCDATE()) AS NVARCHAR(100)) + ”’;’;
print @SQL;
exec sp_executesql @SQL;
DROP INDEX IF EXISTS AT_LEAST_30_DAYS ON dbo.Users;
exec sp_rename N’dbo.Users.AT_LEAST_30_DAYS_TEMP’, N’AT_LEAST_30_DAYS’, N’INDEX’;
“`
The creation of the new index uses the previous index directly so it’s pretty efficient.
That is a cool idea! In the past I’ve added a bit column and updated or rebuilt based on that value.