Buffet Tables
If you’ve ever spent any time reading about relational database theory, you’ve probably read all sorts of things about normal forms and all that good stuff. You’ve also probably immediately forgotten all of it.
At least that’s what I’d guess is what happened based on what I see when I look at database designs out there. Tables are often dozens of columns wide, with all sorts of things in there that should be in their own table.
- Columns with a common prefix, like Customer_Name, Customer_Address, etc.
- Columns with a common suffix, like PhoneNumber_1, PhoneNumber_2, etc.
Even fairly-well designed tables often get somewhat overloaded with information, especially when years of additional development requirements lead to a choice between adding one more column to a table and normalizing several column out to a new table.
Sound familiar?
Let’s move on.
Indexing
The more columns you have in a table, the more potential column combinations there are for indexes. Much like columns, indexes tend to get added following the path of least resistance.
Very rarely does someone consider current indexes when deciding to add an index. They might be doing something like:
- Evaluating missing index requests in a query plan
- Evaluating missing index requests in the DMVs
- Running DTA like a darn fool and saying yes to everything
I’ll often run into tables that have indexes like these:
CREATE INDEX p1 ON dbo.Posts(OwnerUserId); CREATE INDEX p2 ON dbo.Posts(OwnerUserId, PostTypeId); CREATE INDEX p3 ON dbo.Posts(OwnerUserId, CreationDate, LastActivityDate); CREATE INDEX p4 ON dbo.Posts(OwnerUserId, PostTypeId, CreationDate); CREATE INDEX p5 ON dbo.Posts(OwnerUserId, PostTypeId, Score); CREATE INDEX p6 ON dbo.Posts(CreationDate, LastActivityDate, Score);
I’m leaving off includes here, because you can combine those in any order without disturbing much.
Key columns are a different story, of course. Order matters quite a bit, because via the magic of co-dependency key column order defines how queries can efficiently access data.
Merging
Looking at this list of indexes, we know a couple things immediately:
- The index on just OwnerUserId isn’t of any obvious value, and we can get rid of it
- The index that leads on CreationDate serves a totally different type of query, and we should leave it alone for now
Assuming all of these indexes have a not-crappy read to write ratio, what do we do next?
- Make sure none of them are unique
- Make sure none of them are filtered
These indexes all have multiple key columns that sort of line up:
CREATE INDEX p2 ON dbo.Posts(OwnerUserId, PostTypeId); CREATE INDEX p3 ON dbo.Posts(OwnerUserId, CreationDate, LastActivityDate); CREATE INDEX p4 ON dbo.Posts(OwnerUserId, PostTypeId, CreationDate); CREATE INDEX p5 ON dbo.Posts(OwnerUserId, PostTypeId, Score);
Of these, three of them are on OwnerUserId and PostTypeId, and one isn’t. We can throw that one out.
CREATE INDEX p2 ON dbo.Posts(OwnerUserId, PostTypeId); CREATE INDEX p4 ON dbo.Posts(OwnerUserId, PostTypeId, CreationDate); CREATE INDEX p5 ON dbo.Posts(OwnerUserId, PostTypeId, Score);
Okay, now what? Just looking at the definitions here, we can safely get rid of the first index, because any query that uses it can safely use the other two indexes.
But for the second two, we can’t do much. It’s not necessarily safe to add the third key column in either one as a fourth key column.
We have to look at how they’re used, first. If one index is unused or rarely used, we could make that change.
Things might get more difficult to figure out if we have another index like this:
CREATE INDEX p4 ON dbo.Posts(OwnerUserId, PostTypeId, CreationDate); CREATE INDEX p5 ON dbo.Posts(OwnerUserId, PostTypeId, Score); CREATE INDEX p7 ON dbo.Posts(OwnerUserId, PostTypeId, CreationDate, LastActivityDate);
Sure, it means we could get rid of the index that ends with CreationDate, but it makes the picture a little less clear about how we should deal with Score.
Ending
To be safe, I’d probably stop after just getting rid of the ones where the key columns are a match as far as order goes, and are a subset of wider indexes.
CREATE INDEX p1 ON dbo.Posts(OwnerUserId); --Drop this one CREATE INDEX p2 ON dbo.Posts(OwnerUserId, PostTypeId); --Drop this one CREATE INDEX p3 ON dbo.Posts(OwnerUserId, CreationDate, LastActivityDate); --Keep this one CREATE INDEX p4 ON dbo.Posts(OwnerUserId, PostTypeId, CreationDate); --Drop this one CREATE INDEX p5 ON dbo.Posts(OwnerUserId, PostTypeId, Score); --Keep this one CREATE INDEX p6 ON dbo.Posts(CreationDate, LastActivityDate, Score); --Keep this one CREATE INDEX p7 ON dbo.Posts(OwnerUserId, PostTypeId, CreationDate, LastActivityDate); --Keep this one
After that, I’d probably look at usage stats, to see if any of them just aren’t holding up their end of the bargain.
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.