What Does And Doesn’t Belong In A SQL Server Data Warehouse

No Way, No How

This is a list of things I see in data warehouses that make me physically ill:

  • Unique constraints of any kind: Primary Keys, Indexes, etc. Make things unique during your staging process. Don’t make your indexes do that work.
  • Foreign Keys: Referential integrity should be guaranteed from your data source. If it can’t be, there’s no sense in making it happen in your data warehouse. Foreign Keys in SQL Server suck anyway, and slow the hell out of large data loads.
  • Clustered row store indexes: At this point in time, when you need a clustered index, it ought to be a clustered column store index.
  • Tables with “lots” of nonclustered row store indexes: They’ll only slow down your load times a whole bunch. Replace them with nonclustered column store indexes.
  • Indexed views: This isn’t 2012 anymore. Column store, column store, column store.
  • Standard Edition: The CPU limit of 24 cores is probably fine, but the buffer pool cap of 128GB and strict limitations on column store/batch mode are horrendous.

I know what you’re thinking looking at this list: I can drop and re-create things like unique constraints, foreign keys, and nonclustered indexes. You sure can, but you’re wasting a ton of time.

Data warehouses have a completely different set of needs from transaction systems. The sooner you stop treating data warehouses like AdventureWorks, the better.

That’s all.

Thanks for reading!

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

5 thoughts on “What Does And Doesn’t Belong In A SQL Server Data Warehouse

  1. Hi Erik,

    In Brent Ozars course about Column store indexes he talks about when CCI is appropriate and not, and he has a very nice site, columnscore.com, that can be used to do some evaluation of tables and their readiness for CCI. Are you saying that if it is a DW we should just use CCI no matter what?

    Two of my favourite SQL-people seemingly not agreeing? (Probably just me misunderstanding, I know 🙂

    Do you have a suggestion for where to learn more about the latest best practices when it comes to datawarehouse setup in SQL Server?


    1. Flat out yes, you should be using column store in a data warehouse. Looking at the questions there — if any aren’t true of your data warehouse — you probably don’t have a true data warehouse.

      Even for the insert question, trickle inserts can be handled gracefully.

      I don’t have any suggestion on materials though — I haven’t come across any.

Comments are closed.