Rule Of Nerds
When you’re designing the back end of a database, people will have all sorts of ideas.
Normalization, partitioning, referential integrity, and, usually, trying to figure out what to do when you have more than one client.
If your application is user-focused (like Stack Overflow), you don’t have to struggle too much with the idea of isolation. But when your application is geared more towards supporting multiple entities that have one or more users, things change. Sort of like how Stack Overflow manages all the other Stack Network sites.
Were you to ask me which model I prefer, it would be every tenant getting their own database. Your other options are:
- Everyone all mixed in together like gen-pop
- Using separate schemas inside a single database
It’s hard to see any real upside to those propositions unless you’ve poorly chosen a cloud solution with limitations placed on number of databases. Yes, I’m looking at you.
The problems you run into with everyone mixed in together are just far too grand, especially if the concept of permissions comes into play within a tenant.
If a single user can belong to multiple tenants, forget it. You couldn’t get screwed harder if you bought a time share in Atlantis.
Embrace The Suck
Since we’ve discarded the lunchroom in Oz approach, let’s talk a little but about why the schema-per-tenant approach doesn’t work.
It doesn’t buy you anything from a maintenance perspective without dumping a whole lot of awful complexity in your lap. Worse, there’s no good tooling available to analyze anything in your database schema-by-schema.
Is your client in schema1138 unhappy? Okay, try to find out what’s going on with them in the plan cache or Query Store. I’ll wait. Look at how their indexes are used. I’ll be here. Wait stats? Blocking? Deadlocks? Hey, I’m comfy. Take your time.
Every client I have who started with this pattern has inevitably built a tool to move bigger tenants out to their own database to isolate them and figure out their problems, or move them to their own SQL Server instance. You will, too, if you choose… poorly.
While I hate SQL Server’s plan cache for finding performance problems, we are totally stuck with it as part of the query execution process. As much as I wish Microsoft would replace it with Query Store, they’re too busy trying to find ways to squeeze cloud customers for all they’re worth without the benefit of a lubricant to work on anything practically useful to SQL Server users.
So you end up with the same problem as you do with the database-per-tenant approach. Your plan cache has limitations on size and number of plans. When the same query executes from a different schema or database, a new entry in the plan cache is made.
This leads to a lot of plan duplication in there, and a lot of plans getting flushed out when those limitations are hit. The difference here is that it’s a whole lot easier to figure out what went raunchy for a specific database using Query Store than it is to figure out what’s gone sour in a specific schema in a specific database using… anything. Anything at all.
Perhaps thoughts and prayers are in order.
Rebalancing
I’ve run into far too many clients struggling to keep the lights on with the all-in-one approach, and the schema-per-tenant approach. Especially with the all-in-one approach, you end up with something like a TenantId column in every table, and in every index, sometimes even filtered down to specific problem-tenants. It’s kind of awful to watch.
This is true of many different areas, whether it’s scalability, security/compliance, performance, RPO and RTO, and of course costs.
Some readers may call out things like log backups, and Availability Groups. Sure, I get that! It might be hard to get a log backup job to run across a bunch of databases every RPO-minutes. It’s also trivial to put groups of databases into multiple log backup jobs so that you don’t have to worry about that.
For Availability Groups, I’ve seen more than a few cases where they were falling behind because there were hundreds of schemas in a database all making lots of tiny changes. Keeping lots of databases synchronized is certainly more responsibility, but no less in danger of things falling dismally behind.
Let’s say you have 400 databases. That may take way more worker threads to keep up, but at least they each get worker threads to do that business. A single database with 400 schema in it doesn’t magically get a bunch of additional worker threads to keep data moving, and it’s foolish to think otherwise.
When you start with the more sensible approach — database-per-tenant — you give yourself many more options. You can move databases to new server much more easily, and you can have different databases on different tiers of service way more easily. Think Standard vs. Enterprise Edition, SLAs, RPO, RTO, HA and DR, etc.
And of course, you can charge extra for those additional services. Don’t worry, after a decade in the cloud, everyone is used to getting their pockets shaken for every additional nicety.
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.