Software Vendor Mistakes With SQL Server: Overly Complicated Triggers

Necessary Evils


Triggers can be quite useful for handling work that cascading foreign keys are too simple for, but you can very easily overload them. I’ve run into cases where people had written what looked like an entire application’s worth of business logic into triggers.

Triggers that hit tables that fire other triggers that call stored procedures across servers in a while loop. You know. Developers 🐿

One very important thing to understand is that triggers always happen inside the transaction of the statement that fired them, and an error will roll everything back unless you explicitly SET XACT_ABORT OFF; inside them. I’m not saying you should do that, at all; just that it’s an option.

Bail Reform


There are a few things you should do early on in your triggers to let them bail out as soon as possible.

  • Check if ROWCOUNT_BIG() = 0
  • Check if there are rows in the inserted pseudo-table
  • Check if there are rows in the deleted pseudo-table

You’ll wanna do the check against ROWCOUNT_BIG() before any SET statements, because they’ll reset the counter to 0.

DECLARE @i int;
SELECT @i = COUNT_BIG(*) FROM (SELECT x = 1) AS x;
PRINT ROWCOUNT_BIG();
SET NOCOUNT ON;
PRINT ROWCOUNT_BIG();

The first will print 1, the second will print 0. Though I suppose messing that up would be an interesting performance tuning bug for your triggers.
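
Here’s what those early exits might look like stacked together, in a minimal sketch of an AFTER trigger; the dbo.Orders table and the trigger name are made up for illustration.

CREATE OR ALTER TRIGGER dbo.orders_audit
ON dbo.Orders
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    /*Check this before any SET statements, or it'll already read 0*/
    IF ROWCOUNT_BIG() = 0
        RETURN;

    SET NOCOUNT ON;

    /*Nothing in either pseudo-table means nothing to do*/
    IF NOT EXISTS (SELECT 1 FROM inserted)
    AND NOT EXISTS (SELECT 1 FROM deleted)
        RETURN;

    /*Real trigger work goes here*/
END;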

One bug I see in plenty of triggers, though…

Multiplicity


Make sure your triggers are set up to handle multiple rows. Triggers fire once per statement, not once per row, so they only look per-row when your modifications touch a single row at a time. So like, if your modification query runs in a cursor or loop and updates based on a single unique value, then sure, your trigger will fire for each of those statements.

But if your modifications might hit multiple rows, then your trigger needs to be designed to handle them. And I don’t mean with a cursor or while loop. I mean by joining to the inserted or deleted pseudo-tables, depending on what your trigger needs to do.

Note that if your trigger is for an update or merge, you may need to check both inserted and deleted. Complicated things are complicated.
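
Here’s a minimal sketch of the set-based pattern, again with a made-up dbo.Orders table that has an id and a modified_date column: one row or a million rows, the same join handles it.

CREATE OR ALTER TRIGGER dbo.orders_stamp
ON dbo.Orders
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    /*Join to the pseudo-table instead of looping over it*/
    UPDATE o
        SET o.modified_date = SYSDATETIME()
    FROM dbo.Orders AS o
    JOIN inserted AS i
        ON i.id = o.id;
END;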

One more thing to ponder as we drift along through our trigger-writing extravaganza is that we need to be careful where we OUTPUT rows to. If you return them to a table variable or directly to the client, you’ll end up with a fully single-threaded execution plan.

You’ll wanna dump them to a #temp table or a real table to avoid that, if your triggers are being asked to handle a deluge of rows. For smaller numbers of rows, you’re unlikely to notice that being an issue.
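
A quick sketch of the friendlier pattern, with hypothetical table and column names: land the OUTPUT rows in a #temp table instead of a table variable or the client.

CREATE TABLE #shipped (id int NOT NULL);

UPDATE o
    SET o.order_status = 'shipped'
OUTPUT inserted.id
INTO #shipped (id)
FROM dbo.Orders AS o
WHERE o.order_status = 'packed';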

Know When To Say END;


The longer and more complicated your trigger becomes, the harder it will be to troubleshoot performance issues with it. Since triggers are “part” of whatever your modification queries do, you can end up with locks being taken and held for far longer than intended if there’s a lot of busy work done in them.

In much the same way Bloggers Of The World™ will warn you to index your foreign keys appropriately, you need to make sure that any actions performed in your triggers are appropriately indexed for, too. They’re not so different, in that regard.

Separating triggers into specific functions and duties can be helpful, but make sure that you set the correct order of execution, if you need them to happen in a specific order.
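
Setting that order is done with sp_settriggerorder; a quick sketch, reusing the made-up trigger name from earlier.

EXEC sys.sp_settriggerorder
    @triggername = N'dbo.orders_audit',
    @order = N'First',
    @stmttype = N'UPDATE';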

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Using Integers Instead Of Big Integers For Identity Columns

Schemathematics


I’ve had far too many clients get stuck running out of identity values for a table that grew a lot faster than they thought it would. The change is an ugly one for many reasons.

Though you can set up a process to make the change easier, and some changes are available as metadata-only, most people either have way more complications involved (foreign keys, triggers, etc.) or aren’t practically set up to use the metadata-only solution, which requires compressed indexes and all the whatnot outlined in the linked post.

I was hoping that vNext would take care of the change being painful, but so far I haven’t heard of anything.

The integer maximum is, of course, 2,147,483,647 (2147483647). The big integer maximum is, of course, 9,223,372,036,854,775,807 (9223372036854775807). The big integer maximum is around 4,294,967,298 times larger, which gives you a lot more runway before you run out.

Of course, it comes with a trade off: you’re storing 8 bytes instead of 4. But my favorite way of explaining why that’s worth it is this: by the time you notice that 4 extra bytes of storage space, you’re probably about to run out of integers anyway.

Masters Of My Domain Knowledge


You don’t need to do this for static lists, or for things with an otherwise limited population. For example, if you were going to make a table of everyone in your country, you could still use an integer. Even in the most populous countries on earth, you could probably survive a while with an integer.

The problem comes when you start tracking many-to-one relationships.

An easy thing to imagine is transactions, where each user will likely have many credits and debits. Or if you’re more keen on the Stack Overflow database, each user will ask many questions and post many answers.

Hopefully, anyway. In reality, most users ask one terrible question and never come back, even if their terrible question gets a really good answer.

The point is that once enough users have some degree of frequent activity, that identity column/sequence object will start racking up some pretty high scores. Forget the last page contention issues; there are much easier ways of dealing with those. Your problem is hitting that integer wall.

Aside from using a big integer, you could fiddle with resetting the identity or sequence value to the negative max value, but that makes people queasy for an entirely different set of reasons.
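
If you do go the queasy route, it’s only a command or two; a sketch, with a made-up table and sequence name.

/*Identity column: start handing out values from the bottom of the int range*/
DBCC CHECKIDENT (N'dbo.Transactions', RESEED, -2147483648);

/*Sequence object: same idea*/
ALTER SEQUENCE dbo.transaction_ids
    RESTART WITH -2147483648;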

Wizzed’em


Any table in your database that’s set to represent individual actions by your users should use a big integer as an identity value, assuming you’re going the surrogate key route that utilizes an identity column, or a column based on a sequence object.
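
A minimal sketch of what that looks like up front, with made-up table and column names; the identity column is the only part that matters here.

CREATE TABLE dbo.UserTransactions
(
    Id bigint IDENTITY(1, 1) NOT NULL
        CONSTRAINT pk_user_transactions PRIMARY KEY CLUSTERED,
    UserId int NOT NULL,
    Amount decimal(18, 2) NOT NULL,
    CreatedDate datetime2 NOT NULL
        CONSTRAINT df_user_transactions_created DEFAULT SYSDATETIME()
);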

If you use a regular integer, you’re asking for problems later. Choose the form of your destructor:

  • Recycling identity values
  • Changing to a big integer
  • Deforming your table to use a GUID
  • Let’s call the whole thing off

It’s not easy making data layer changes once things have grown to the point where you’re starting to hit hard limits and boundaries. Anything you can do to protect yourself from the get-go is a must.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Using MAX Datatypes Unnecessarily

Max For The Minimum


In an earlier post, we talked about how strings change the way SQL Server’s optimizer gives memory to queries under certain conditions. In that post, I talked about why MAX datatypes have the MAX problems.

In this post we’re going to look at a couple other issues with them:

  • You can’t put them in the key of an index
  • You can’t push predicates down to them

I know what you’re thinking, here. You’d never do that; you’re much smarter. But someday you might have to explain to someone all the reasons why they shouldn’t do that, and you might want some more in-depth reasons other than “it’s bad”.

Trust me, I have to explain this to people all the time, and I wish I had a few great resources for it.

Like these posts, I guess.

Maxamonium


First, we have this Very Useful™ query.

SELECT c = COUNT_BIG(*) FROM dbo.Posts AS P WHERE P.Body LIKE N'SQL%';
[SQL Server query plan screenshot: grouch]

The plan stinks and it’s slow as all get out, so we try to create an index.

CREATE INDEX not_gonna_happen
    ON dbo.Posts(Body);

But SQL Server denies us, because the Body column is nvarchar(max).

Msg 1919, Level 16, State 1, Line 7
Column 'Body' in table 'dbo.Posts' is of a type that is invalid for use as a key column in an index.

Second Chances


Our backup idea is to create this index, which still won’t make things much better:

CREATE INDEX not_so_helpful
    ON dbo.Posts(Id) INCLUDE(Body);

MAX columns can be in the includes list, but includes aren’t very effective for searching, unless they’re part of a filtered index. Since we don’t know what people will search for, we can’t create an explicit filter on the index either.

[SQL Server query plan screenshot: ehh no]

Even with a smaller index to read from, we spend a full two minutes filtering data out, because the search for N'SQL%' in our where clause can’t be pushed down into the index scan.

And Sensibility


Let’s contrast that with a similar index and search of a column that’s only nvarchar(150). Sure, it’s not gonna find the same things. I just want you to see the difference in the query plan and time when we’re not hitting a (max) column.

This isn’t gonna help you if you genuinely do need to store data up to ~2GB in size in a single column, but it might help people who used a max length “just to be safe”.

CREATE INDEX different_world
    ON dbo.Posts(Id) INCLUDE(Title);

SELECT c = COUNT_BIG(*) FROM dbo.Posts AS P WHERE P.Title LIKE N'SQL%';
[SQL Server query plan screenshot: helicopter team]

But if you fumbled around and found out, you might be able to downsize your columns to a byte length that actually fits the data, and do a lot better performance-wise. This search only takes about 460 milliseconds, even if we scan the entire index.

You may not like it, but this is what better performance looks like.
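
If you’re in the “just to be safe” camp, the fix usually starts with measuring what’s actually in the column and right-sizing it. A hedged sketch, with a hypothetical dbo.Documents table and Notes column standing in for whatever got over-declared; expect the ALTER to do real work on a big table.

/*See how big the biggest value really is, in bytes*/
SELECT longest_bytes = MAX(DATALENGTH(d.Notes))
FROM dbo.Documents AS d;

/*If nothing comes anywhere near (max) territory, size the column to fit reality*/
ALTER TABLE dbo.Documents
    ALTER COLUMN Notes nvarchar(1000) NULL;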

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Using Scalar UDFs In Computed Columns Or Check Constraints

Days Ahead


In yesterday’s post, I taught you about the good things that come from using computed columns. In today’s post, I want to show you something terrible that can happen if you put scalar UDFs in them. The same issues arise if you use scalar UDFs in check constraints, so you can apply anything you see here to those as well.

And this isn’t something that SQL Server 2019’s UDF inlining feature, FROID, can fix for you. At least not as of this writing and recording.
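
If you want a picture of the anti-pattern itself, here’s a minimal sketch with an entirely made-up function and table. The short version of the terrible part: the function runs row by row, and its presence on the table can keep plans from going parallel.

CREATE OR ALTER FUNCTION dbo.strip_dashes (@phone varchar(20))
RETURNS varchar(20)
WITH SCHEMABINDING
AS
BEGIN
    RETURN REPLACE(@phone, '-', '');
END;
GO

/*The anti-pattern: a scalar UDF hiding inside a computed column*/
ALTER TABLE dbo.Customers
ADD phone_clean AS dbo.strip_dashes(phone);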

To make things quick and easy for you to digest, here’s a training video that’s normally part of my paid classes available for free.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Not Using Computed Columns

Forgery


One of the most common things I see when working with vendor apps is queries that need to filter or join on some expression that traditional indexing can’t make efficient.

B-tree indexes only organize the data as it currently exists. As soon as you perform a runtime manipulation on it in a join or where clause, SQL Server’s optimizer and storage engine have far fewer good choices to make.

This comes back to a couple rules I have when it comes to performance:

  • Anything that makes your job easier makes the optimizer’s job harder
  • Store data the way you query it, and query data the way you store it

Presentation Is Everything


SQL Server has many built-in functions that help you easily manipulate data for presentation. It also lets you write a variety of user-defined functions if the existing set doesn’t do exactly what you want, or you have different needs.

None of these functions have any relational meaning. The built-in ones generally don’t have any additional side effects, but user-defined functions (scalar and multi-statement) have many additional performance side effects that we’ll discuss later in the series.

This practice violates both of the above rules, because you did something out of convenience that manipulated data at runtime.

You will be punished accordingly.

Snakes


These are the situations you want to avoid:

  • function(column) = something
  • column + column = something
  • column + value = something
  • value + column = something
  • column = @something or @something IS NULL
  • column like ‘%something’
  • column = case when …
  • value = case when column…
  • Mismatching data types

For a lot of these things, though, you can use a computed column to materialize the expression you want to use. They’ve been around forever, and I still barely see anyone using them.

There are a lot of misconceptions around them, usually that:

  • They cause blocking when you add them (only sometimes)
  • You can’t index them unless you persist them (you totally can!)
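
Here’s a quick sketch of the pattern against the Stack Overflow Users table, turning a function(column) predicate into something indexable; the column and index names are mine.

/*Materialize the expression once...*/
ALTER TABLE dbo.Users
ADD CreationYear AS YEAR(CreationDate);

/*...then index it; no PERSISTED keyword required*/
CREATE INDEX ix_users_creation_year
    ON dbo.Users (CreationYear);

/*Queries that used to wrap CreationDate in YEAR() can seek on this instead*/
SELECT c = COUNT_BIG(*)
FROM dbo.Users AS u
WHERE u.CreationYear = 2013;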

Known


There are some interesting things you can do with computed columns to make queries that would otherwise have a tough time go way faster. To make it quick and easy for you to learn about them, I’m making videos from my paid training available here for you to watch.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Explicit And Implicit Transactions

Holler And Cuss


There’s a time and a place for everything, except scalar functions. In a lot of the locking and deadlocking issues I help clients with, developers either:

  • Didn’t understand the scope of their transaction
  • Didn’t need an explicit transaction to begin with (ha ha ha)
  • Didn’t realize how god derned awful Implicit Transactions can be
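
That last one deserves a quick illustration. A minimal sketch against the Stack Overflow Users table, purely to show the shape of the problem.

/*With this on, the first data-touching statement quietly opens a transaction*/
SET IMPLICIT_TRANSACTIONS ON;

UPDATE u
    SET u.Reputation = u.Reputation + 1
FROM dbo.Users AS u
WHERE u.Id = 1;

/*Every lock from that update is held until someone remembers to do this*/
COMMIT TRANSACTION;

SET IMPLICIT_TRANSACTIONS OFF;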

In this post, I’m gonna give you access to some more of my training videos about locking and blocking for free. Holiday spirit, or something.

There’s a bunch of stuff in there that’ll help you generally with these issues, and one that covers the topic of this post specifically. Enjoy!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Modifying Millions of Rows At A Time

But Why?


I often see people get stuck hard by this. Even worse, it happens when they’re using a merge statement, which is like throwing SQL Server a knuckleball.

It has no idea what it might have to do with your merge — insert? update? delete? — so it has to prepare a plan for any of them that you specify.

Just don’t use merge, okay? If you take one thing from this whole series: please don’t use merge.

Okay, anyway, back to the point: large modifications can suck in a few different ways.

Locking:

The whole time those big modifications are running, other queries are gonna get blocked. Even with NOLOCK/UNCOMMITTED hints, other modification queries can get stuck behind them. Wanna make users feel some pain? Have your app be unusable for big chunks of time because you refuse to chunk your modifications. Worse, if enough queries get backed up behind one of these monsters, you can end up running out of worker threads, which is an even worse performance issue.

Transaction Logging:

The more records you need to change, the more transaction logging you have to do. Even in simple recovery, you log the same amount of changes here (unless your insert gets minimal logging). A lot of people think simple recovery means less logging, but no, it just means that SQL Server manages the transaction log for you. This’ll get worse as you add more indexes to the table, because changes to each of them are logged separately.

Query Performance:

The modification part of any update or delete happens single-threaded. Other parts of the query plan might go parallel, but the actual modification portion can’t. Getting a few million rows ready on a bunch of threads simultaneously might be fast, but then actually doing the modification can be pretty slow. You have to gather all those threads down to a single one.

Lock Escalation:

It goes without saying that large modifications will want object-level locks. If there are incompatible locks, they may end up blocked. If they started by taking row or page locks, and tried to escalate to an object level lock but couldn’t, you could end up gobbling up a whole lot of your lock memory, which is a finite resource. Remember, there’s no escalation or transition between row and page locks. This is another place where having a lot of indexes hanging around can hurt.

Buffer Pool Pollution:

If you’re the type of person who doesn’t regularly declutter your indexes, it’s likely that you have a bunch of indexes that either don’t get used anymore, only rarely get used, or are duplicative of other indexes defined on a table. Just like with transaction logging and lock escalation, the more indexes you have around, the more of them you need to read up into SQL Server’s buffer pool to modify them. SQL Server doesn’t work with pages on disk.
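
The common thread through all of those is that smaller chunks keep locks, log records, and memory in check. A minimal batching sketch against the Stack Overflow Comments table; the batch size and cutoff date are arbitrary.

DECLARE @rows bigint = 1;

WHILE @rows > 0
BEGIN
    DELETE TOP (5000) c
    FROM dbo.Comments AS c
    WHERE c.CreationDate < '20090101';

    /*Capture this before anything else resets it*/
    SET @rows = ROWCOUNT_BIG();
END;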

Double Dollars


How fast these queries run will partially be dictated by:

  • How much memory you have
  • How fast your disks are
  • How fast your storage networking is

There are other factors too, like the type of data you’re changing. Changing MAX data types has way more overhead than more reasonable ones, or even shorter strings.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Not Using An Optimistic Isolation Level

The Champ Is Here


For developers who have worked with, or who write applications that work across multiple database platforms, working with SQL Server can often be a jarring experience.

Problems that you don’t face on products that use Multi-Version Concurrency Control by default are all of a sudden the bane of your existence.

But of course, for developers who only work with SQL Server, the innate fear of commitment is one of the first things they learn.

SQL Server has two options for this that I’ve blogged about before. Head over here if you need background.
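
For reference, both options get flipped on at the database level; a sketch, assuming a database named StackOverflow.

/*Readers get the last committed version of a row instead of getting blocked*/
ALTER DATABASE StackOverflow
    SET READ_COMMITTED_SNAPSHOT ON
    WITH ROLLBACK IMMEDIATE;

/*Opt in per session with SET TRANSACTION ISOLATION LEVEL SNAPSHOT*/
ALTER DATABASE StackOverflow
    SET ALLOW_SNAPSHOT_ISOLATION ON;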

A lot of the reasoning for not using an optimistic isolation level that I’ve heard from people over the years just doesn’t hold up well anymore.

Excusify


Let’s talk through some of them.

We’re worried about tempdb contention from the additional row versioning going up there

That’s fair, but if you don’t currently have tempdb contention, it’s unlikely that it’s going to body slam you unless your application currently has problems with:

  • Modifying huge chunks of data at once (not counting inserts)
  • Long running modification transactions
  • Implicit transactions that don’t always commit
  • Updating a lot of LOB data all the time

Most people don’t have such severe problems with that stuff that things go pear shaped from the version store. When tempdb is properly set up, it can take a pretty good beating before you see the type of contention that causes performance issues.

On top of that, the version store and other temporary objects don’t use exactly the same structures in tempdb.

  • In SQL Server 2019 and up, if Accelerated Database Recovery is enabled, its local per-database version store is used instead of tempdb
  • In all prior releases, the row versions are sent to special append-only storage units, which aren’t governed or exposed by normal system views, nor do they generate transaction log records. Those units may contain many rows, and are cleaned up at the unit level, not the row level.
  • Online index rebuilds use a separate version store from the one used by triggers, updates, deletes, MARS, etc.

We’re worried about tempdb growing out of control from the version store

Sure, you increase the risk surface area for long running modifications/transactions creating this situation, but either of those things would cause a pretty big headache if they happened without an optimistic isolation level enabled.

I’ve seen plenty of people end up there, do something goofy like kill the long running modification without looking at how much work it had done, and then get stuck in an even longer running rollback.

And guess what? Restarting SQL Server doesn’t fix it. Accelerated Database Recovery is helpful for that, if you’re on SQL Server 2019+, but that probably ain’t you.

The “good” news is that if tempdb runs out of space, SQL Server will start shrinking the version store.

We want our product to work out of the box, and SQL Server’s tempdb needs additional configuration to handle load

Boy, have I got news for you from 2016:

  • Trace flag 1117 is the default behavior
  • Trace flag 1118 is the default behavior
  • The setup GUI lets you create multiple data files
  • The setup GUI lets you turn on instant file initialization

If that setup doesn’t keep tempdb woes at bay from enabling an optimistic isolation level, it might be your fault.

We can’t test all our code to see if there will be race conditions with optimistic locking

I hate this, because it says to me that you don’t know your own code base well enough to know if you rely on locking to queue things, etc. It’s also very lazy, because…

I see you using gobs of NOLOCK hints everywhere. You’re not gonna get the right results for that type of code, anyway. Why be afraid of working with the last known good version of a row?

If you’re doing something like generating sequences, and you’re using a homegrown sequence table instead of the sequence objects that SQL Server has had since 2012, I question your sanity.

I mean, unless you’re using the code in the post I linked. Then you’re the coolest and smartest.
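
For completeness, the built-in version is about two statements; a sketch with a made-up sequence name.

CREATE SEQUENCE dbo.invoice_numbers
    AS bigint
    START WITH 1
    INCREMENT BY 1
    CACHE 1000;

SELECT next_invoice = NEXT VALUE FOR dbo.invoice_numbers;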

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Mistaking NOLOCK For Concurrency

Corner Boppin’


I totally understand the Cult Of NOLOCK. Blocking sucks. Fixing it is annoying. Read Committed is a garbage isolation level.

Then along comes some beleaguered colleague who whispers the magic incantation into your ear, and suddenly those problems go away. A week later you have a keyboard shortcut that inserts WITH (NOLOCK) programmed into SSMS and your muscle memory.

You’ve probably already read a thousand and one blog posts about what a bad idea NOLOCK is, too. But you’ve just never had a problem with it, whether it’s dirty reads or errors.

And besides, no one actually knows if data is correct anyway. Have you seen data? Quite a mess. If it looks weird, just run it again.

You read something scary about optimistic isolation levels twenty years ago. Why change now?

Try Harder


Let’s look at where all your NOLOCK-ing won’t save you.

CREATE TABLE
    dbo.beavis
(
    id int PRIMARY KEY, 
    heh datetime
);

CREATE TABLE
    dbo.butthead
(
    id int PRIMARY KEY, 
    huh datetime
);

INSERT 
    dbo.beavis
    (id, heh)
VALUES
    (1, GETDATE());

INSERT
    dbo.butthead
    (id, huh)
SELECT
    b.id, 
    b.heh
FROM dbo.beavis AS b;

Here’s the most common deadlock scenario I see:

/*Player one*/
BEGIN TRAN

    SET TRANSACTION ISOLATION LEVEL
        READ UNCOMMITTED;
    
    UPDATE b
        SET b.heh = GETDATE()
    FROM dbo.beavis AS b WITH(NOLOCK);
    
    /*Stop running here until you run 
      the other session code, then come 
      back and run the next update*/
    UPDATE b
        SET b.huh = GETDATE()
    FROM dbo.butthead AS b WITH(NOLOCK);

ROLLBACK;

And then:

/*Player two*/
BEGIN TRAN

    SET TRANSACTION ISOLATION LEVEL
        READ UNCOMMITTED;
    
    UPDATE b
        SET b.huh = GETDATE()
    FROM dbo.butthead AS b WITH(NOLOCK);
    
    UPDATE b
        SET b.heh = GETDATE()
    FROM dbo.beavis AS b WITH(NOLOCK);
    /*Stop running here*/

ROLLBACK;

Cautious Now


If you go look at the deadlocks that get produced by these queries, you’ll see something that looks like this:

[Deadlock graph screenshot: pies]

Despite the isolation level being read uncommitted and us festooning NOLOCK hints all about the place, we still end up with deadlocks.

Using these hints doesn’t always help with concurrency issues, and this goes for many other situations where locking and blocking has to occur.

At best, your select queries will be able to read dirty data rather than get blocked. I’m way more in favor of using an optimistic isolation level, like Read Committed Snapshot Isolation. All NOLOCK really means is that your query doesn’t respect locks taken by other queries.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.