ISNULL, COALESCE, And Performance In SQL Server Queries

ANSI Blandard


Sometimes there are very good reasons to use either coalesce or isnull, owing to them having different capabilities, behaviors, and support across databases.

But isnull has some particular capabilities that are interesting, despite its limitations: only two arguments, specific to SQL Server, and uh… well, we can’t always get three reasons, as a wise man once said.

There is one thing that makes isnull interesting in certain scenarios. Let’s look at a couple.
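
An aside before we get to those: the two functions don't even agree on return types. isnull takes its type from the first argument, while coalesce works off data type precedence across all of them. A quick sketch of how that can bite:

DECLARE @short varchar(5) = NULL;

SELECT
    ISNULL(@short, 'truncated!') AS isnull_result,      --typed varchar(5), comes back as 'trunc'
    COALESCE(@short, 'truncated!') AS coalesce_result;  --the full string survives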

Green Easy


First, we’re gonna need an index.

CREATE INDEX party 
ON dbo.Votes
    (CreationDate, VoteTypeId) 
INCLUDE
    (UserId);

Yep, we’ve got an index. Survival of the fittest.

Here are some queries to go along with it. Wouldn’t want our index getting lonely, I suppose.

All that disk entropy is probably scary enough.

SELECT TOP (10) 
    u.DisplayName
FROM dbo.Users AS u 
WHERE NOT EXISTS
( 
    SELECT 
        1/0 
    FROM dbo.Votes AS v 
    WHERE v.UserId = u.Id 
    AND v.VoteTypeId IN (1, 2, 3)
    AND ISNULL(v.CreationDate, '19000101') > '20131201' 
)
ORDER BY u.CreationDate DESC;

SELECT TOP (10) 
    u.DisplayName
FROM dbo.Users AS u 
WHERE NOT EXISTS
( 
    SELECT 
        1/0 
    FROM dbo.Votes AS v 
    WHERE v.UserId = u.Id 
    AND v.VoteTypeId IN (1, 2, 3)
    AND COALESCE(v.CreationDate, '19000101') > '20131201' 
)
ORDER BY u.CreationDate DESC;

Pager Back


The first query uses isnull, and the second query uses coalesce. Just in case that wasn’t obvious.

I know, I know — I’ve spent a long time over here telling you not to use isnull in your where clause, lest ye suffer the greatest shame to exist, short of re-gifting to the original gift giver.

Usually, when you wrap a column in a function like that, bad things happen. Seeks turn into Scans, wine turns into water, spaces turn into tabs, the face you remember from last call turns into a November Jack O’Lantern.

But in this case, the column wrapped in our where clause, which is the leading column of the index, is not nullable.
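
If you want to check that for yourself, nullability is right there in the metadata:

SELECT
    c.name,
    c.is_nullable
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID('dbo.Votes')
AND   c.name = 'CreationDate';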

SQL Server’s optimizer, having its act together, can figure this out and produce an Index Seek plan.

SQL Server Query Plan
you’ll live forever

The null check is discarded, and we end up with a Seek to the CreationDate values we care about, plus a Residual Predicate on VoteTypeId.

SQL Server Query Plan Tool Tip
goodness

Big Famous


The second query, the one that uses coalesce, has a few things different about it. Let’s cut to the plan.

SQL Server Query Plan
butcher

Rather than 157ms, this query runs for a minute and five seconds. All of the time is spent in the Top > Index Scan. We no longer get an Index Seek, either.

SQL Server Query Plan Tool Tip
burdened

Notice that the predicate on CreationDate is a full-on case expression, checking for null-ness. This could be an okay scenario if we had something to Seek to, but without proper indexing and properly written queries, it’s el disastero.

The query changes because the optimizer decides that a row goal would make things better. That's why we have a Nested Loops Join, and the Top > Index Scan. It doesn't work out very well.

This isn’t the only time you might see this, but it’s probably the worst.
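
If you want to confirm that the row goal is the culprit, one experiment (not a fix) is the documented DISABLE_OPTIMIZER_ROWGOAL hint, available from SQL Server 2016 SP1 onward. Tacked onto the coalesce query, it looks like this:

SELECT TOP (10) 
    u.DisplayName
FROM dbo.Users AS u 
WHERE NOT EXISTS
( 
    SELECT 
        1/0 
    FROM dbo.Votes AS v 
    WHERE v.UserId = u.Id 
    AND v.VoteTypeId IN (1, 2, 3)
    AND COALESCE(v.CreationDate, '19000101') > '20131201' 
)
ORDER BY u.CreationDate DESC
OPTION(USE HINT('DISABLE_OPTIMIZER_ROWGOAL'));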

SUBTLERY


You can also see this with a pattern I often advocate against, using a Left Join to find rows that don’t exist:

SELECT TOP (10) 
    u.DisplayName
FROM dbo.Users AS u
LEFT JOIN dbo.Votes AS v
    ON  v.UserId = u.Id
    AND v.VoteTypeId IN (1, 2, 3)
    AND ISNULL(v.CreationDate, '19000101') > '20131201'  
WHERE v.Id IS NULL
ORDER BY u.CreationDate DESC;

SELECT TOP (10) 
    u.DisplayName
FROM dbo.Users AS u
LEFT JOIN dbo.Votes AS v
    ON  v.UserId = u.Id
    AND v.VoteTypeId IN (1, 2, 3)
    AND COALESCE(v.CreationDate, '19000101') > '20131201'  
WHERE v.Id IS NULL
ORDER BY u.CreationDate DESC;

It’s not as bad here, but it’s still noticeable.

The plan with isnull looks about like so:

SQL Server Query Plan
you’re fast

At 163ms, there’s not a lot to complain about here.

The coalesce version does far worse, at just about 1.5 seconds.

SQL Server Query Plan
gasp!

We Learned Some Things


In SQL Server, using functions in where clauses is generally on the naughty list. In a narrow case, using the built-in isnull function results in better performance than coalesce on columns that are not nullable.

This pattern should generally be avoided, of course. On columns that are nullable, things can really go sideways in either case. And of course, this matters most when the function makes an otherwise possible Index Seek impossible, leaving an Index Scan as the only way to find rows.
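
The cleanest rewrite, where you can manage it, is to skip the function entirely. Since CreationDate isn't nullable here, the null check buys us nothing, and the bare predicate can seek without any favors from the optimizer:

SELECT TOP (10) 
    u.DisplayName
FROM dbo.Users AS u 
WHERE NOT EXISTS
( 
    SELECT 
        1/0 
    FROM dbo.Votes AS v 
    WHERE v.UserId = u.Id 
    AND v.VoteTypeId IN (1, 2, 3)
    AND v.CreationDate > '20131201' 
)
ORDER BY u.CreationDate DESC;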

An additional consideration is when we can Seek to a very selective set of rows first. Say we can get things down to (for the purposes of explanation only) around 1000 rows with a predicate like Score > 10000.

For the remaining 1000 rows, it’s not likely that an additional Predicate like the ones we saw today would have added any drama to the execution time of a relatively simple query.

They may, however, lead to poor cardinality estimates in more complicated queries.

Thanks for reading!


Recompile And Nested Procedures In SQL Server

Rock Sale


While I was answering a question, I had to revisit what happens when using different flavors of recompile hints with stored procedures that call inner stored procedures. I like when this happens, because there are so many little details I forget.

Anyway, the TL;DR is that if you have nested stored procedures, a recompile hint on the outer one only recompiles the outer one. The inner procedures — really, I should say modules, because this includes other objects that compile query plans — are left alone. But hey. Now you know what I should have said.

If you want to play around with the tests, you’ll need to grab sp_BlitzCache. I’m too lazy to write plan cache queries from scratch.

Testament


The procedures:

CREATE OR ALTER PROCEDURE dbo.inner_sp
AS
BEGIN

    SELECT
        COUNT_BIG(*) AS records
    FROM sys.master_files AS mf;
END;
GO 

CREATE OR ALTER PROCEDURE dbo.outer_sp
--WITH RECOMPILE /*toggle this to see different behavior*/
AS
BEGIN

    SELECT 
        COUNT_BIG(*) AS records
    FROM sys.databases AS d;
    
    EXEC dbo.inner_sp;

END;
GO

The tests:

--It's helpful to run this before each test to clear out clutter
DBCC FREEPROCCACHE;

--Look at this with and without 
--WITH RECOMPILE in the procedure definition
EXEC dbo.outer_sp;

--Take out the proc-level recompile and run this
EXEC dbo.outer_sp WITH RECOMPILE;

--Take out the proc-level recompile and run this
EXEC sp_recompile 'dbo.outer_sp';
EXEC dbo.outer_sp;

--You should run these between each test to verify behavior
--If you just run them here at the end, you'll be disappointed
EXEC sp_BlitzCache 
    @DatabaseName = 'Crap', 
    @QueryFilter = 'procedure', 
    @SkipAnalysis = 1, 
    @HideSummary = 1;

EXEC sp_BlitzCache 
    @DatabaseName = 'Crap', 
    @QueryFilter = 'statement', 
    @SkipAnalysis = 1, 
    @HideSummary = 1;

Whatchalookinat?


After each test where a recompile is applied, you should see the inner proc/statement in the BlitzCache results, but not the outer procedure.

It's important to understand behavior like this, because recompile hints are most often used to help investigate parameter sniffing issues. If the sniffing is taking place in nested stored procedure calls, you may find yourself with a bunch of extra work to do, or needing to re-focus your use of recompile hints.

Of course, this is why I much prefer option recompile hints on problem statements. You get much more reliable behavior.
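
To make that concrete, here's the inner procedure from above with the hint moved to the statement level (same demo, just a different flavor of recompile):

CREATE OR ALTER PROCEDURE dbo.inner_sp
AS
BEGIN

    SELECT
        COUNT_BIG(*) AS records
    FROM sys.master_files AS mf
    OPTION(RECOMPILE);
END;
GO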

And, as Paul writes:

For instances running at least SQL Server 2008 build 2746 (Service Pack 1 with Cumulative Update 5), using OPTION (RECOMPILE) has another significant advantage over WITH RECOMPILE: Only OPTION (RECOMPILE) enables the Parameter Embedding Optimization.

Thanks for reading!


If You’re Using Under 20% CPU, You’re Wasting Money On SQL Server Licensing

Sensational


CPUs aren’t normally expensive, but ask them to run a query and the cost skyrockets. It doesn’t matter where you are in the world or in the cloud, prepare to get gouged.

You’d think SQL Server was an Hermes purse.

Life is a bit worse in the cloud, where CPU count and RAM amount are inextricably tied together. Sure, Azure offers constrained vCPU instances that help with that, but still.

Money is expensive, and it never goes on sale.

Slacker


If your CPU load stays consistently under 20%, and you don’t need a bunch of cushion for regularly scheduled tasks like ETL or ELT or LET me write a sentence before changing the acronym, then what’s the point of all those CPUs?

I’m not saying they have to run at 100% and you should be using them to air fry your Hot Pockets, but what’s wrong with running at 40-60%? That still leaves you a good bit of free ticks and cycles in case a parameter gets poorly sniffed or some other developer disaster befalls your server.

When a workload is pretty well-tuned, you should focus on right-sizing hardware rather than staring at your now-oversized hardware.
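
If you want actual numbers rather than feelings, SQL Server keeps a rolling, minute-by-minute CPU history in the scheduler monitor ring buffer. This is a sketch of the usual query against it; the XML shape has been stable for a long time, but it's an undocumented DMV, so treat it accordingly:

SELECT TOP (30)
    x.record.value('(./Record/@id)[1]', 'bigint') AS record_id,
    x.record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS sql_server_cpu,
    x.record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') AS system_idle
FROM 
(
    SELECT 
        CONVERT(xml, dorb.record) AS record
    FROM sys.dm_os_ring_buffers AS dorb
    WHERE dorb.ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR'
    AND   dorb.record LIKE N'%<SystemHealth>%'
) AS x
ORDER BY record_id DESC;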

Bragging Rights


It’s often quite an achievement to say that you tuned this and that and got CPU down from 80% to 20%, but now what?

Can you give back some cores to the VM host? Consolidate workloads? Move to a smaller instance size?

Fashion models are great, but they’re not great role models for servers. CPUs should not be sitting around being expensive and bored.

Thanks for reading!


Implicit Transactions: Why Unrelated Queries Block Each Other In SQL Server

A Bit Sensational


I don’t really mean that unrelated queries block each other, but it sure does look like they do.

Implicit Transactions are really horrible surprises, but are unfortunately common to see in applications that use JDBC drivers to connect to SQL Server, and especially with applications that are capable of using other database platforms like Oracle as a back-end.

The good news is that in the latter case, support for using Read Committed Snapshot Isolation (RCSI) is there to alleviate a lot of your problems.

Problem, Child


Let’s start with what the problem looks like.

sp_WhoIsActive Blocking
awkward

A couple unrelated select queries are blocking each other. One grabbing rows from the Users table, one grabbing rows from the Posts table.

This shouldn’t be!

Even if you run sp_WhoIsActive in a way to try to capture more of the command text than is normally shown, the problem won’t be obvious.

sp_WhoIsActive 
    @get_locks = 1;
GO 

sp_WhoIsActive 
    @get_locks = 1,
    @get_full_inner_text = 1,
    @get_outer_command = 1;
GO

What Are Locks?


If we look at the details of the locks column from the output above, we'll see that the session running the select from Users is holding exclusive locks on Posts:

<Database name="StackOverflow2013">
  <Locks>
    <Lock request_mode="S" request_status="GRANT" request_count="1" />
  </Locks>
  <Objects>
    <Object name="Posts" schema_name="dbo">
      <Locks>
        <Lock resource_type="KEY" index_name="PK_Posts_Id" request_mode="X" request_status="GRANT" request_count="1" />
        <Lock resource_type="OBJECT" request_mode="IX" request_status="GRANT" request_count="1" />
        <Lock resource_type="PAGE" page_type="*" index_name="PK_Posts_Id" request_mode="IX" request_status="GRANT" request_count="1" />
      </Locks>
    </Object>
  </Objects>
</Database>

We may also note something else quite curious about the output. The select from Users is sleeping, with an open transaction.

sp_WhoIsActive Blocking
melatonin
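
That sleeping session with an open transaction is the tell, and you can hunt for offenders directly:

--Sessions holding transactions open while doing nothing
SELECT
    s.session_id,
    s.status,
    s.open_transaction_count,
    s.last_request_end_time,
    s.program_name
FROM sys.dm_exec_sessions AS s
WHERE s.status = N'sleeping'
AND   s.open_transaction_count > 0;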

How It Happens


The easiest way to show you is with plain SQL commands, but often this is a side effect of your application connection string.

In one window, step through this:

--Run this and stop
SET IMPLICIT_TRANSACTIONS ON;

--Run this and stop
UPDATE p
    SET p.ClosedDate = SYSDATETIME()
FROM dbo.Posts AS p
WHERE p.Id = 11227809;

--Run this and stop
SELECT TOP (10000)
    u.*
FROM dbo.Users AS u
WHERE u.Reputation = 2
ORDER BY u.Reputation DESC;

--Don't run these last two until you look at sp_WhoIsActive
IF @@TRANCOUNT > 0 ROLLBACK;

SET IMPLICIT_TRANSACTIONS OFF;

In another window, run this:

--Run this and stop
SET IMPLICIT_TRANSACTIONS ON;

--Run this and stop
SELECT TOP (100)
    p.*
FROM dbo.Posts AS p
WHERE p.ParentId = 0
ORDER BY p.Score DESC;

--Don't run these last two until you look at sp_WhoIsActive
IF @@TRANCOUNT > 0 ROLLBACK;

SET IMPLICIT_TRANSACTIONS OFF;

How To Fix It


Optimistically:

If you’re using implicit transactions, and queries execute together, you won’t always see the full batch text. At best, the application will be written so that queries using implicit transactions will close out immediately. At worst, there will be some bug, or some weird connection pooling going on so that sessions never actually commit and release their locks.

Fortunately, using an optimistic isolation level alleviates the issue, since readers and writers don’t block each other. RCSI is the easiest for this situation usually, because Snapshot Isolation (SI) requires queries to request it specifically.
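
For reference, turning RCSI on is a one-liner, though it needs a moment of exclusive access to the database, hence the ROLLBACK IMMEDIATE here, which boots any open transactions:

ALTER DATABASE StackOverflow2013
    SET READ_COMMITTED_SNAPSHOT ON
    WITH ROLLBACK IMMEDIATE;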

Of course, if you're already issuing other locking hints at the query level that enforce stricter isolation levels, like READCOMMITTEDLOCK, HOLDLOCK/SERIALIZABLE, or REPEATABLEREAD, RCSI won't help. It will be overruled, unfortunately.

Programmatically:

You could very well be using this in your connection string by accident. If you have control over this sort of thing, change the gosh darn code to stop using it. You probably don't need to be doing this, anyway. If for some reason you do require it, you'll need to dig a bit deeper.

There could also be some issues with indexes, or with the queries that modify data, contributing to excess locking. Again, RCSI is a quick fix, and changing the connection string is a good idea if you can do it, but don't ignore these long-term.

Thanks for reading!


Tweaking Change Tracking To Get Auditing Information In SQL Server

I Apologize


I’ve been working with CDC and CT way too much, and even I’m annoyed with how much it’s coming out in blog posts.

I’m going to cover a lot of ground quickly here. If you get lost, or there’s something you don’t understand, your best bet is to reference the documentation to get caught up.

The idea of this post is to show a few different concepts at once, with one final goal:

  • How to see if a specific column was updated
  • How to get just the current change information about a table
  • How to add information about user and time modified
  • How to add information about which procedure did the modification

The first thing we’re gonna do is get set up.

ALTER DATABASE Crap
    SET CHANGE_TRACKING = ON;
GO 

USE Crap;
GO

--Bye Bye
DROP TABLE IF EXISTS dbo.user_perms;

--Oh, hello
CREATE TABLE dbo.user_perms
(
    permid int,
    userid int,
    permission varchar(20),
    CONSTRAINT pu PRIMARY KEY CLUSTERED (permid)
);

--Turn on CT first this time
ALTER TABLE dbo.user_perms
    ENABLE CHANGE_TRACKING  
    WITH (TRACK_COLUMNS_UPDATED = ON);
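
One thing the setup above leaves alone is retention. If you care how long change data sticks around before cleanup eats it, the database-level command takes options. The values here are, if memory serves, the defaults:

ALTER DATABASE Crap
    SET CHANGE_TRACKING
    (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);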

With that done, let’s stick some rows in the table, and see what we have for changes.

--Insert some rows
INSERT
    dbo.user_perms(permid, userid, permission)
SELECT
    x.c,
    x.c,
    CASE WHEN x.c % 2 = 0 
         THEN 'db_datareader'
         ELSE 'sa'
    END
FROM 
(
    VALUES (1),(2),(3),(4),(5),
           (6),(7),(8),(9),(10)
) AS x(c);

--What's in the table?
SELECT 
    cc.*
FROM dbo.user_perms AS cc

--What's in Change Tracking?
SELECT 
    ct.*
FROM CHANGETABLE(CHANGES dbo.user_perms, 0) AS ct;

This is what the output looks like:

SQL Server Change Tracking
would you do it again?

What Changed?


To find this out, we need to look at CHANGE_TRACKING_IS_COLUMN_IN_MASK, but there's a caveat: column change information only shows up if you look at a specific version of the changed data. If you just pass null or zero into CHANGETABLE, you won't see it.

Let’s update and check on some things.

--Update one row, change it to 'sa'
UPDATE up
  SET up.permission = 'sa'
FROM dbo.user_perms AS up
WHERE up.permid = 2;

--No update?
SELECT 
    ct.*
FROM CHANGETABLE(CHANGES dbo.user_perms, NULL) AS ct;

--No update?
SELECT 
    ct.*
FROM CHANGETABLE(CHANGES dbo.user_perms, 0) AS ct;

We updated a row, but if we look at the wrong version, we won’t see the change:

SQL Server Change Tracking
hrm

But we will see that permid = 2 has a different change version than the rest of the rows.

Zooming In


If we want to see the version of things with our change, we need to figure out the current version change tracking has stored. This is database-level, but it’s also… wrong.

But to start us off, we query CHANGE_TRACKING_CURRENT_VERSION.

The max version always seems to be one version too high. That makes a certain kind of sense, since CHANGETABLE returns changes made after the version you pass in. Doing this will return nothing from CHANGETABLE:

--No update?
DECLARE @ctcv bigint = (SELECT CHANGE_TRACKING_CURRENT_VERSION());
SELECT  @ctcv AS CHANGE_TRACKING_CURRENT_VERSION;

SELECT 
    ct.*
FROM CHANGETABLE(CHANGES dbo.user_perms, @ctcv) AS ct;
GO

To get data back that we care about, we need to do advanced maths and subtract 1 from the current version.

--No update?
DECLARE @ctcv bigint = 
    (SELECT CHANGE_TRACKING_CURRENT_VERSION() -1);
SELECT  @ctcv AS CHANGE_TRACKING_CURRENT_VERSION;

SELECT 
    ct.*,
    CHANGE_TRACKING_IS_COLUMN_IN_MASK
    (COLUMNPROPERTY(OBJECT_ID('dbo.user_perms'), 
                    'permission', 
                    'ColumnId'), 
                    ct.SYS_CHANGE_COLUMNS) AS is_permission_column_in_mask
FROM CHANGETABLE(CHANGES dbo.user_perms, @ctcv) AS ct;

SQL Server Change Tracking
family ties

This is also when we need to bring in CHANGE_TRACKING_IS_COLUMN_IN_MASK to figure out if the permission column was updated.

Audacious


The next thing we’ll wanna do is tie an update to a particular user. We can do that with CHANGE_TRACKING_CONTEXT.

As a first example, let’s just get someone’s user name when they run an update.

--Update one row, change it to 'sa'
DECLARE @ctc varbinary(128) = 
    (SELECT CONVERT(varbinary(128), SUSER_NAME()));

WITH CHANGE_TRACKING_CONTEXT (@ctc)
UPDATE up
  SET up.permission = 'sa'
FROM dbo.user_perms AS up
WHERE up.permid = 4;

To validate things a little, we’re gonna join the results of the CHANGETABLE function to the base table.

SELECT 
    up.permid, 
    up.userid, 
    up.permission,
    ct.SYS_CHANGE_VERSION, 
    ct.SYS_CHANGE_OPERATION,
    CONVERT(sysname, ct.SYS_CHANGE_CONTEXT) as 'SYS_CHANGE_CONTEXT'
FROM CHANGETABLE (CHANGES dbo.user_perms, 0) AS ct
LEFT OUTER JOIN dbo.user_perms AS up
    ON ct.permid = up.permid;

SQL Server Change Tracking
what is this.

We get sa back from SYS_CHANGE_CONTEXT, so that’s nice. But we’re doing this a bit naively, because the SYS_CHANGE_OPERATION column is telling us that sa inserted that row.

That’s… technically wrong. But it’s wrong because we’re looking at the wrong version. This solution definitely isn’t perfect unless you really only dig into the most recent rows. Otherwise, you might blame the wrong person for the wrong thing.

Let’s add a little more information in, and get the right row out. We’re gonna go a step further and add a date to the context information.

--Update one row, change it to 'sa'
DECLARE @ctc varbinary(128) = 
    (SELECT CONVERT(varbinary(128), 
                        'User: ' +
                        SUSER_NAME() +
                        ' @ ' + 
                        CONVERT(varchar(30), SYSDATETIME())) );

WITH CHANGE_TRACKING_CONTEXT (@ctc)
UPDATE up
  SET up.permission = 'sa'
FROM dbo.user_perms AS up
WHERE up.permid = 6;


DECLARE @ctcv bigint = 
    (SELECT CHANGE_TRACKING_CURRENT_VERSION() -1);
SELECT  @ctcv AS CHANGE_TRACKING_CURRENT_VERSION;

SELECT 
    up.permid, 
    up.userid, 
    up.permission,
    ct.SYS_CHANGE_VERSION,
    ct.SYS_CHANGE_CREATION_VERSION,
    ct.SYS_CHANGE_OPERATION,
    ct.SYS_CHANGE_COLUMNS,
    CONVERT(sysname, ct.SYS_CHANGE_CONTEXT) as 'SYS_CHANGE_CONTEXT'
FROM CHANGETABLE (CHANGES dbo.user_perms, @ctcv) AS ct
LEFT OUTER JOIN dbo.user_perms AS up
    ON ct.permid = up.permid;
GO

SQL Server Change Tracking
talk is cheap

Cool! Now we have the right operation, and some auditing information correctly associated with it.

Proc-nosis Negative


To add in one more piece of information, and tie things all together, we're going to add in a stored procedure name. You could do this with a string if you wanted to identify app code, too. Just add something in like Query: dimPerson, or whatever you want to call the section of code that generated the update.

CREATE OR ALTER PROCEDURE dbo.ErikIsSickOfChangeTracking
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;

DECLARE @ctc varbinary(128) = 
    (SELECT CONVERT(varbinary(128), 
                        'User: ' +
                        SUSER_NAME() +
                        ' @ ' + 
                        CONVERT(varchar(30), SYSDATETIME()) +
                        ' with ' +
                        OBJECT_NAME(@@PROCID)) );

WITH CHANGE_TRACKING_CONTEXT (@ctc)
UPDATE up
  SET up.permission = 'sa'
FROM dbo.user_perms AS up
WHERE up.permid = 8;

DECLARE @ctcv bigint = 
    (SELECT CHANGE_TRACKING_CURRENT_VERSION() -1);
SELECT  @ctcv AS CHANGE_TRACKING_CURRENT_VERSION;

SELECT 
    up.permid, 
    up.userid, 
    up.permission,
    ct.SYS_CHANGE_VERSION,
    ct.SYS_CHANGE_CREATION_VERSION,
    ct.SYS_CHANGE_OPERATION,
    CHANGE_TRACKING_IS_COLUMN_IN_MASK
        (COLUMNPROPERTY(OBJECT_ID('dbo.user_perms'), 
                        'permission', 
                        'ColumnId'), 
                        ct.SYS_CHANGE_COLUMNS) AS is_column_in_mask,  
    ct.SYS_CHANGE_COLUMNS,
    CONVERT(sysname, ct.SYS_CHANGE_CONTEXT) as 'SYS_CHANGE_CONTEXT'
FROM CHANGETABLE (CHANGES dbo.user_perms, @ctcv) AS ct
LEFT OUTER JOIN dbo.user_perms AS up
    ON ct.permid = up.permid;

END;

EXEC dbo.ErikIsSickOfChangeTracking;

Which will give us…

SQL Server Change Tracking
soap opera

It’s Over


I don’t think I have anything else to say about Change Tracking right now.

Thanks for reading!


An Important Difference Between Change Tracking And Change Data Capture In SQL Server

Overload


Sometimes documentation is overwhelming, and it’s easy to miss important distinctions between features.

One thing I’ve seen people run into is that these two technologies have very different relationships with Partitioning.

Now, I know this isn’t going to be the most common scenario, but often when you find people doing rocket surgeon stuff like tracking data changes, there are lots of other features creeping around.

Change Data Capture Lets You Choose


When you run sys.sp_cdc_enable_table to enable Change Data Capture for a table, there’s a parameter to allow for Partition Switching.

It’s fittingly named @allow_partition_switch — and it’s true by default.

That means if you’re using CDC, and you get an error while trying to switch partitions, it’s your fault. You can fix it by disabling and re-enabling CDC with the correct parameter here.

EXEC sys.sp_cdc_disable_table
    @source_schema = 'dbo',
    @source_name = 'ct_part',
    @capture_instance = 'dbo_ct_part';

EXEC sys.sp_cdc_enable_table
    @source_schema = 'dbo',
    @source_name = 'ct_part',
    @role_name = NULL, --or whatever you used before
    @allow_partition_switch = 1;

Change Tracking Just Says No


To demo things a little:

CREATE PARTITION FUNCTION ct_func (int)
AS RANGE LEFT FOR VALUES
    (1, 2, 3, 4, 5);
GO

CREATE PARTITION SCHEME ct_scheme
AS PARTITION ct_func
    ALL TO([PRIMARY]);
GO

We don’t need anything fancy for the partitioning function/scheme, or the tables:

CREATE TABLE dbo.ct_part
(
    id int PRIMARY KEY CLUSTERED ON ct_scheme(id),
    dt datetime
);

CREATE TABLE dbo.ct_stage
(
    id int PRIMARY KEY,
    dt datetime
);

Now we flip on Change Tracking:

ALTER TABLE dbo.ct_part
    ENABLE CHANGE_TRACKING  
    WITH (TRACK_COLUMNS_UPDATED = ON);

We can get a little bit fancy with how we stick data into the table (ha ha ha), and how we figure out what the “highest” partition number is so we can try to switch it out.

INSERT
    dbo.ct_part(id, dt)
SELECT
    x.c,
    x.c
FROM 
(
    VALUES (1),(2),(3),(4),(5)
) AS x(c);

SELECT 
    c.*
FROM dbo.ct_part AS c
CROSS APPLY
(
    VALUES($PARTITION.ct_func(c.id))
) AS cs (p_id);

Since we (probably against most best practices) have five populated partitions, each with one row in them, the highest partition number we care about will be five.

If we try to switch data out into our staging table, we’ll get an error message:

ALTER TABLE dbo.ct_part 
    SWITCH PARTITION 5 TO dbo.ct_stage;

Msg 4900, Level 16, State 2, Line 62
The ALTER TABLE SWITCH statement failed for table 'Crap.dbo.ct_part'. 
It is not possible to switch the partition of a table that has change tracking enabled. 
Disable change tracking before using ALTER TABLE SWITCH.

We could disable Change Tracking here, do the switch, and flip it back on, but then we’d lose all the tracking data.

SELECT 
    ct.*
FROM CHANGETABLE(CHANGES dbo.ct_part, 0) AS ct;

ALTER TABLE dbo.ct_part
    DISABLE CHANGE_TRACKING;

ALTER TABLE dbo.ct_part 
    SWITCH PARTITION 5 TO dbo.ct_stage;

ALTER TABLE dbo.ct_part
    ENABLE CHANGE_TRACKING  
    WITH (TRACK_COLUMNS_UPDATED = ON);

SELECT 
    ct.*
FROM CHANGETABLE(CHANGES dbo.ct_part, 0) AS ct;

SQL Server Change Tracking
hello goodbye

Is This A Dealbreaker?


This depends a lot on how you’re using the data in Change Tracking tables, and whether or not you can lose all the data in there every time you switch data out.

If processes that rely on the data in there have all completed, and you’re allowed to start from scratch, you’re fine. Otherwise, Change Tracking might not be the solution for you. Similarly, Temporal Tables don’t do so well with Partition Switching either.
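
If you need to answer the "can we pick up where we left off?" question programmatically, compare the version your process last synced from against what the table still considers valid:

--If your stored sync version is older than this,
--it's full re-sync time
SELECT 
    CHANGE_TRACKING_MIN_VALID_VERSION
        (OBJECT_ID('dbo.ct_part')) AS min_valid_version;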

Change Data Capture handles partition switching just fine, though. It’s a nice perk, and for some people it narrows the choices down to one immediately.

Thanks for reading!


SQL Server Foreign Keys Don’t Always Improve Query Plans

Short and Lousy


This is one of the most frustrating things I’ve seen from the optimizer in quite a while.

Here are a couple tables, with a foreign key between them:

CREATE TABLE dbo.ct(id int PRIMARY KEY, dt datetime);

CREATE TABLE dbo.ct_fk(id int PRIMARY KEY, dt datetime);

ALTER TABLE dbo.ct ADD CONSTRAINT
    ct_c_fk FOREIGN KEY (id) REFERENCES dbo.ct_fk(id);

When we use the EXISTS clause, join elimination occurs normally:

SELECT COUNT_BIG(*) AS [?]
FROM dbo.ct AS c
WHERE EXISTS
(
    SELECT 1/0
    FROM dbo.ct_fk AS cf
    WHERE cf.id = c.id
);

SQL Server Query Plan
all the chickens

But when we use NOT EXISTS, it… doesn’t.

SELECT COUNT_BIG(*) AS [?]
FROM dbo.ct AS c
WHERE NOT EXISTS
(
    SELECT 1/0
    FROM dbo.ct_fk AS cf
    WHERE cf.id = c.id
);

SQL Server Query Plan
?
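
One related wrinkle worth knowing about: join elimination only happens at all when the constraint is trusted. If the foreign key was created or re-enabled WITH NOCHECK, even the EXISTS version reads both tables. It's easy enough to check:

SELECT
    fk.name,
    fk.is_not_trusted,
    fk.is_disabled
FROM sys.foreign_keys AS fk
WHERE fk.parent_object_id = OBJECT_ID('dbo.ct');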

Thanks for reading!


But When Did We Capture That Changed Data?

Where CDC Can Fall Short


There are two common questions people ask about changed data that CDC doesn’t support very naturally:

  • When did it change?
  • Who changed it?

While it is possible to join to cdc.lsn_time_mapping to get that information, you may be dealing with a box product that doesn’t support that functionality.

Or something.

Hypothetically.

They’re Just Tables


The kinda funny thing about the tables that all of your changed data ends up in is that… they’re regular tables.

They’re not in the sys schema, they’re in the cdc schema. You can alter them in all sorts of ways. You can drop and truncate them if you want. I mean, not in that order, but you get my point. There’s no special protection for them.

That means you can add a column like this to them to track when the rows ended up in there. This also saves you from altering production tables to account for when things change.

ALTER TABLE cdc.dbo_Posts_CT 
    ADD ChangeTime datetime DEFAULT SYSDATETIME();
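
With that in place, new rows in the change table pick up the default as they arrive, and you can eyeball recent activity alongside the standard CDC metadata columns. One caveat: rows captured before you added the column will have a NULL ChangeTime, since the default isn't backfilled:

SELECT TOP (100)
    ct.__$operation,
    ct.__$update_mask,
    ct.ChangeTime
FROM cdc.dbo_Posts_CT AS ct
ORDER BY ct.ChangeTime DESC;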

Oversight


I do think it’s a pretty big oversight to not have a column like this in the table already, but maybe people use CDC in far different ways than I normally see it used.

Or maybe not, and everyone has to come up with some workaround like this to deal with it. It could be that CDC data ends up like a lot of other data, and people get really excited about having it, but never actually look at it.

Thanks for reading!


Indexed Views As Filtered Indexes In SQL Server


A Persistent Frustration


SQL Server comes with some great features for tuning queries:

  • Computed Columns
  • Filtered Indexes
  • Indexed Views

But there’s an interoperability issue when you try to use things together. You can’t create a filtered index with the filter definition on a computed column, nor can you create a filtered index on an indexed view.

If you find yourself backed into a corner, you may need to consider using an indexed view without any aggregation (aggregation being the more typical use for indexed views).

Empty Tables


If we try to do something like this, we’ll get an error.

DROP TABLE IF EXISTS dbo.indexed_view;
GO

CREATE TABLE dbo.indexed_view
(
    id int PRIMARY KEY,
    notfizzbuzz AS (id * 2)
);
GO

CREATE INDEX nfb 
    ON dbo.indexed_view (notfizzbuzz) 
WHERE notfizzbuzz = 0;
GO

Yes, I’m putting the error message here for SEO bucks.

Msg 10609, Level 16, State 1, Line 19
Filtered index 'nfb' cannot be created on table 'dbo.indexed_view' because the column 'notfizzbuzz' in the filter expression is a computed column. 
Rewrite the filter expression so that it does not include this column.

An Indexed View Doesn’t Help


If we run this to create an indexed view on top of our base table, we still can’t create a filtered index, but there’s a different error message.

CREATE OR ALTER VIEW dbo.computed_column
WITH SCHEMABINDING
AS
SELECT
    iv.id, 
    iv.notfizzbuzz
FROM dbo.indexed_view AS iv;
GO 

CREATE UNIQUE CLUSTERED INDEX c 
    ON dbo.computed_column(id);

CREATE INDEX nfb 
    ON dbo.computed_column(notfizzbuzz) 
WHERE notfizzbuzz = 0;

Msg 10610, Level 16, State 1, Line 37
Filtered index 'nfb' cannot be created on object 'dbo.computed_column' because it is not a user table. 
Filtered indexes are only supported on tables. 
If you are trying to create a filtered index on a view, consider creating an indexed view with the filter expression incorporated in the view definition.

But what a thoughtful error message it is! Thanks, whoever wrote that.

Still Needs Help


We can create this indexed view just fine.

CREATE OR ALTER VIEW dbo.computed_column
WITH SCHEMABINDING
AS
SELECT
    iv.id, 
    iv.notfizzbuzz
FROM dbo.indexed_view AS iv
WHERE iv.notfizzbuzz = 0;
GO 

CREATE UNIQUE CLUSTERED INDEX c 
    ON dbo.computed_column(id);

But if we try to select from it, the view is expanded.

SELECT
    cc.id, 
    cc.notfizzbuzz
FROM dbo.computed_column AS cc
WHERE cc.notfizzbuzz = 0;

SQL Server Query Plan
upstate

The issue here is the simple parameterization that is attempted with the trivial plan.

If we run the query like this, and look at the end of the output, we’ll see a message at the bottom that our query is safe for auto (simple) parameterization. This may still happen even if the plan doesn’t remain trivial (more detail at the link above!)

DBCC FREEPROCCACHE;
GO 
DBCC TRACEON(8607, 3604);
GO 
SELECT
    cc.id, 
    cc.notfizzbuzz
FROM dbo.computed_column AS cc
WHERE cc.notfizzbuzz = 0;
DBCC TRACEOFF(8607, 3604);
GO 

********************
** Query marked as Cachable
** Query marked as Safe for Auto-Param
********************

Making It Work


The two ways we can run this query to get the indexed view to be used are like so:

SELECT
    cc.id, 
    cc.notfizzbuzz
FROM dbo.computed_column AS cc WITH(NOEXPAND)
WHERE cc.notfizzbuzz = 0;


SELECT
    cc.id, 
    cc.notfizzbuzz
FROM dbo.computed_column AS cc
WHERE cc.notfizzbuzz = 0
AND   1 = (SELECT 1);

SQL Server Query Plan
thanks i guess

A Closer Look


If we put those two queries through the wringer, we'll still see auto (simple) parameterization from the first query:

DBCC FREEPROCCACHE;
GO 
DBCC TRACEON(8607, 3604);
GO 
SELECT
    cc.id, 
    cc.notfizzbuzz
FROM dbo.computed_column AS cc WITH(NOEXPAND)
WHERE cc.notfizzbuzz = 0;
GO 
DBCC TRACEOFF(8607, 3604);
GO 

********************
** Query marked as Cachable
** Query marked as Safe for Auto-Param
********************

DBCC FREEPROCCACHE;
GO 
DBCC TRACEON(8607, 3604);
GO 
SELECT
    cc.id, 
    cc.notfizzbuzz
FROM dbo.computed_column AS cc
WHERE cc.notfizzbuzz = 0
AND   1 = (SELECT 1);
GO 
DBCC TRACEOFF(8607, 3604);
GO 

********************
** Query marked as Cachable
********************

It’s goofy, but it’s worth noting. Anyway, if I had to pick one of these methods to get the plan I want, it would be the NOEXPAND version.

Using that hint is the only thing that will allow for statistics to get generated on indexed views.
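
You can verify that after running the NOEXPAND version; any statistics that get created show up against the view's object:

SELECT
    s.name,
    s.auto_created,
    s.user_created
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('dbo.computed_column');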

In case you’re wondering, marking the computed column as PERSISTED doesn’t change the outcome for any of these issues.

Thanks for reading!


Normalizing SQL Server Tables To Reduce Query Blocking

Chalky


I see a lot of tables that look something like this:

CREATE TABLE dbo.orders
(
    order_id int NOT NULL PRIMARY KEY
         DEFAULT (NEXT VALUE FOR dbo.order_id),
    order_date datetime NOT NULL,
    order_ship_date datetime NOT NULL,
    order_total money NOT NULL,
    order_tax money NOT NULL,
    customer_id int NOT NULL
        DEFAULT (NEXT VALUE FOR dbo.customer_id),
    customer_fullname nvarchar(250),
    customer_street nvarchar(250),
    customer_street_2 nvarchar(250),
    customer_city nvarchar(250),
    customer_state nvarchar(250),
    customer_zip nvarchar(250),
    customer_country nvarchar(250)
);
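
(One assumption to call out: those NEXT VALUE FOR defaults lean on sequence objects that aren't shown here. If you want to run along at home, something like this has to exist first; the third one gets used a little later on:)

CREATE SEQUENCE dbo.order_id AS int START WITH 1 INCREMENT BY 1;
CREATE SEQUENCE dbo.customer_id AS int START WITH 1 INCREMENT BY 1;
CREATE SEQUENCE dbo.address_id AS int START WITH 1 INCREMENT BY 1;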

Looking at the design, there are two big problems:

  1. There are “order” columns that are going to get a lot of inserts and updates
  2. You’re going to be storing the same customer information over and over again

The more related, but not independent, data you store in the same table, the harder it becomes to effectively index that table.

A while back, I blogged about Tables Within Tables, but uh… surprisingly, the problem still exists! Usually when I blog about something, the problem disappears. Hm.

Better But Not Best


If we follow some practical guidance and put the customer columns into their own table, we end up with this:

CREATE TABLE dbo.orders
(
    order_id int NOT NULL PRIMARY KEY
         DEFAULT (NEXT VALUE FOR dbo.order_id),
    order_date datetime NOT NULL,
    order_ship_date datetime NOT NULL,
    order_total money NOT NULL,
    order_tax money NOT NULL,
    customer_id int NOT NULL
);

CREATE TABLE dbo.customers
(
    customer_id int NOT NULL PRIMARY KEY
        DEFAULT (NEXT VALUE FOR dbo.customer_id),
    customer_fullname nvarchar(250),
    customer_street nvarchar(250),
    customer_street_2 nvarchar(250),
    customer_city nvarchar(250),
    customer_state nvarchar(250),
    customer_zip nvarchar(250),
    customer_country nvarchar(250)
);

This is a better scenario, because we only store customer information once, and inserts/updates to order information don’t impact people working with customer data.

But this still isn’t great — what if a customer wants to send an order to a different address?

If we wanted to store everything in this table, we'd be breaking other practical rules: we'd have to have multiple rows per customer, or keep adding columns to the table to deal with multiple addresses. That's a mess both for people who don't use all those extra columns, and for people who might have half a dozen addresses they send to.

Getting There


A better way to phrase the customer table might be like this:

CREATE TABLE dbo.customers
(
    customer_id int NOT NULL PRIMARY KEY
        DEFAULT (NEXT VALUE FOR dbo.customer_id),
    default_fullname nvarchar(250),
    default_street nvarchar(250),
    default_street_2 nvarchar(250),
    default_city nvarchar(250),
    default_state nvarchar(250),
    default_zip nvarchar(250),
    default_country nvarchar(250)
);

Most of the time, people are going to send stuff to one address — call it home if you want. It’s probably also their billing address, so it makes sense for it to be the default, and to have it be the first choice.

Then we’ll have a table of EAV data that looks like this:

CREATE TABLE dbo.customers_address_book
(
    address_id int NOT NULL
        DEFAULT (NEXT VALUE FOR dbo.address_id),
    customer_id int NOT NULL,
    address_type tinyint,
    customer_fullname nvarchar(250),
    customer_street nvarchar(250),
    customer_street_2 nvarchar(250),
    customer_city nvarchar(250),
    customer_state nvarchar(250),
    customer_zip nvarchar(250),
    customer_country nvarchar(250),
    CONSTRAINT pk_cab_id PRIMARY KEY (customer_id, address_id)
);

In a table like this, whenever a customer ships to a non-default address it gets stored off here. Now customers can have as many addresses as they want to choose from without us having to have an extra bloated table of default information plus non-default information.

Because of the way this data is modeled, we don't need to keep adding columns to accommodate multiple addresses. We just tack rows on, and since this data isn't likely to get updated, the insert/select pattern should end up with minimal blocking.

Tomato Sauce


I know, horrifying. You might have to write a join. You poor, downtrodden developer.
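
Something like this, say. It's a sketch that assumes a hypothetical ship_address_id column on dbo.orders, where NULL means "ship to the default address":

SELECT
    o.order_id,
    COALESCE(ab.customer_fullname, c.default_fullname) AS ship_fullname,
    COALESCE(ab.customer_street, c.default_street) AS ship_street,
    COALESCE(ab.customer_city, c.default_city) AS ship_city,
    COALESCE(ab.customer_zip, c.default_zip) AS ship_zip
FROM dbo.orders AS o
JOIN dbo.customers AS c
    ON c.customer_id = o.customer_id
LEFT JOIN dbo.customers_address_book AS ab
    ON  ab.customer_id = o.customer_id
    AND ab.address_id = o.ship_address_id --hypothetical column
ORDER BY o.order_id;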

Of course, this makes the most sense when you’re dealing with OLTP workloads. And sure, a lot of these columns probably don’t need to be as long as they are, but that’s a totally different post.

When you’re dealing with reporting data, de-normalizing is generally preferred. Though if you’re doing serious reporting and using column store indexes, I’d probably wanna keep the strings out as much as possible, and just key back to them in other tables. Yuck.

Have I ever mentioned that strings in databases were a mistake?

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.