Tweaking Change Tracking To Get Auditing Information In SQL Server

I Apologize


I’ve been working with CDC and CT way too much, and even I’m annoyed with how much it’s coming out in blog posts.

I’m going to cover a lot of ground quickly here. If you get lost, or there’s something you don’t understand, your best bet is to reference the documentation to get caught up.

The idea of this post is to show a few different concepts at once, with one final goal:

  • How to see if a specific column was updated
  • How to get just the current change information about a table
  • How to add information about user and time modified
  • How to add information about which proc did the modification

The first thing we’re gonna do is get set up.

ALTER DATABASE Crap
    SET CHANGE_TRACKING = ON;
GO 

USE Crap;
GO

--Bye Bye
DROP TABLE IF EXISTS dbo.user_perms;

--Oh, hello
CREATE TABLE dbo.user_perms
(
    permid int,
    userid int,
    permission varchar(20),
    CONSTRAINT pu PRIMARY KEY CLUSTERED (permid)
);

--Turn on CT first this time
ALTER TABLE dbo.user_perms
    ENABLE CHANGE_TRACKING  
    WITH (TRACK_COLUMNS_UPDATED = ON);
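
If you want to double-check that both the database and the table are picked up, the Change Tracking catalog views will tell you. This is just a quick sanity check; nothing later in the post depends on it.

--Is Change Tracking on, and for what?
SELECT
    ctd.database_id,
    ctd.retention_period,
    ctd.retention_period_units_desc,
    ctd.is_auto_cleanup_on
FROM sys.change_tracking_databases AS ctd
WHERE ctd.database_id = DB_ID();

SELECT
    OBJECT_NAME(ctt.object_id) AS table_name,
    ctt.is_track_columns_updated_on
FROM sys.change_tracking_tables AS ctt;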

With that done, let’s stick some rows in the table, and see what we have for changes.

--Insert some rows
INSERT
    dbo.user_perms(permid, userid, permission)
SELECT
    x.c,
    x.c,
    CASE WHEN x.c % 2 = 0 
         THEN 'db_datareader'
         ELSE 'sa'
    END
FROM 
(
    VALUES (1),(2),(3),(4),(5),
           (6),(7),(8),(9),(10)
) AS x(c);

--What's in the table?
SELECT 
    cc.*
FROM dbo.user_perms AS cc

--What's in Change Tracking?
SELECT 
    ct.*
FROM CHANGETABLE(CHANGES dbo.user_perms, 0) AS ct;

This is what the output looks like:

[Screenshot: the table contents and the Change Tracking output]

What Changed?


To find this out, we need to look at CHANGE_TRACKING_IS_COLUMN_IN_MASK, but there’s a caveat: the column mask is only meaningful for changes that show up as updates. If you just pass NULL or zero into CHANGETABLE here, every row comes back as an insert, and there’s nothing in the mask to check.

Let’s update and check on some things.

--Update one row, change it to 'sa'
UPDATE up
  SET up.permission = 'sa'
FROM dbo.user_perms AS up
WHERE up.permid = 2;

--No update?
SELECT 
    ct.*
FROM CHANGETABLE(CHANGES dbo.user_perms, NULL) AS ct;

--No update?
SELECT 
    ct.*
FROM CHANGETABLE(CHANGES dbo.user_perms, 0) AS ct;

We updated a row, but if we look at the wrong version, we won’t see the change:

[Screenshot: CHANGETABLE output]

But we will see that permid = 2 has a different change version than the rest of the rows.

Zooming In


If we want to see the version with our change in it, we need to figure out the current version change tracking has stored. This is a database-level value, and we get it by querying CHANGE_TRACKING_CURRENT_VERSION.

The catch is that CHANGETABLE returns changes that happened after the version you pass in, so feeding it the current version straight from the function returns nothing:

--No update?
DECLARE @ctcv bigint = (SELECT CHANGE_TRACKING_CURRENT_VERSION());
SELECT  @ctcv AS CHANGE_TRACKING_CURRENT_VERSION;

SELECT 
    ct.*
FROM CHANGETABLE(CHANGES dbo.user_perms, @ctcv) AS ct;
GO

To get data back that we care about, we need to do advanced maths and subtract 1 from the current version.

--No update?
DECLARE @ctcv bigint = 
    (SELECT CHANGE_TRACKING_CURRENT_VERSION() -1);
SELECT  @ctcv AS CHANGE_TRACKING_CURRENT_VERSION;

SELECT 
    ct.*,
    CHANGE_TRACKING_IS_COLUMN_IN_MASK
    (COLUMNPROPERTY(OBJECT_ID('dbo.user_perms'), 
                    'permission', 
                    'ColumnId'), 
                    ct.SYS_CHANGE_COLUMNS) AS is_permission_column_in_mask
FROM CHANGETABLE(CHANGES dbo.user_perms, @ctcv) AS ct;

[Screenshot: CHANGETABLE output with the column mask check]

This is also when we need to bring in CHANGE_TRACKING_IS_COLUMN_IN_MASK to figure out if the permission column was updated.

Audacious


The next thing we’ll wanna do is tie an update to a particular user. We can do that with CHANGE_TRACKING_CONTEXT.

As a first example, let’s just get someone’s user name when they run an update.

--Update one row, change it to 'sa'
DECLARE @ctc varbinary(128) = 
    (SELECT CONVERT(varbinary(128), SUSER_NAME()));

WITH CHANGE_TRACKING_CONTEXT (@ctc)
UPDATE up
  SET up.permission = 'sa'
FROM dbo.user_perms AS up
WHERE up.permid = 4;

To validate things a little, we’re gonna join the results of the CHANGETABLE function to the base table.

SELECT 
    up.permid, 
    up.userid, 
    up.permission,
    ct.SYS_CHANGE_VERSION, 
    ct.SYS_CHANGE_OPERATION,
    CONVERT(sysname, ct.SYS_CHANGE_CONTEXT) as 'SYS_CHANGE_CONTEXT'
FROM CHANGETABLE (CHANGES dbo.user_perms, 0) AS ct
LEFT OUTER JOIN dbo.user_perms AS up
    ON ct.permid = up.permid;

[Screenshot: change data joined back to the base table]

We get sa back from SYS_CHANGE_CONTEXT, so that’s nice. But we’re doing this a bit naively, because the SYS_CHANGE_OPERATION column is telling us that sa inserted that row.

That’s… technically wrong. But it’s wrong because we’re looking at the wrong version. This solution definitely isn’t perfect unless you really only dig into the most recent rows. Otherwise, you might blame the wrong person for the wrong thing.

Let’s add a little more information in, and get the right row out. We’re gonna go a step further and add a date to the context information.

--Update one row, change it to 'sa'
DECLARE @ctc varbinary(128) = 
    (SELECT CONVERT(varbinary(128), 
                        'User: ' +
                        SUSER_NAME() +
                        ' @ ' + 
                        CONVERT(varchar(30), SYSDATETIME())) );

WITH CHANGE_TRACKING_CONTEXT (@ctc)
UPDATE up
  SET up.permission = 'sa'
FROM dbo.user_perms AS up
WHERE up.permid = 6;


DECLARE @ctcv bigint = 
    (SELECT CHANGE_TRACKING_CURRENT_VERSION() -1);
SELECT  @ctcv AS CHANGE_TRACKING_CURRENT_VERSION;

SELECT 
    up.permid, 
    up.userid, 
    up.permission,
    ct.SYS_CHANGE_VERSION,
    ct.SYS_CHANGE_CREATION_VERSION,
    ct.SYS_CHANGE_OPERATION,
    ct.SYS_CHANGE_COLUMNS,
    CONVERT(sysname, ct.SYS_CHANGE_CONTEXT) as 'SYS_CHANGE_CONTEXT'
FROM CHANGETABLE (CHANGES dbo.user_perms, @ctcv) AS ct
LEFT OUTER JOIN dbo.user_perms AS up
    ON ct.permid = up.permid;
GO

[Screenshot: results showing user and time in SYS_CHANGE_CONTEXT]

Cool! Now we have the right operation, and some auditing information correctly associated with it.

Proc-nosis Negative


To add in one more piece of information, and tie things all together, we’re going to add in a stored procedure name. You could do this with a string if you wanted to identify app code, too. Just add something in like Query: dimPerson, or whatever you want to call the section of code that generated the update.

CREATE OR ALTER PROCEDURE dbo.ErikIsSickOfChangeTracking
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;

DECLARE @ctc varbinary(128) = 
    (SELECT CONVERT(varbinary(128), 
                        'User: ' +
                        SUSER_NAME() +
                        ' @ ' + 
                        CONVERT(varchar(30), SYSDATETIME()) +
                        ' with ' +
                        OBJECT_NAME(@@PROCID)) );

WITH CHANGE_TRACKING_CONTEXT (@ctc)
UPDATE up
  SET up.permission = 'sa'
FROM dbo.user_perms AS up
WHERE up.permid = 8;

DECLARE @ctcv bigint = 
    (SELECT CHANGE_TRACKING_CURRENT_VERSION() -1);
SELECT  @ctcv AS CHANGE_TRACKING_CURRENT_VERSION;

SELECT 
    up.permid, 
    up.userid, 
    up.permission,
    ct.SYS_CHANGE_VERSION,
    ct.SYS_CHANGE_CREATION_VERSION,
    ct.SYS_CHANGE_OPERATION,
    CHANGE_TRACKING_IS_COLUMN_IN_MASK
        (COLUMNPROPERTY(OBJECT_ID('dbo.user_perms'), 
                        'permission', 
                        'ColumnId'), 
                        ct.SYS_CHANGE_COLUMNS) AS is_column_in_mask,  
    ct.SYS_CHANGE_COLUMNS,
    CONVERT(sysname, ct.SYS_CHANGE_CONTEXT) as 'SYS_CHANGE_CONTEXT'
FROM CHANGETABLE (CHANGES dbo.user_perms, @ctcv) AS ct
LEFT OUTER JOIN dbo.user_perms AS up
    ON ct.permid = up.permid;

END;

EXEC dbo.ErikIsSickOfChangeTracking;

Which will give us…

[Screenshot: results showing user, time, and procedure name in SYS_CHANGE_CONTEXT]

It’s Over


I don’t think I have anything else to say about Change Tracking right now.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

An Important Difference Between Change Tracking And Change Data Capture In SQL Server

Overload


Sometimes documentation is overwhelming, and it’s easy to miss important distinctions between features.

One thing I’ve seen people run into is that these two technologies have very different relationships with Partitioning.

Now, I know this isn’t going to be the most common scenario, but often when you find people doing rocket surgeon stuff like tracking data changes, there are lots of other features creeping around.

Change Data Capture Lets You Choose


When you run sys.sp_cdc_enable_table to enable Change Data Capture for a table, there’s a parameter to allow for Partition Switching.

It’s fittingly named @allow_partition_switch — and it’s true by default.

That means if you’re using CDC, and you get an error while trying to switch partitions, it’s your fault. You can fix it by disabling and re-enabling CDC with the correct parameter here.

EXEC sys.sp_cdc_disable_table
    @source_schema = 'dbo',
    @source_name = 'ct_part',
    @capture_instance = 'dbo_ct_part';

EXEC sys.sp_cdc_enable_table
    @source_schema = 'dbo',
    @source_name = 'ct_part',
    @role_name = NULL, --or whatever you used before
    @allow_partition_switch = 1;

Change Tracking Just Says No


To demo things a little:

CREATE PARTITION FUNCTION ct_func (int)
AS RANGE LEFT FOR VALUES
    (1, 2, 3, 4, 5);
GO

CREATE PARTITION SCHEME ct_scheme
AS PARTITION ct_func
    ALL TO([PRIMARY]);
GO

We don’t need anything fancy for the partitioning function/scheme, or the tables:

CREATE TABLE dbo.ct_part
(
    id int PRIMARY KEY CLUSTERED ON ct_scheme(id),
    dt datetime
);

CREATE TABLE dbo.ct_stage
(
    id int PRIMARY KEY,
    dt datetime
);

Now we flip on Change Tracking:

ALTER TABLE dbo.ct_part
    ENABLE CHANGE_TRACKING  
    WITH (TRACK_COLUMNS_UPDATED = ON);

We can get a little bit fancy with how we stick data into the table (ha ha ha), and how we figure out what the “highest” partition number is so we can try to switch it out.

INSERT
    dbo.ct_part(id, dt)
SELECT
    x.c,
    x.c
FROM 
(
    VALUES (1),(2),(3),(4),(5)
) AS x(c);

SELECT 
    c.*
FROM dbo.ct_part AS c
CROSS APPLY
(
    VALUES($PARTITION.ct_func(c.id))
) AS cs (p_id);

Since we (probably against most best practices) have five populated partitions, each with one row in them, our highest populated partition number will be five.

If we try to switch data out into our staging table, we’ll get an error message:

ALTER TABLE dbo.ct_part 
    SWITCH PARTITION 5 TO dbo.ct_stage;

Msg 4900, Level 16, State 2, Line 62
The ALTER TABLE SWITCH statement failed for table 'Crap.dbo.ct_part'. 
It is not possible to switch the partition of a table that has change tracking enabled. 
Disable change tracking before using ALTER TABLE SWITCH.

We could disable Change Tracking here, do the switch, and flip it back on, but then we’d lose all the tracking data.

SELECT 
    ct.*
FROM CHANGETABLE(CHANGES dbo.ct_part, 0) AS ct;

ALTER TABLE dbo.ct_part
    DISABLE CHANGE_TRACKING;

ALTER TABLE dbo.ct_part 
    SWITCH PARTITION 5 TO dbo.ct_stage;

ALTER TABLE dbo.ct_part
    ENABLE CHANGE_TRACKING  
    WITH (TRACK_COLUMNS_UPDATED = ON);

SELECT 
    ct.*
FROM CHANGETABLE(CHANGES dbo.ct_part, 0) AS ct;

[Screenshot: CHANGETABLE output before and after the switch]

Is This A Dealbreaker?


This depends a lot on how you’re using the data in Change Tracking tables, and whether you can afford to lose all the data in there every time you switch data out.

If processes that rely on the data in there have all completed, and you’re allowed to start from scratch, you’re fine. Otherwise, Change Tracking might not be the solution for you. Similarly, Temporal Tables don’t do so well with Partition Switching either.

Change Data Capture handles partition switching just fine, though. It’s a nice perk, and for some people it narrows the choices down to one immediately.

Thanks for reading!


SQL Server Foreign Keys Don’t Always Improve Query Plans

Short and Lousy


This is one of the most frustrating things I’ve seen from the optimizer in quite a while.

Here are a couple tables, with a foreign key between them:

CREATE TABLE dbo.ct(id int PRIMARY KEY, dt datetime);

CREATE TABLE dbo.ct_fk(id int PRIMARY KEY, dt datetime);

ALTER TABLE dbo.ct ADD CONSTRAINT
    ct_c_fk FOREIGN KEY (id) REFERENCES dbo.ct_fk(id);
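
Join elimination relies on the constraint being enabled and trusted, which it is here since we just created it against empty tables. If you’re trying this against existing tables, it’s worth a quick peek at sys.foreign_keys first:

--Join elimination needs an enabled, trusted foreign key
SELECT
    fk.name,
    fk.is_disabled,
    fk.is_not_trusted
FROM sys.foreign_keys AS fk
WHERE fk.parent_object_id = OBJECT_ID('dbo.ct');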

When we use the EXISTS clause, join elimination occurs normally:

SELECT COUNT_BIG(*) AS [?]
FROM dbo.ct AS c
WHERE EXISTS
(
    SELECT 1/0
    FROM dbo.ct_fk AS cf
    WHERE cf.id = c.id
);

[Screenshot: query plan with join elimination]

But when we use NOT EXISTS, it… doesn’t.

SELECT COUNT_BIG(*) AS [?]
FROM dbo.ct AS c
WHERE NOT EXISTS
(
    SELECT 1/0
    FROM dbo.ct_fk AS cf
    WHERE cf.id = c.id
);

[Screenshot: query plan without join elimination]

Thanks for reading!


But When Did We Capture That Changed Data?

Where CDC Can Fall Short


There are two common questions people ask about changed data that CDC doesn’t support very naturally:

  • When did it change?
  • Who changed it?

While it is possible to join to cdc.lsn_time_mapping to get that information, you may be dealing with a box product that doesn’t support that functionality.

Or something.

Hypothetically.
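
For reference, that join looks roughly like this. I’m using a dbo_Posts capture instance as the example change table, the same one that shows up below; swap in whatever yours is called.

--Roughly what mapping LSNs to times looks like
SELECT
    ct.__$start_lsn,
    ct.__$operation,
    ltm.tran_begin_time,
    ltm.tran_end_time
FROM cdc.dbo_Posts_CT AS ct
JOIN cdc.lsn_time_mapping AS ltm
    ON ltm.start_lsn = ct.__$start_lsn;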

They’re Just Tables


The kinda funny thing about the tables that all of your changed data ends up in is that… they’re regular tables.

They’re not in the sys schema, they’re in the cdc schema. You can alter them in all sorts of ways. You can drop and truncate them if you want. I mean, not in that order, but you get my point. There’s no special protection for them.

That means you can add a column like this to them to track when the rows ended up in there. This also saves you from altering production tables to account for when things change.

ALTER TABLE cdc.dbo_Posts_CT 
    ADD ChangeTime datetime DEFAULT SYSDATETIME();

Oversight


I do think it’s a pretty big oversight to not have a column like this in the table already, but maybe people use CDC in far different ways than I normally see it used.

Or maybe not, and everyone has to come up with some workaround like this to deal with it. It could be that CDC data ends up like a lot of other data, and people get really excited about having it, but never actually look at it.

Thanks for reading!


Indexed Views As Filtered Indexes In SQL Server

Pssst!


If you landed here from Brent’s weekly links, use this link to get my training for 90% off.

The access is for life, but this coupon code isn’t! Get it while it lasts.

Discount applies at checkout, and you have to purchase everything for it to apply.

A Persistent Frustration


SQL Server comes with some great features for tuning queries:

  • Computed Columns
  • Filtered Indexes
  • Indexed Views

But there’s an interoperability issue when you try to use things together. You can’t create a filtered index with the filter definition on a computed column, nor can you create a filtered index on an indexed view.

If you find yourself backed into a corner, you may need to consider using an indexed view without any aggregation (which is the normal use-case).

Empty Tables


If we try to do something like this, we’ll get an error.

DROP TABLE IF EXISTS dbo.indexed_view;
GO

CREATE TABLE dbo.indexed_view
(
    id int PRIMARY KEY,
    notfizzbuzz AS (id * 2)
);
GO

CREATE INDEX nfb
    ON dbo.indexed_view (notfizzbuzz) 
WHERE notfizzbuzz = 0;
GO

Yes, I’m putting the error message here for SEO bucks.

Msg 10609, Level 16, State 1, Line 19
Filtered index 'nfb' cannot be created on table 'dbo.indexed_view' because the column 'notfizzbuzz' in the filter expression is a computed column. 
Rewrite the filter expression so that it does not include this column.

An Indexed View Doesn’t Help


If we run this to create an indexed view on top of our base table, we still can’t create a filtered index, but there’s a different error message.

CREATE OR ALTER VIEW dbo.computed_column
WITH SCHEMABINDING
AS
SELECT
    iv.id, 
    iv.notfizzbuzz
FROM dbo.indexed_view AS iv;
GO 

CREATE UNIQUE CLUSTERED INDEX c 
    ON dbo.computed_column(id);

CREATE INDEX nfb 
    ON dbo.computed_column(notfizzbuzz) 
WHERE notfizzbuzz = 0;

Msg 10610, Level 16, State 1, Line 37
Filtered index 'nfb' cannot be created on object 'dbo.computed_column' because it is not a user table. 
Filtered indexes are only supported on tables. 
If you are trying to create a filtered index on a view, consider creating an indexed view with the filter expression incorporated in the view definition.

But what a thoughtful error message it is! Thanks, whoever wrote that.

Still Needs Help


We can create this indexed view just fine.

CREATE OR ALTER VIEW dbo.computed_column
WITH SCHEMABINDING
AS
SELECT
    iv.id, 
    iv.notfizzbuzz
FROM dbo.indexed_view AS iv
WHERE iv.notfizzbuzz = 0;
GO 

CREATE UNIQUE CLUSTERED INDEX c 
    ON dbo.computed_column(id);

But if we try to select from it, the view is expanded.

SELECT
    cc.id, 
    cc.notfizzbuzz
FROM dbo.computed_column AS cc
WHERE cc.notfizzbuzz = 0;

[Screenshot: query plan showing the view being expanded]

The issue here is the simple parameterization that is attempted with the trivial plan.

If we run the query like this, and look at the end of the output, we’ll see a message at the bottom that our query is safe for auto (simple) parameterization. This may still happen even if the plan doesn’t remain trivial (more detail at the link above!)

DBCC FREEPROCCACHE;
GO 
DBCC TRACEON(8607, 3604);
GO 
SELECT
    cc.id, 
    cc.notfizzbuzz
FROM dbo.computed_column AS cc
WHERE cc.notfizzbuzz = 0;
DBCC TRACEOFF(8607, 3604);
GO 

********************
** Query marked as Cachable
** Query marked as Safe for Auto-Param
********************

Making It Work


The two ways we can run this query to get the indexed view to be used are like so:

SELECT
    cc.id, 
    cc.notfizzbuzz
FROM dbo.computed_column AS cc WITH(NOEXPAND)
WHERE cc.notfizzbuzz = 0;


SELECT
    cc.id, 
    cc.notfizzbuzz
FROM dbo.computed_column AS cc
WHERE cc.notfizzbuzz = 0
AND   1 = (SELECT 1);

[Screenshot: query plans using the indexed view]

A Closer Look


If we put those two queries through the wringer, we’ll still see auto (simple) parameterization from the first query:

DBCC FREEPROCCACHE;
GO 
DBCC TRACEON(8607, 3604);
GO 
SELECT
    cc.id, 
    cc.notfizzbuzz
FROM dbo.computed_column AS cc WITH(NOEXPAND)
WHERE cc.notfizzbuzz = 0;
GO 
DBCC TRACEOFF(8607, 3604);
GO 

********************
** Query marked as Cachable
** Query marked as Safe for Auto-Param
********************

DBCC FREEPROCCACHE;
GO 
DBCC TRACEON(8607, 3604);
GO 
SELECT
    cc.id, 
    cc.notfizzbuzz
FROM dbo.computed_column AS cc
WHERE cc.notfizzbuzz = 0
AND   1 = (SELECT 1);
GO 
DBCC TRACEOFF(8607, 3604);
GO 

********************
** Query marked as Cachable
********************

It’s goofy, but it’s worth noting. Anyway, if I had to pick one of these methods to get the plan I want, it would be the NOEXPAND version.

Using that hint is the only thing that will allow for statistics to get generated on indexed views.
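
If you want to see that for yourself, run the NOEXPAND query a few times and then check sys.stats for the view. Nothing fancy, just the catalog view:

--Statistics created on the indexed view (only with NOEXPAND)
SELECT
    s.name,
    s.auto_created,
    s.user_created
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('dbo.computed_column');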

In case you’re wondering, marking the computed column as PERSISTED doesn’t change the outcome for any of these issues.

Thanks for reading!


Normalizing SQL Server Tables To Reduce Query Blocking

Chalky


I see a lot of tables that look something like this:

CREATE TABLE dbo.orders
(
    order_id int NOT NULL PRIMARY KEY
         DEFAULT (NEXT VALUE FOR dbo.order_id),
    order_date datetime NOT NULL,
    order_ship_date datetime NOT NULL,
    order_total money NOT NULL,
    order_tax money NOT NULL,
    customer_id int NOT NULL
        DEFAULT (NEXT VALUE FOR dbo.customer_id),
    customer_fullname nvarchar(250),
    customer_street nvarchar(250),
    customer_street_2 nvarchar(250),
    customer_city nvarchar(250),
    customer_state nvarchar(250),
    customer_zip nvarchar(250),
    customer_country nvarchar(250)
);
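
Quick aside before we pick this apart: the defaults lean on sequences that the post doesn’t show. If you want to build the table yourself, something like this (names matching the defaults above) will do.

--Sequences referenced by the defaults
CREATE SEQUENCE dbo.order_id
    AS int
    START WITH 1
    INCREMENT BY 1;

CREATE SEQUENCE dbo.customer_id
    AS int
    START WITH 1
    INCREMENT BY 1;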

Looking at the design, there are two big problems:

  1. There are “order” columns that are going to get a lot of inserts and updates
  2. You’re going to be storing the same customer information over and over again

The more related, but not independent, data you store in the same table, the harder it becomes to effectively index that table.

A while back, I blogged about Tables Within Tables, but uh… surprisingly, the problem still exists! Usually when I blog about something, the problem disappears. Hm.

Better But Not Best


If we follow some practical guidance and put the customer columns into their own table, we end up with this:

CREATE TABLE dbo.orders
(
    order_id int NOT NULL PRIMARY KEY
         DEFAULT (NEXT VALUE FOR dbo.order_id),
    order_date datetime NOT NULL,
    order_ship_date datetime NOT NULL,
    order_total money NOT NULL,
    order_tax money NOT NULL,
    customer_id int NOT NULL
);

CREATE TABLE dbo.customers
(
    customer_id int NOT NULL PRIMARY KEY
        DEFAULT (NEXT VALUE FOR dbo.customer_id),
    customer_fullname nvarchar(250),
    customer_street nvarchar(250),
    customer_street_2 nvarchar(250),
    customer_city nvarchar(250),
    customer_state nvarchar(250),
    customer_zip nvarchar(250),
    customer_country nvarchar(250)
);

This is a better scenario, because we only store customer information once, and inserts/updates to order information don’t impact people working with customer data.

But this still isn’t great — what if a customer wants to send an order to a different address?

If we wanted to store everything in this table, we’d be breaking other practical rules: we’d have to have multiple rows per user, or we’d have to add more columns to the table to deal with multiple addresses. That’s a mess both for people who don’t use all those extra columns, and for people who might have half a dozen addresses they send to.

Getting There


A better way to phrase the customer table might be like this:

CREATE TABLE dbo.customers
(
    customer_id int NOT NULL PRIMARY KEY
        DEFAULT (NEXT VALUE FOR dbo.customer_id),
    default_fullname nvarchar(250),
    default_street nvarchar(250),
    default_street_2 nvarchar(250),
    default_city nvarchar(250),
    default_state nvarchar(250),
    default_zip nvarchar(250),
    default_country nvarchar(250)
);

Most of the time, people are going to send stuff to one address — call it home if you want. It’s probably also their billing address, so it makes sense for it to be the default, and to have it be the first choice.

Then we’ll have a table of EAV data that looks like this:

CREATE TABLE dbo.customers_address_book
(
    address_id int NOT NULL
        DEFAULT (NEXT VALUE FOR dbo.address_id),
    customer_id int NOT NULL,
    address_type tinyint,
    customer_fullname nvarchar(250),
    customer_street nvarchar(250),
    customer_street_2 nvarchar(250),
    customer_city nvarchar(250),
    customer_state nvarchar(250),
    customer_zip nvarchar(250),
    customer_country nvarchar(250),
    CONSTRAINT pk_cab_id PRIMARY KEY (customer_id, address_id)
);

In a table like this, whenever a customer ships to a non-default address it gets stored off here. Now customers can have as many addresses as they want to choose from without us having to have an extra bloated table of default information plus non-default information.

Because of the way this data is modeled, we don’t need to keep adding columns to accommodate multiple addresses. We just tack rows on, and since this data isn’t likely to get updated the insert/select pattern should end up with minimal blocking.
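
Reading the data back together is just a join now. As a sketch against the tables above, here are orders with each customer’s default shipping details:

--Orders with the customer's default shipping details
SELECT
    o.order_id,
    o.order_date,
    o.order_total,
    c.default_fullname,
    c.default_city,
    c.default_country
FROM dbo.orders AS o
JOIN dbo.customers AS c
    ON c.customer_id = o.customer_id;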

Tomato Sauce


I know, horrifying. You might have to write a join. You poor, downtrodden developer.

Of course, this makes the most sense when you’re dealing with OLTP workloads. And sure, a lot of these columns probably don’t need to be as long as they are, but that’s a totally different post.

When you’re dealing with reporting data, de-normalizing is generally preferred. Though if you’re doing serious reporting and using column store indexes, I’d probably wanna keep the strings out as much as possible, and just key back to them in other tables. Yuck.

Have I ever mentioned that strings in databases were a mistake?

Thanks for reading!


SQL Server Performance Measures And Factors

Start To Finish


When I’m working with clients, we look at a lot of stuff together to make sure things are in good working order.

If we’re tuning a single query, the goal is pretty easy to define. We want it to finish as fast as possible.

How you get there is where things may get interesting, but let’s be honest: most of the time you’re going to be fixing the same half a dozen problems in some combination.

Note that a query finishing and its results finishing being returned are two different things. A query may finish very quickly, but returning results may take considerably longer for various reasons.

You can think of this as responsiveness, and it’s a cornerstone to building something people are happy with. When you’re talking about user experience, performance has to be part of the conversation.

How fast individual queries finish also plays a big part in overall concurrency.

Put Through


Typically this matters more for OLTP workloads. Data warehouses aren’t usually high-concurrency environments, in the sense that the Batch Requests/Sec counter is sitting around bored. Joe Obbish has a great talk on improving data warehouse loading patterns, because “usually” isn’t “always.” Ha ha ha. Barf.

Workloads that do have OLTP characteristics are more sensitive to all sorts of things, and I don’t just mean parameters. Locking and Deadlocking are high up on the list, along with available hardware resources.

Rather obviously, if you want to improve overall throughput, having queries finish as quickly as possible is an attractive goal. You can run way more queries in one second if they take 100ms than if they take 900ms, you know? You know. You’re smart.

Factorials


What are the factors that influence query speed?

Starting in the database:

  • Well written queries
  • Thoughtful indexes
  • Maintained Statistics
  • Correctly stored data types
  • Properly normalized tables

Moving out a little bit further, there are some important settings:

  • Parallelism
  • Optimistic Isolation Levels
  • Max Server Memory
  • tempdb stuff
  • Instant File Initialization
  • Lock Pages In Memory (mostly)

Further out, we have the hardware:

  • CPU
  • Memory
  • Disk
  • Network Path To Disks (SAN)

A lot of times something being off in one place can make it look like there’s a bottleneck in another place. Probably the easiest example to conjure up is if you’re missing an opportune nonclustered index, and you end up scanning a clustered index over and over again. The clustered index doesn’t fit in memory, so you end up generating a lot of PAGEIOLATCH_SH waits.

It might look like you have a memory shortage, but in reality you could be making far better use of the memory you have.
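
If you want to eyeball that particular wait, it’s easy enough to pull from the wait stats DMV; just remember the numbers are cumulative since the last restart (or since someone cleared them), so they need context.

--Cumulative since startup; context matters
SELECT
    dows.wait_type,
    dows.waiting_tasks_count,
    dows.wait_time_ms
FROM sys.dm_os_wait_stats AS dows
WHERE dows.wait_type = N'PAGEIOLATCH_SH';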

Margin For Error


Being able to identify and fix specific bottlenecks is an important skill. Wait stats and other counters can be useful, but are often underwhelming unless you’re watching at the right time, or you have a monitoring tool that logs them for you. Bottlenecks can live in many places, and oftentimes performance tuning is like playing whack-a-mole with them.

Hitting hard limits and running out of CPU or RAM is never fun. Picking hardware that’s up to the task is step one, but staying on top of query and index tuning to make sure it stays adequate for longer is a process that you need to go through. If this is the sort of thing you need help with, drop me a line.

Thanks for reading!


Tracking Row Changes With Temporal Columns In SQL Server

Extra, Extra


Temporal tables are cool, but keeping all that history can be stressful. Change Tracking adds overhead to every transaction, and requires Snapshot Isolation to be successful. Change Data Capture can also run into problems scanning the transaction log, depending on transaction volume, etc.

Change Data Capture is also a little unfortunate in that it doesn’t track schema changes like adding or dropping columns, or changing column data types. Change Tracking doesn’t either; it just tracks the keys of what changed when data is modified.

Temporal Tables are different: they’ll add new columns to, and drop removed columns from, the history table as they change in the base table. They’ll also push data type changes across. But the history table won’t keep the various permutations around; it just mirrors the current schema.

Just The Columns, Ma’am


Let’s say your needs aren’t robust enough to need any one of those highly specialized features, or even triggers to move data around when it changes.

You’re perfectly free and able to add the tracking columns that temporal tables use to your base tables, but adding them is far from free. When I added them to the 17 million row Posts table, it took about 40 seconds. My laptop doesn’t suck, either.

To show you a little how it works, let’s create a copy of the Votes table from Stack Overflow.

CREATE TABLE dbo.Votes_Tracked
(
    Id int NOT NULL,
    PostId int NOT NULL,
    UserId int NULL,
    BountyAmount int NULL,
    VoteTypeId int NOT NULL,
    CreationDate datetime NOT NULL,
    LastModified datetime2 GENERATED ALWAYS AS ROW START NOT NULL
        CONSTRAINT DF_LastModified DEFAULT (SYSDATETIME()),
    JunkDate datetime2 GENERATED ALWAYS AS ROW END HIDDEN NOT NULL
        CONSTRAINT DF_JunkDate DEFAULT ('9999-12-31 23:59:59.9999999'),
    PERIOD FOR SYSTEM_TIME (LastModified, JunkDate),
    CONSTRAINT PK_Votes_Id
        PRIMARY KEY CLUSTERED (Id ASC)
);
GO

Note that you need two columns to define the “period for system time”, and one of them will always be useless. That’s why I called it JunkDate, and not, like, whatever. But the good news is you can define that column as HIDDEN so that it doesn’t show up in all your queries.

Now we can stick some data in there and see how it works.

INSERT 
    dbo.Votes_Tracked WITH (TABLOCK)
(
    Id,
    PostId,
    UserId,
    BountyAmount,
    VoteTypeId,
    CreationDate
)
SELECT
    Id,
    PostId,
    UserId,
    BountyAmount,
    VoteTypeId,
    CreationDate
FROM StackOverflow2013.dbo.Votes AS v
WHERE v.CreationDate >= '20130101';

Looking Around


The table data looks like this:

[Screenshot: the table data]

If we run an update:

UPDATE v
    SET v.BountyAmount = 9999
FROM dbo.Votes_Tracked AS v
WHERE v.VoteTypeId = 7;

[Screenshot: rows after the update]

Note that these screen caps were taken without the HIDDEN keyword added to the table definition: that was an afterthought recommended by my friend Peter.

Whatabouts?


Of course, if you remove rows from the table, they’re just gone. You’d still need a trigger to cover deletes, if you need to track those.
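
A bare-bones version of that trigger might look like this; the log table and names are made up, and it only captures the keys and a timestamp.

--Hypothetical log table for deletes
CREATE TABLE dbo.Votes_Tracked_Deletes
(
    Id int NOT NULL,
    DeletedWhen datetime2 NOT NULL DEFAULT SYSDATETIME()
);
GO

--Log deleted keys; written to handle multi-row deletes
CREATE OR ALTER TRIGGER dbo.Votes_Tracked_LogDeletes
ON dbo.Votes_Tracked
AFTER DELETE
AS
BEGIN
    SET NOCOUNT ON;

    INSERT dbo.Votes_Tracked_Deletes (Id)
    SELECT d.Id
    FROM deleted AS d;
END;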

And if you want to remove those columns later, it takes a little bit of tweaking.

ALTER TABLE dbo.Votes_Tracked DROP CONSTRAINT DF_LastModified, DF_JunkDate;
ALTER TABLE dbo.Votes_Tracked DROP COLUMN JunkDate, LastModified;

Msg 13588, Level 16, State 1, Line 63
Column 'JunkDate' in table 'Crap.dbo.Votes_Tracked' cannot be dropped because it is a part of period definition.

Of course, the table isn’t system versioned, so this command will also fail:

ALTER TABLE dbo.Votes_Tracked SET (SYSTEM_VERSIONING = OFF);

Msg 13591, Level 16, State 1, Line 66
SYSTEM_VERSIONING is not turned ON for table 'Crap.dbo.Votes_Tracked'.

If you want to remove them, you’ll need to use this:

ALTER TABLE dbo.Votes_Tracked DROP PERIOD FOR SYSTEM_TIME;

Now you’ll be able to remove them.

Is This A Good Idea?


Well, it depends on what you need. If you just need to know when a row changed, and you don’t need to know what changed or who changed it, it can work in a more automated way than triggers. It does require an additional column, which isn’t ideal, but it’s not a gigantic column, and you don’t need to worry about indexing it because the data is junk.

I haven’t found any ridiculous downsides to this, but I’ll keep this updated if I do.

Thanks for reading!


Considerations For Implementing Soft Deletes In SQL Server

From The Beginning


Implementing soft deletes for an app that’s been around for a while can be tough. It’s a bit like implementing Partitioning, which is also tough to add in later if you want data management value out of it (rebuilding clustered indexes on the scheme, making sure all nonclustered indexes are aligned, and all future indexes are too, and making sure you have sufficient partitions at the beginning and end for data movement).

Boy, I really stressed those parentheses out.

If you do either one from the outset, it’s far less painful to manage. The structural stuff is there for you from the beginning, and you can test different strategies early on, before data changes become difficult to manage.

Queries


The first and most obvious thing is that all your queries now need to only find data that isn’t deleted.

Almost universally, it’s easier to put views on top of tables that have the appropriate bit search for deleted or not deleted rows than to expect people to remember it.

CREATE VIEW dbo.Users_Active
AS
SELECT 
    u.*
FROM dbo.Users AS u
WHERE u.is_deleted = 0;

CREATE VIEW dbo.Users_Inactive
AS
SELECT 
    u.*
FROM dbo.Users AS u
WHERE u.is_deleted = 1;

It’s not that views have any magical performance properties; they’re just queries after all, but it gives you an explicit data source.

Indexes


Depending on how your other queries search for data, you may need to start accounting for the is_deleted flag in your indexes. This could make a really big difference if the optimizer stops choosing your narrower nonclustered indexes because it hates key lookups.

Typically, other predicates will give you a selective-enough result set that a residual predicate on a bit field won’t make much difference. If you’ve already got a seek to the portion of data you’re interested in and most of it will be not-deleted, who cares?

And let’s be honest, in most implementations deleted rows will be the minority of data, and searches for it will be far less common. Usually it’s just there for an occasional audit.

In adjacent cases where instead of deleted you need to designate things as currently active, and you may have many inactive rows compared to active rows, filtered indexes can be your best friend.

Coming back to the views, I don’t think that making them indexed is necessary by default, but it might be if you’re using forced parameterization and filtered indexes.

CREATE TABLE dbo.Users(id int, is_deleted bit);
GO 

CREATE INDEX u ON dbo.Users (id) WHERE is_deleted = 0;
GO

SELECT 
    u.id, u.is_deleted
FROM dbo.Users AS u
WHERE u.is_deleted = 0;

Under simple parameterization, this can be fine. Under forced parameterization, things can get weird.

[Screenshot: query plans under simple and forced parameterization]

Tables and Tables


In some cases, it might be easier to create tables specifically for deleted rows so you don’t have unnecessary data in your main tables. You can implement this easily enough with after triggers. Just make sure they’re designed to handle multiple rows.
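
A rough sketch of that pattern, using the little Users table from above and a made-up archive table; the important part is selecting from the whole deleted pseudo-table, so multi-row deletes are covered.

--Hypothetical archive table for deleted rows
CREATE TABLE dbo.Users_Deleted
(
    id int NOT NULL,
    deleted_when datetime2 NOT NULL DEFAULT SYSDATETIME()
);
GO

--Move deleted rows off to their own table
CREATE OR ALTER TRIGGER dbo.Users_ArchiveDeletes
ON dbo.Users
AFTER DELETE
AS
BEGIN
    SET NOCOUNT ON;

    INSERT dbo.Users_Deleted (id)
    SELECT d.id
    FROM deleted AS d;
END;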

If you want something out of the box, you might mess with:

  • Temporal tables
  • Change Data Capture
  • Change Tracking

However, none of those help you deal with who deleted rows. For that, you’ll need an Audit.

Thanks for reading!


Don’t Be Afraid Of tempdb In SQL Server

It’s There For A Reason


When tuning queries, one pattern I see over and over again is people running crazy-long queries. Maybe they worked well-enough at some point, but over the years they just kept getting slower and slower.

Sometimes there are comments, and other times there’s enough domain knowledge on the call to understand how a query ended up in the shape it’s in. One persistent idea is that tempdb is something to be avoided. Either because it was “slow” or to avoid contention.

Granted, if a query has been around long enough, these may have been valid concerns at some point. In general though, temp tables (the # kind, not the @ kind) can be quite useful when query tuning.

You Might Be Using It, Anyway


Even if there’s some rule against directly using temp tables, queries can end up using tempdb by the caseload anyway.

Consider that Spool operators explicitly execute in tempdb, any spills will go to tempdb, and the worktables used in a number of circumstances live in tempdb, too. The bigger and more complicated your queries are, the more likely you are to run into cases where the optimizer spools, spills, or uses some other workspace area in tempdb in your query plan.

Worse, the optimizations available for temp tables aren’t available to those on-the-fly operators, and you lose the ability to take further action, like indexing your temp tables.

It’s Often Easier Than Other Options


Many times when tuning queries, I’ll be puzzled by the optimizer’s choices. Sometimes it’s join type, other times it’s join order, or something else. Perhaps the most common reason is some misestimation, of course.

Query and index hints are great to experiment with, but are often unsatisfying as permanent fixes. I’m not saying to never use them, but you should explore other options first. In other words, keep temp tables on the table.
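
The shape of it usually looks something like this: materialize the troublesome intermediate result into a temp table, index it if that helps, and join to it from the rest of the query. Table and column names here are just the usual Stack Overflow demo ones.

--Materialize a filtered set once, then join to it
CREATE TABLE #recent_posts
(
    Id int NOT NULL PRIMARY KEY,
    OwnerUserId int NULL
);

INSERT #recent_posts (Id, OwnerUserId)
SELECT p.Id, p.OwnerUserId
FROM dbo.Posts AS p
WHERE p.CreationDate >= '20130101';

SELECT
    u.DisplayName,
    COUNT_BIG(*) AS posts
FROM #recent_posts AS rp
JOIN dbo.Users AS u
    ON u.Id = rp.OwnerUserId
GROUP BY u.DisplayName;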

Thanks for reading!
