DROPCLEANBUFFERS: A Better Test Of Storage Than Query Performance For SQL Server

Relics


When I’m tuning queries, people will often ask me what metrics I look at to judge efficiency. Usually, it’s just getting things to be done faster.

Sometimes it’s okay to use more CPU via a parallel plan to get your query faster. Sometimes it’s okay to do more reads to get a query faster.

Sure, it’s cool when it works out that you can reduce resources overall, but every query is special. It all sort of depends on where the bottleneck is.

One thing I’ve been asked about several times is about how important it is to clear out the plan cache and drop clean buffers between runs.

While this post is only about the dropping of cleanly buffers, let’s touch on clearing the plan cache in some way quickly.

Dusted


Clearing out the plan cache (or recompiling, or whatever) is rarely an effective query tuning mechanism, unless you’re working on a parameter sniffing issue, or trying to prove that something else about a query is causing a problem. Maybe it’s local variables, maybe it’s a bad estimate from a table variable.

You get the point.

But starting with a new plan every time is overblown — if you change things like indexes or the way the query is written, you’re gonna get a new plan anyway.

If you’re worried about long compile times, you might also want to do this to prove it’s not necessarily the query that’s slow.

Busted


Let’s look at a big picture. The script that generates this picture is as follow:

--Calgon
DBCC DROPCLEANBUFFERS;

--o boy
SELECT
    COUNT_BIG(*) AS records
FROM dbo.Posts AS p;

--table manners
SELECT
    COUNT_BIG(*) AS records
FROM dbo.Posts AS p;

--so considerate
CREATE INDEX p ON dbo.Posts(Id);

--y tho?
DBCC DROPCLEANBUFFERS;

--woah woah woah
SELECT
    COUNT_BIG(*) AS records
FROM dbo.Posts AS p;

--hey handsome
SELECT
    COUNT_BIG(*) AS records
FROM dbo.Posts AS p;

We’re gonna drop them buffferinos, run the same count query twice, add a real narrow index, then count twice again.

Great. Great. Great.

SQL Server Query Plan
different lifetime

Annalieses


For the first two executions, we performance tuned the query by about 30 seconds, just by… reading data from memory.

Hm. Okay. Unless you’re trying to prove that you don’t have enough memory, or that storage sucks, you’re not really convincing me of much.

Yes, RAM is faster than disk. Now what?

For the second two executions, query performance got way better. But reading the smaller index from disk hardly changed overall execution time.

If it’s not a strong enough argument that getting a query from 14 seconds down to half a second with a better index means you need an index, you might be working for difficult people.

Of course, Size Matters™

brick to back

The second query finishes much faster because we have a much smaller amount of data to read, period. If we had a where clause that allowed our index to seek to a specific chunk of data, we could have done even less work. This shouldn’t surprise anyone, though. Reading 450MB is faster than reading 120,561MB.

This is not a math test.

Coasting


Starting queries out with an empty buffer pool doesn’t really offer any insights into if you’ve tuned the query. It only exaggerates a difference that is usually not a reality.

It is a useful tool if you want to prove that:

  • You need more memory
  • You need better storage
  • You need a different index

But I sure wouldn’t use it to prove that I made a query better.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Residual Predicates In SQL Server Query Plans

We Will Talk About Things And Have Fun Now


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

USE StackOverflow;

EXEC dbo.DropIndexes; 

/*
CREATE INDEX east 
    ON dbo.Posts
        (PostTypeId, Score, OwnerUserId) 
WITH ( MAXDOP = 8, 
       SORT_IN_TEMPDB = ON, 
       DATA_COMPRESSION = ROW );
*/

DROP TABLE IF EXISTS #t;
GO 

SELECT   
    u.Id,
    u.Reputation,
    u.DisplayName,
    p.Id AS PostId,
    p.Title
INTO #t
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id
WHERE u.Reputation >= 1000
AND   p.PostTypeId = 1
AND   p.Score >= 1000 
ORDER BY u.Reputation DESC;



/*
CREATE INDEX east 
    ON dbo.Posts(PostTypeId, Score, OwnerUserId);
*/
SELECT 
    t.Id, 
    t.Reputation, 
    ( 
        SELECT 
            MAX(p.Score) 
        FROM dbo.Posts AS p 
        WHERE p.OwnerUserId = t.Id 
        AND   p.PostTypeId IN (1, 2) 
    ) AS TopPostScore,
    t.PostId, 
    t.Title
FROM #t AS t
ORDER BY t.Reputation DESC;


/*
Usually I love replacing select 
list subqueries with APPLY

Just show the saved plan here
*/
SELECT 
    t.Id, 
    t.Reputation, 
    pq.Score,
    t.PostId, 
    t.Title
FROM #t AS t
OUTER APPLY --We have to use outer apply to not restrict results!
(
    SELECT 
        MAX(p.Score) AS Score
    FROM dbo.Posts AS p 
    WHERE p.OwnerUserId = t.Id 
    AND   p.PostTypeId IN (1, 2)
) AS pq
ORDER BY t.Reputation DESC;


/*
TOP (1) also spools
*/
SELECT 
    t.Id, 
    t.Reputation, 
    ( 
        SELECT TOP (1) 
            p.Score
        FROM dbo.Posts AS p
        WHERE p.PostTypeId IN (1, 2)
        AND   p.OwnerUserId = t.Id
        ORDER BY p.Score DESC 
    ) AS TopPostScore,
    t.PostId, 
    t.Title
FROM #t AS t
ORDER BY t.Reputation DESC;

SELECT 
    t.Id, 
    t.Reputation, 
    pq.Score,
    t.PostId, 
    t.Title
FROM #t AS t
OUTER APPLY
(
    SELECT TOP (1) 
        p.Score
    FROM dbo.Posts AS p
    WHERE p.PostTypeId IN (1, 2)
    AND   p.OwnerUserId = t.Id
    ORDER BY p.Score DESC
) AS pq
ORDER BY t.Reputation DESC;


/*
CREATE INDEX east 
    ON dbo.Posts(PostTypeId, Score, OwnerUserId);
*/
SELECT 
    t.Id, 
    t.Reputation, 
    pq.Score,
    t.PostId, 
    t.Title
FROM #t AS t
OUTER APPLY --This one is fast
(
    SELECT TOP (1) 
        p.Score
    FROM dbo.Posts AS p
    WHERE p.PostTypeId = 1
    AND   p.OwnerUserId = t.Id
    ORDER BY p.Score DESC
) AS pq
ORDER BY t.Reputation DESC;

SELECT 
    t.Id, 
    t.Reputation, 
    pa.Score,
    t.PostId, 
    t.Title
FROM #t AS t
OUTER APPLY --This two is slow...
(
    SELECT TOP (1) 
        p.Score
    FROM dbo.Posts AS p
    WHERE p.PostTypeId = 2
    AND   p.OwnerUserId = t.Id
    ORDER BY p.Score DESC
) AS pa
ORDER BY t.Reputation DESC;


/*
Use the Score!
*/
SELECT 
    t.Id, 
    t.Reputation, 
    ISNULL(pa.Score, pq.Score) AS TopPostScore,
    t.PostId, 
    t.Title
FROM #t AS t
OUTER APPLY --This one is fast
(
    SELECT TOP (1) 
        p.Score --Let's get the top score here
    FROM dbo.Posts AS p
    WHERE p.PostTypeId = 1
    AND   p.OwnerUserId = t.Id
    ORDER BY p.Score DESC
) AS pq
OUTER APPLY --This two is slow...
(
    SELECT TOP (1) 
        p.Score
    FROM dbo.Posts AS p
    WHERE p.PostTypeId = 2
    AND   p.OwnerUserId = t.Id
    AND   pq.Score < p.Score --Then use it as a filter down here
    ORDER BY p.Score DESC
) AS pa
ORDER BY t.Reputation DESC;


SELECT 
    t.Id, 
    t.Reputation, 
    ISNULL(pq.Score, 0) AS Score,
    t.PostId, 
    t.Title
INTO #t2
FROM #t AS t
OUTER APPLY --This one is fast
(
    SELECT TOP (1) 
        p.Score --Let's get the top score here
    FROM dbo.Posts AS p
    WHERE p.PostTypeId = 1
    AND   p.OwnerUserId = t.Id
    ORDER BY p.Score DESC
) AS pq;


SELECT 
    t.Id, 
    t.Reputation, 
    ISNULL(pa.Score, t.Score) AS TopPostScore, 
    t.PostId, 
    t.Title
FROM #t2 AS t
OUTER APPLY 
(
    SELECT TOP (1) 
        p.Score
    FROM dbo.Posts AS p
    WHERE p.PostTypeId = 2
    AND   p.OwnerUserId = t.Id
    AND   t.Score < p.Score --Then use it as a filter down here
    ORDER BY p.Score DESC
) AS pa
ORDER BY t.Reputation DESC;



/*
What happened?
 * Index key column order
   * (PostTypeId, Score, OwnerUserId)

Other things we could try:
 * Shuffling index key order, or creating a new index
   * (PostTypeId, OwnerUserId, Score)
 
 * Rewriting the query to use ROW_NUMBER() instead
  * Have to be really careful here, probably use Batch Mode

*/

/*
CREATE TABLE dbo.t
(
id int NOT NULL,
INDEX c CLUSTERED COLUMNSTORE
);
*/

SELECT 
    t.Id, 
    t.Reputation, 
    pa.Score,
    t.PostId, 
    t.Title
FROM #t AS t
LEFT JOIN dbo.t AS tt ON 1 = 0
OUTER APPLY
(
    SELECT 
        rn.*
    FROM 
    (
        SELECT
            p.*,
            ROW_NUMBER()
                OVER
                (
                    PARTITION BY 
                        p.OwnerUserId
                    ORDER BY
                        p.Score DESC
                ) AS n
        FROM dbo.Posts AS p
        WHERE p.PostTypeId IN (1, 2)
    ) AS rn
    WHERE rn.OwnerUserId = t.Id
    AND   rn.n = 1
) AS pa
ORDER BY t.Reputation DESC;


DROP TABLE #t, #t2;

 

SQL Server Filtered Indexes Need Serious Help

Minorly


Filtered indexes are really interesting things. Just slap a where clause on your index definition, and you can do all sorts of helpful stuff:

  • Isolate hot data
  • Make soft delete queries faster
  • Get a histogram specific to the span of data you care about

Among other things, of course. There are some annoying things about them though.

  • They only work with specific ANSI options
  • If you don’t include the filter definition columns in the index, it might not get used
  • They only work when queries use literals, not parameters or variables

Majorly


Part of the optimizer’s process consists of expression matching, where things like computed columns, filtered indexes, and indexed views are considered for use in your query.

I mean, if you have any of them. If you don’t, it probably just stares inwardly for a few nanoseconds, wondering why you don’t care about it.

Something that this part of the process is terrible at is any sort of “advanced” expression matching. It has to be exact, or you get whacked.

Here’s an example:

DROP TABLE IF EXISTS dbo.is_deleted;

CREATE TABLE dbo.is_deleted
(
    id int PRIMARY KEY,
    dt datetime NOT NULL,
    thing1 varchar(50) NOT NULL,
    thing2 varchar(50) NOT NULL,
    is_deleted bit NOT NULL
);

INSERT dbo.is_deleted WITH(TABLOCK)
(
    id,
    dt,
    thing1,
    thing2,
    is_deleted
)
SELECT 
    x.n,
    DATEADD(MINUTE, x.n, GETDATE()),
    SUBSTRING(x.text, 0, 50),
    SUBSTRING(x.text, 0, 50),
    x.n % 2
FROM (
SELECT 
    ROW_NUMBER() OVER 
    (
        ORDER BY 1/0
    ) AS n,
    m.*
FROM sys.messages AS m
) AS x;

CREATE INDEX isd 
ON dbo.is_deleted
    (dt)
INCLUDE
    (is_deleted)
WHERE 
    (is_deleted = 0);

Overly


If you run that setup script, you’ll get yourself a table that’s ripe for a filtered index on the is_deleted column.

But it doesn’t work with every type of query pattern. Some people are super fancy and want to find NOTs!

SELECT 
    COUNT_BIG(*) AS records
FROM dbo.is_deleted AS id
WHERE id.dt >= GETDATE() + 200
AND   (NOT 1 = id.is_deleted)
AND   1 = (SELECT 1);

I have the 1 = (SELECT 1) in there for reasons. But we still get no satisfaction.

SQL Server Query Plan
hurtin

If we try to force the matter, we’ll get an error!

SELECT 
    COUNT_BIG(*) AS records
FROM dbo.is_deleted AS id WITH(INDEX = isd)
WHERE id.dt >= GETDATE() + 200
AND   (NOT 1 = id.is_deleted)
AND   1 = (SELECT 1);

The optimizer says non.

Msg 8622, Level 16, State 1, Line 84
Query processor could not produce a query plan because of the hints defined in this query. 
Resubmit the query without specifying any hints and without using SET FORCEPLAN.

It has no problem with this one, though.

SELECT
    COUNT_BIG(*) AS records
FROM dbo.is_deleted AS id
WHERE id.dt >= GETDATE() + 200
AND   (0 = id.is_deleted)
AND   1 = (SELECT 1);

Underly


It would be nice if there were some more work put into filtered indexes to make them generally more useful.

In much the same way that a general set of contradictions can be detected, simple things like this could be too.

Computed columns have a similar issue, where if the definition is col1 + col2, a query looking at col2 + col1 won’t pick it up.

It’s a darn shame that such potentially powerful tools don’t get much love.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Recompile And Nested Procedures In SQL Server

Rock Sale


While I was answering a question, I had to revisit what happens when using different flavors of recompile hints with stored procedure when they call inner stored procedures. I like when this happens, because there are so many little details I forget.

Anyway, the TL;DR is that if you have nested stored procedures, recompiling only recompiles the outer one. The inner procs — really, I should say modules, because it includes other objects that compile query plans — but hey. Now you know what I should have said.

If you want to play around with the tests, you’ll need to grab sp_BlitzCache. I’m too lazy to write plan cache queries from scratch.

Testament


The procs:

CREATE OR ALTER PROCEDURE dbo.inner_sp
AS
BEGIN

    SELECT
        COUNT_BIG(*) AS records
    FROM sys.master_files AS mf;
END;
GO 

CREATE OR ALTER PROCEDURE dbo.outer_sp
--WITH RECOMPILE /*toggle this to see different behavior*/
AS
BEGIN

    SELECT 
        COUNT_BIG(*) AS records
    FROM sys.databases AS d;
    
    EXEC dbo.inner_sp;

END;
GO

The tests:

--It's helpful to run this before each test to clear out clutter
DBCC FREEPROCCACHE;

--Look at this with and without 
--WITH RECOMPILE in the proc definition
EXEC dbo.outer_sp;

--Take out the proc-level recompile and run this
EXEC dbo.outer_sp WITH RECOMPILE;

--Take out the proc-level recompile and run this
EXEC sp_recompile 'dbo.outer_sp';
EXEC dbo.outer_sp;

--You should run these between each test to verify behavior
--If you just run them here at the end, you'll be disappointed
EXEC sp_BlitzCache 
    @DatabaseName = 'Crap', 
    @QueryFilter = 'procedure', 
    @SkipAnalysis = 1, 
    @HideSummary = 1;

EXEC sp_BlitzCache 
    @DatabaseName = 'Crap', 
    @QueryFilter = 'statement', 
    @SkipAnalysis = 1, 
    @HideSummary = 1;

Whatchalookinat?


After each of these where a recompile is applied, you should see the inner proc/statement in the BlitzCache results, but not the outer proc.

It’s important to understand behavior like this, because recompile hints are most often used to help investigate parameter sniffing issues. If it’s taking place in nested stored procedure calls, you may find yourself with a bunch of extra work to do or needing to re-focus your use of recompile hints.

Of course, this is why I much prefer option recompile hints on problem statements. You get much more reliable behavior.

And, as Paul writes:

For instances running at least SQL Server 2008 build 2746 (Service Pack 1 with Cumulative Update 5), using OPTION (RECOMPILE) has another significant advantage over WITH RECOMPILE: Only OPTION (RECOMPILE) enables the Parameter Embedding Optimization.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

SQL Server Foreign Keys Don’t Always Improve Query Plans

Short and Lousy


This is one of the most frustrating things I’ve seen from the optimizer in quite a while.

Here are a couple tables, with a foreign key between them:

CREATE TABLE dbo.ct(id int PRIMARY KEY, dt datetime);

CREATE TABLE dbo.ct_fk(id int PRIMARY KEY, dt datetime);

ALTER TABLE dbo.ct ADD CONSTRAINT
    ct_c_fk FOREIGN KEY (id) REFERENCES dbo.ct_fk(id);

When we use the EXISTS clause, join elimination occurs normally:

SELECT COUNT_BIG(*) AS [?]
FROM dbo.ct AS c
WHERE EXISTS
(
    SELECT 1/0
    FROM dbo.ct_fk AS cf
    WHERE cf.id = c.id
);
SQL Server Query Plan
all the chickens

But when we use NOT EXISTS, it… doesn’t.

SELECT COUNT_BIG(*) AS [?]
FROM dbo.ct AS c
WHERE NOT EXISTS
(
    SELECT 1/0
    FROM dbo.ct_fk AS cf
    WHERE cf.id = c.id
);
SQL Server Query Plan
?

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Indexed Views As Filtered Indexes In SQL Server

Pssst!


If you landed here from Brent’s weekly links, use this link to get my training for 90% off.

The access is for life, but this coupon code isn’t! Get it while it lasts.

Discount applies at checkout, and you have to purchase everything for it to apply.

A Persistent Frustration


SQL Server comes with some great features for tuning queries:

  • Computed Columns
  • Filtered Indexes
  • Indexed Views

But there’s an interoperability issue when you try to use things together. You can’t create a filtered index with the filter definition on a computed column, nor can you create a filtered index on an indexed view.

If you find yourself backed into a corner, you may need to consider using an indexed view without any aggregation (which is the normal use-case).

Empty Tables


If we try to do something like this, we’ll get an error.

DROP TABLE IF EXISTS dbo.indexed_view;
GO

CREATE TABLE dbo.indexed_view
(
    id int PRIMARY KEY,
    notfizzbuzz AS (id * 2)
);
GO

CREATE INDEX n 
    ON dbo.indexed_view (notfizzbuzz) 
WHERE notfizzbuzz = 0;
GO

Yes, I’m putting the error message here for SEO bucks.

Msg 10609, Level 16, State 1, Line 19
Filtered index 'nfb' cannot be created on table 'dbo.indexed_view' because the column 'notfizzbuzz' in the filter expression is a computed column. 
Rewrite the filter expression so that it does not include this column.

An Indexed View Doesn’t Help


If we run this to create an indexed view on top of our base table, we still can’t create a filtered index, but there’s a different error message.

CREATE OR ALTER VIEW dbo.computed_column
WITH SCHEMABINDING
AS
SELECT
    iv.id, 
    iv.notfizzbuzz
FROM dbo.indexed_view AS iv;
GO 

CREATE UNIQUE CLUSTERED INDEX c 
    ON dbo.computed_column(id);

CREATE INDEX nfb 
    ON dbo.computed_column(notfizzbuzz) 
WHERE notfizzbuzz = 0;
Msg 10610, Level 16, State 1, Line 37
Filtered index 'nfb' cannot be created on object 'dbo.computed_column' because it is not a user table. 
Filtered indexes are only supported on tables. 
If you are trying to create a filtered index on a view, consider creating an indexed view with the filter expression incorporated in the view definition.

But what a thoughtful error message it is! Thanks, whomever wrote that.

Still Needs Help


We can create this indexed view just fine.

CREATE OR ALTER VIEW dbo.computed_column
WITH SCHEMABINDING
AS
SELECT
    iv.id, 
    iv.notfizzbuzz
FROM dbo.indexed_view AS iv
WHERE iv.notfizzbuzz = 0;
GO 

CREATE UNIQUE CLUSTERED INDEX c 
    ON dbo.computed_column(id);

But if we try to select from it, the view is expanded.

SELECT
    cc.id, 
    cc.notfizzbuzz
FROM dbo.computed_column AS cc
WHERE cc.notfizzbuzz = 0;
SQL Server Query Plan
upstate

The issue here is the simple parameterization that is attempted with the trivial plan.

If we run the query like this, and look at the end of the output, we’ll see a message at the bottom that our query is safe for auto (simple) parameterization. This may still happen even if the plan doesn’t remain trivial (more detail at the link above!)

DBCC FREEPROCCACHE;
GO 
DBCC TRACEON(8607, 3604);
GO 
SELECT
    cc.id, 
    cc.notfizzbuzz
FROM dbo.computed_column AS cc
WHERE cc.notfizzbuzz = 0;
DBCC TRACEOFF(8607, 3604);
GO 

********************
** Query marked as Cachable
** Query marked as Safe for Auto-Param
********************

Making It Work


The two ways we can run this query to get the indexed view to be used are like so:

SELECT
    cc.id, 
    cc.notfizzbuzz
FROM dbo.computed_column AS cc WITH(NOEXPAND)
WHERE cc.notfizzbuzz = 0;


SELECT
    cc.id, 
    cc.notfizzbuzz
FROM dbo.computed_column AS cc
WHERE cc.notfizzbuzz = 0
AND   1 = (SELECT 1);
SQL Server Query Plan
thanks i guess

A Closer Look


If we put those two queries through the ringer, we’ll still see auto (simple) parameterization from the first query:

DBCC FREEPROCCACHE;
GO 
DBCC TRACEON(8607, 3604);
GO 
SELECT
    cc.id, 
    cc.notfizzbuzz
FROM dbo.computed_column AS cc WITH(NOEXPAND)
WHERE cc.notfizzbuzz = 0;
GO 
DBCC TRACEOFF(8607, 3604);
GO 

********************
** Query marked as Cachable
** Query marked as Safe for Auto-Param
********************

DBCC FREEPROCCACHE;
GO 
DBCC TRACEON(8607, 3604);
GO 
SELECT
    cc.id, 
    cc.notfizzbuzz
FROM dbo.computed_column AS cc
WHERE cc.notfizzbuzz = 0
AND   1 = (SELECT 1);
GO 
DBCC TRACEOFF(8607, 3604);
GO 

********************
** Query marked as Cachable
********************

It’s goofy, but it’s worth noting. Anyway, if I had to pick one of these methods to get the plan I want, it would be the NOEXPAND version.

Using that hint is the only thing that will allow for statistics to get generated on indexed views.

In case you’re wondering, marking the computed column as PERSISTED doesn’t change the outcome for any of these issues.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Considerations For Implementing Soft Deletes In SQL Server

From The Beginning


Implementing soft deletes for an app that’s been around for a while can be tough. In the same way as implementing Partitioning can be tough to add in later to get data management value from (rebuilding clustered indexes on the scheme, making sure all nonclustered indexes are aligned, and all future indexes are too, and making sure you have sufficient partitions at the beginning and end for data movement).

Boy, I really stressed those parentheses out.

If you do either one from the outset, it’s far less painful to manage. The structural stuff is there for you from the beginning, and you can test different strategies early on before data change become difficult to manage.

Queries


The first and most obvious thing is that all your queries now need to only find data that isn’t deleted.

Almost universally, it’s easier to put views on top of tables that have the appropriate bit search for deleted or not deleted rows than to expect people to remember it.

CREATE VIEW dbo.Users_Active
AS
SELECT 
    u.*
FROM dbo.Users AS u
WHERE u.is_deleted = 0;

CREATE VIEW dbo.Users_Inactive
AS
SELECT 
    u.*
FROM dbo.Users AS u
WHERE u.is_deleted = 1;

It’s not that views have any magical performance properties; they’re just queries after all, but it gives you an explicit data source.

Indexes


Depending on how your other queries search for data, you may need to start accounting for the is_deleted flag in your indexes. This could make a really big difference if the optimizer stops choosing your narrower nonclustered indexes because it hates key lookups.

Typically, other predicates will give you a selective-enough result set that a residual predicate on a bit field won’t make much difference. If you’ve already got a seek to the portion of data you’re interested in and most of it will be not-deleted, who cares?

And let’s be honest, in most implementations deleted rows will be the minority of data, and searches for it will be far less common. Usually it’s just there for an occasional audit.

In adjacent cases where instead of deleted you need to designate things as currently active, and you may have many inactive rows compared to active rows, filtered indexes can be your best friend.

Coming back to the views, I don’t think that making them indexed is necessary by default, but it might be if you’re using forced parameterization and filtered indexes.

CREATE TABLE dbo.Users(id int, is_deleted bit);
GO 

CREATE INDEX u ON dbo.Users (id) WHERE is_deleted = 0;
GO

SELECT 
    u.id, u.is_deleted
FROM dbo.Users AS u
WHERE u.is_deleted = 0;

Under simple parameterization, this can be fine. Under forced parameterization, things can get weird.

SQL Server Query Plan
tutor the tutors

Tables and Tables


In some cases, it might be easier to create tables specifically for deleted rows so you don’t have unnecessary data in your main tables. You can implement this easily enough with after triggers. Just make sure they’re designed to handle multiple rows.

If you want something out of the box, you might mess with:

  • Temporal tables
  • Change Data Capture
  • Change Tracking

However, none of those help you deal with who deleted rows. For that, you’ll need an Audit.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Simplifying Archival Processes In SQL Server Using OUTPUT

Long List Of Demands


Many suggestions for making changes or improvements to a process in a database are met like orders that mean, if not certain, then likely death.

EXT. A WEBEX MEETING

… Erik is smiling, happy to be wrapping up and moving on. Client DBAs and Developers are sweating profusely, mechanical keyboards all heavily armed with Cherry MX Blue switches in an attempt to drown out the part where he talks about NOLOCK hints. For some reason everyone is smoking cigarettes and finalizing their wills

ERIK

You should enable the remote DAC, it’ll be helpful next time you hit THREADPOOL.

There’s a script included to turn it on, and a bunch of examples on how to use it.

IT Manager

By God, there’s no telling who of us will survive this, but we’ll see’er through.

Oh, Now You Need Consistency


Well, friends, I have good news for you. This is an easy one to implement.

Let’s say that in Stack Overflow land, when a user deletes their account we also delete all their votes. That’s not how it works, but it’s how I’m going to show you how to condense what can normally be a difficult process to isolate into a single operation.

First, we need an archive table. So let it be done.

CREATE TABLE dbo.Votes_Archive
(
    Id int NOT NULL,
    PostId int NOT NULL,
    UserId int NULL,
    BountyAmount int NULL,
    VoteTypeId int NOT NULL,
    CreationDate datetime NOT NULL,
    CONSTRAINT PK_VotesArchive_Id
        PRIMARY KEY CLUSTERED (Id ASC)
);

Next, we need a query to, like, do things. I guess.

Normally, you’d insert the rows you want to keep into the archive table, and then remove them from the source table by using a delete … where exists  type query. Usually in a transaction, because you heard you need them.

Savvier practitioners might even use the output clause with their delete to dump things into a #temp table.

But the savviest practitioners will do it all in one go.

Sauvignon Blanc


What a lot of people miss about output is that the results are tabular, like many other things in a database.

Just like you can select from deleted and inserted virtual tables in a trigger, you can select from them in a… select statement.

I’m using a transaction here because I want to roll it back.

BEGIN TRAN

INSERT dbo.Votes_Archive
    (Id, PostId, UserId, BountyAmount, VoteTypeId, CreationDate)
SELECT 
    v.*
FROM 
(
    DELETE v
        OUTPUT
            Deleted.Id,
            Deleted.PostId,
            Deleted.UserId,
            Deleted.BountyAmount,
            Deleted.VoteTypeId,
            Deleted.CreationDate
    FROM dbo.Votes AS v
    WHERE v.UserId = 190597
) AS v;

--Validate stuff
SELECT * FROM dbo.Votes AS v WHERE v.UserId = 190597;

SELECT * FROM dbo.Votes_Archive AS vi;
--Stop validating stuff, nutso
ROLLBACK

So when whoever user 190597 is deletes their account, all their votes are inserted into the archive table at the same time they’re deleted.

The execution plan looks like this:

SQL Server Query Plan
polar seltzer

Which is pretty nifty, and eliminates the need for a lot of jumping around.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Defeating Parameter Sniffing With Dynamic SQL In SQL Server

Enjoy!



Thanks for watching!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Using Plan Guides To Get Around Query Hints In SQL Server

Prophesy As


According to Not-Australians, there used to be a trace flag that would get queries to ignore any supplied hints. It doesn’t work anymore, which sucks, kinda.

Because people do lots of stupid things with hints. Real stupid things. Things you wouldn’t believe the stupid of.

Let’s say, for example, hypothetically of course, that your front end application would add an index hint to every query.

That index hint may or not be helpful to your query in any way. But there it is.

Let’s also posit, using the very depths of our imaginations, that the front end developer was unlikely to change that behavior.

Planning Fields


We’ve got a couple indexes:

CREATE INDEX r 
    ON dbo.Users(Reputation) 
WITH(MAXDOP = 8, SORT_IN_TEMPDB = ON);

CREATE INDEX c 
    ON dbo.Users(CreationDate) 
WITH(MAXDOP = 8, SORT_IN_TEMPDB = ON);

And we’ve got a query that, via an index hint, is being forced to use the wrong index.

DECLARE @Reputation int = 2;
EXEC sp_executesql N'SELECT * FROM dbo.Users WITH (INDEX  = c) WHERE Reputation = @Reputation;',
                   N'@Reputation int',
                   @Reputation;

The ensuing query plan makes no sense whatsoever.

SQL Server Query Plan
i really mean it

The things are all backwards. We scan the entire nonclustered index, and do a lookup to the clustered index just to evaluate the @Reputation predicate.

The idea is bad. Please don’t do the idea.

Guiding Bright


There are two things we could do here. We could hint the query to use the index we want, sure.

But what if we change something about this index, or add another one to the table? We might want the optimizer to have a bit more freedom to choose.

I mean, I know. That has its own risks, but whatever.

We can add a plan guide that looks like this:

EXEC sp_create_plan_guide
@name = N'dammit',
@stmt = N'SELECT * FROM dbo.Users WITH (INDEX  = c) WHERE Reputation = @Reputation;',
@type = N'SQL',
@module_or_batch = NULL,
@params = N'@Reputation int',
@hints =  N'OPTION(TABLE HINT(dbo.Users))';

If we were writing proper queries where tables are aliased, it’d look like this:

EXEC sp_create_plan_guide
@name = N'dammit',
@stmt = N'SELECT u.* FROM dbo.Users AS u WITH (INDEX  = c) WHERE u.Reputation = @Reputation;',
@type = N'SQL',
@module_or_batch = NULL,
@params = N'@Reputation int',
@hints =  N'OPTION(TABLE HINT(u))';

When we re-run our query, things look a lot better:

SQL Server Query Plan
focus

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.