Why Comparing Columns Doesn’t Always Seek In SQL Server

Adjoining


When writing queries, sometimes you have the pleasure of being able to pass a literal value, parameter,  or scalar expression as a predicate.

With a suitable index in place, any one of them can seek appropriately to the row(s) you care about.

But what about when you need to compare the contents of one column to another?

It gets a little bit more complicated.

All About Algorithms


Take this query to start with, joining Users to Posts.

SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id;

The OwnerUserId column doesn’t have an index on it, but the Id column on Users it the primary key and the clustered index.

But the type of join that’s chosen is Hash, and since there’s no where clause, there’s no predicate to apply to either table for filtering.

SQL Server Query Plan
jumbo

This is complicated slightly by the Bitmap, which is created on the OwnerUserId column from the Posts table and applied to the Id column from the Users table as an early filter.

However, it’s a useless Bitmap, and Bitmaps don’t really seek anyway.

The same pattern can generally be observed with Merge Joins.  Where things are a bit different is with Nested Loops.

Shoop Da Loop


If we use a query hint, we can see what would happen with a Nested Loops Join.

SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id
OPTION(LOOP JOIN);

The plan looks like this now, with a Seek on the Users table.

SQL Server Query Plan
petting motions

The reason is that this flavor of Nested Loops, known as Apply Nested Loops, takes each row from the outer input and uses it as a scalar operator on the inner input.

An example of Regular Joe Nested Loops™ looks like this:

SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
WHERE u.Reputation = 
(
    SELECT
        MIN(p.Score)
    FROM dbo.Posts AS p
);

Where the predicate is applied at the Nested Loops operator:

SQL Server Query Plan
and bert

Like most things, indexing is key, but there are limits.

Innermost


Let’s create this index:

CREATE INDEX ud ON dbo.Users(UpVotes, DownVotes);

And run this query:

SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
WHERE u.UpVotes = u.DownVotes;

The resulting query plan looks like this:

SQL Server Query Plan
did a cuss

But what other choice is there? If we want a seek, we need a particular thing or things to seek to.

SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
WHERE u.UpVotes = u.DownVotes
AND   u.UpVotes = 1;
SQL Server Query Plan
name game

We seek to everyone with an UpVote of 1, and then somewhat awkwardly search the DownVotes column for values >= 1 and <= 1.

But again, these are specific values we can search for.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Why Large Modifications Are Never Fast In SQL Server

Well, Stop


How many times have you seen something like this?

SQL Server Query Plan
honeypot

You can replace “Update” with “Insert” or “Delete”, but the story is the same.

Large modifications are typically not fast modifications, for a number of reasons. To understand why, we need to think about the process as a whole.

Modi-Fi


Why large modifications are slow can root-cause to a number of places:

  1. Updates and Deletes run single-threaded, no matter what
  2. Inserts can go parallel, but the rules are often prohibitive
  3. Halloween Protection Spools can be really slow
  4. You probably have a bunch of nonclustered indexes that need to be read so that…
  5. All those indexes can be modified, and written to the transaction log as well
  6. If your database is synchronously replicated anywhere, you have to wait on that ack
  7. There are no Batch Mode modifications

Usually when modifications need to hit a significant number of rows, I want to think about batching things. That might not always be possible, but it’s certainly a kinder way to do things.

If you can’t batch inserts and deletes, you may want to think about using partitioning to quickly switch data in and out.

And of course, don’t use MERGE.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

A Not So Silent Bug With Filtered Indexes On Computed Columns In SQL Server

Bugbash


At some point in the past, I blogged about a silent bug with computed columns and clustered column store indexes.

In this post, I’m going to take a quick look at a very loud bug.

Normally, you can’t add a filtered index to a computed column. I’ve always hated that limitation. How nice would that be for so many currently difficult tasks?

I wonder if Oracle…

You Can’t Do That On Television


If you run this code to completion — and boy are you missing out if you don’t — you get an error.

CREATE TABLE dbo.ohno
(
    id int NOT NULL, 
    crap AS id * 2
);
GO 

CREATE NONCLUSTERED INDEX c 
ON dbo.ohno
    (crap) 
WHERE crap > 0;
GO

Here’s the error:

Msg 10609, Level 16, State 1, Line 13
Filtered index 'c' cannot be created on table 'dbo.ohno' because the column 'crap' in the filter expression is a computed column. 
Rewrite the filter expression so that it does not include this column.

Okay, great! Works on my machine.

Kapow, Kapow


However, you can totally create this table using the inline index creation syntax.

CREATE TABLE dbo.ohyeah
(
    id int NOT NULL, 
    crap AS id * 2,
    INDEX c 
        (crap) 
    WHERE crap > 0
);
GO

However,  if you try to query the table, you’re met with a buncha errors.

SELECT
    id
FROM dbo.ohyeah AS o;

SELECT
c = COUNT_BIG(*)
FROM dbo.ohyeah AS o;

Even without explicitly referencing the computed column, you get this error message.

Msg 207, Level 16, State 1, Procedure ohyeah, Line 1 [Batch Start Line 29]
Invalid column name 'crap'.
Msg 207, Level 16, State 1, Procedure ohyeah, Line 1 [Batch Start Line 29]
Invalid column name 'crap'.
Msg 207, Level 16, State 1, Procedure ohyeah, Line 1 [Batch Start Line 29]
Invalid column name 'crap'.
Msg 207, Level 16, State 1, Procedure ohyeah, Line 1 [Batch Start Line 29]
Invalid column name 'crap'.
Msg 207, Level 16, State 1, Procedure ohyeah, Line 1 [Batch Start Line 29]
Invalid column name 'crap'.
Msg 4184, Level 16, State 2, Line 30
Cannot retrieve table data for the query operation because the table "dbo.ohyeah" schema is being altered too frequently. 
Because the table "dbo.ohyeah" contains a filtered index or filtered statistics, changes to the table schema require a refresh of all table data. 
Retry the query operation, and if the problem persists, use SQL Server Profiler to identify what schema-altering operations are occurring.

You see what? See you what that error message says? “[I[f the problem persists, use SQL Server Profiler to identify what schema-altering operations are occurring”

Use SQL Server Profiler.

Take that, Extended Events.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Don’t Rely On Square Brackets To Protect You From SQL Injection: Use QUOTENAME Instead

Uno Mal


I see a lot of scripts on the internet that use dynamic SQL, but leave people wide open to SQL injection attacks.

In many cases they’re probably harmless, hitting DMVs, object names, etc. But they set a bad example. From there, people will adapt whatever dynamic SQL worked elsewhere to something they’re currently working on.

Here’s a simple script to show you how just sticking brackets into a string doesn’t protect you from SQL injection:

DROP TABLE IF EXISTS #t
CREATE TABLE #t(id int);

DECLARE 
    @s nvarchar(max) = N'[' + N'PRINT 1] DROP TABLE #t;--' + N']';

PRINT @s

EXEC sys.sp_executesql
    @s;

SELECT
    *
FROM #t AS t;
GO 

DROP TABLE IF EXISTS #t
CREATE TABLE #t(id int);

DECLARE 
    @s nvarchar(max) = QUOTENAME(N'PRINT 1] DROP TABLE #t;--')

PRINT @s

EXEC sys.sp_executesql
    @s;

SELECT
    *
FROM #t AS t;
GO

You can run this anywhere, and the results look like this:

[PRINT 1] DROP TABLE #t;--]
Msg 2812, Level 16, State 62, Line 572
Could not find stored procedure 'PRINT 1'.
Msg 208, Level 16, State 0, Line 583
Invalid object name '#t'.
[PRINT 1]] DROP TABLE #t;--]
Msg 2812, Level 16, State 62, Line 587
Could not find stored procedure 'PRINT 1] DROP TABLE #t;--'.

In the section where square brackets were used, the temp table #t got dropped. In the section where QUOTENAME was used, it wasn’t.

When you’re writing dynamic SQL, it’s important to make it as safe as possible. Part of that is avoiding the square bracket trap.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

A Couple Quick Notes On Using XML For String Aggregation In SQL Server

Cat Skins


If you’re on a new-ish version of SQL Server, using STRING_AGG is your best bet for this. Please use it instead.

Unlike STRING_SPLIT, it’s not compatibility-walled. You don’t need to be on level 130+ to use it.

If you’re on an earlier version, your most reliable bet is using XML. Using the local variable method can have quite unexpected results at times. I’ve seen it go from returning a full list of values to only returning the last value more times than I can count.

Let’s look at the XML version, though. Just in case you’re not on SQL Server 2017.

Weirdoverse


The purpose of these queries is to show you hot to remove XML elements, and handle XML control characters like &, <, >, etc. All of these results return a single row, just to keep the examples simple.

SELECT
    x = 
    (
        SELECT 
                b.Name
        FROM dbo.Badges AS b
        WHERE b.Id = 100564
        FOR XML 
            PATH('')
    );

This will return two things we don’t want:

  1. XML elements
  2. An ampersand turned into &amp;
monkeybone

To get rid of the XML elements, we can alias the inner results as [text()]

SELECT
    x = 
    (
        SELECT 
            [text()] =
                b.Name
        FROM dbo.Badges AS b
        WHERE b.Id = 100564
        FOR XML 
            PATH('')
    );

That will give us this back, still with the ampersand all mangled up.

malt shop

To fix that, we need to add a little bit to the XML-ing:

SELECT
   x = 
   (
       SELECT 
           [text()] = 
               b.Name
       FROM dbo.Badges AS b
       WHERE b.Id = 100564
       FOR XML 
           PATH(''),
           TYPE
   ).value
     (
         './text()[1]', 
         'nvarchar(max)'
     );

First, we need to add TYPE to the XML PATH syntax. That’ll give us an XML typed return type to use the .value method on.

From there, we can grab the text element, and give it a data type. You don’t specifically need to get ./text()[1] though, but I tend to use it because I’ve seen some very weird performance issues when using less verbose expressions like .[1] or just . instead.

Plans


Here are the plan differences, which are negligible for a single row.

SQL Server Query Plan
monikers

Of course, local factors may require deviating from what generally works best.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Read Committed Is A Garbage Isolation Level In SQL Server

Yep.



My friend Forrest ran into a similarly fun problem, too.

Thanks for watching!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Collation Is Weird In SQL Server

Particular String


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

A Silent Bug With Clustered Column Store Indexes And Computed Columns In SQL Server 2019

Oh Boy


This bug has been reported to the proper authorities, and this blog post is not to dunk on the Summer Intern responsible for column store indexes, but to warn you to not do this, because I don’t know what it will break under the covers.

If you read the documentation for column store indexes, it says that column store indexes can’t be created on persisted computed columns.

And that’s true. If we step through this script, creating the column store index will fail.

/*Bye Felicia*/
DROP TABLE IF EXISTS dbo.cci;
GO 

/*Hi Felicia*/
CREATE TABLE dbo.cci(id int, fk int, sd date, ed datetime);
GO 

/*COMPUTER!*/
ALTER TABLE dbo.cci ADD cc AS id * 2;
ALTER TABLE dbo.cci ALTER COLUMN cc ADD PERSISTED;
GO 

/*Rats*/
CREATE CLUSTERED COLUMNSTORE INDEX c ON dbo.cci;
GO

The error message we get is pretty clear:

Msg 35307, Level 16, State 1, Line 76
The statement failed because column 'cc' on table 'cci' is a computed column. 
Columnstore index cannot include a computed column implicitly or explicitly.

80 Blocks from Tiffany’s


However, if we change the order of things a little bit, we’ll find that we can add a persisted computed column to a table with a clustered column store index on it.

/*Bye Felicia*/
DROP TABLE IF EXISTS dbo.cci;
GO 

/*Hi Felicia*/
CREATE TABLE dbo.cci(id int, fk int, sd date, ed datetime, INDEX c CLUSTERED COLUMNSTORE);
GO 

/*???*/
ALTER TABLE dbo.cci ADD cc AS id * 2;
ALTER TABLE dbo.cci ALTER COLUMN cc ADD PERSISTED;
GO 

/*With data?*/
INSERT dbo.cci
(
    id, fk, sd, ed
)
VALUES
(
    10, 10, GETDATE(), GETDATE()  
);

/*yepyepyepyepyepyep*/
SELECT
    c.*
FROM dbo.cci AS c

/*Eh?*/
ALTER INDEX c ON dbo.cci REBUILD;
ALTER INDEX c ON dbo.cci REORGANIZE;
DBCC CHECKTABLE('dbo.cci');

And everything seems to work. However, if we drop the column store index, it can’t be re-created.

The key here is that the computed column is added, and then the persisted attribute is added second. That is where an error should be thrown.

But what do I know? I’m just a bouncer.

Bomber


Again, I’m telling you not to do this. I’m telling you that it might break things in gruesome and horrible ways.

I don’t think that this is a preview of upcoming support for persisted computed columns in clustered column store indexes.

Thanks for reading!

p.s. Really, don’t do this

p.p.s. If you do this you’re asking for trouble

p.p.p.s. This was fixed in CU12

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

sp_QuickieStore: Searching For Wait Stats

Money Money


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

No THREADPOOL Waits In Query Store

Yep Nope


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.