T-SQL – Page 9 – Darling Data

A T-SQL Query To Get The Text Between Two Delimiters In SQL Server

Posted on October 20, 2020March 1, 2024 by Erik Darling

I swear It’s Not String Splitting

I have to do this a fair amount, and I always go look at this little cheat sheet that I have.

Then it occurred to me that it might be worth sharing the details here, in case anyone else runs into the same need.

The way I learned to do it is with SUBSTRING and CHARINDEX, which is a pretty common method.

Why CHARINDEX? Because it accepts an optional 3rd parameter that PATINDEX doesn’t, where you can give it a starting position to search. That comes in really handy! Let’s look at how.

The first thing we need for our test case is the starting point, which I’ve arbitrarily chosen as a colon in error messages.

SELECT *,
       SUBSTRING(
                 m.text, /*First argument*/
                 CHARINDEX(':', m.text), /*Second argument*/
                 LEN(m.text) /*Third argument*/
                 ) AS parsed_string
FROM sys.messages AS m
WHERE m.language_id = 1033
AND   m.text LIKE N'%:%:%';

But the results aren’t exactly what we want! We can still see all the colons.

What we really need is to cut out the semi-colon, which means we need to move one character over.

Keep in mind that this will change based on the length of the string you’re searching for. If you were looking for ‘Erik’, you’d need to move over four positions.

Sometimes it’s easier to remember that by calling LEN on the search string.

SELECT *,
       SUBSTRING(
                 m.text,
                 CHARINDEX(':', m.text) + LEN(':'),
                 LEN(m.text)
                 ) AS parsed_string
FROM sys.messages AS m
WHERE m.language_id = 1033
AND   m.text LIKE N'%:%:%';

That’ll get us to where we need to be for the first position! Now we need to get the text up to the second colon, which is where things get a little more complicated.

Right now, we’re just getting everything through the end of the error message, using LEN as the 3rd argument to SUBTSTRING.

To start with, let’s establish how we can use the third argument in CHARINDEX.

SELECT *,
       SUBSTRING(
                 m.text, 
                 CHARINDEX(':', m.text) + LEN(':'),
                 CHARINDEX(':', m.text, /*!*/CHARINDEX(':', m.text) + LEN(':')/*!*/)
                 ) AS parsed_string
FROM sys.messages AS m
WHERE m.language_id = 1033
AND   m.text LIKE N'%:%:%';

The start of the third argument is going to look nearly identical to the first one, except we’re going to start our search in the string after the first colon.

The code in between the exclamation points is the same as the second argument to SUBSTRING.

That’s because it gives us the correct starting position to start looking for the second colon from.

But, you know, this still doesn’t get us what we need, exactly. We need to chop some characters off.

How many?

I’ll save you some delicate CPU brain cycles: we need to subtract the length of the search string, and then subtract the number of positions in that the first search hit was.

SELECT *,
       SUBSTRING(
                 m.text, /*First argument*/
                 CHARINDEX(':', m.text) + LEN(':'), /*Second argument*/
                 CHARINDEX(':', m.text, CHARINDEX(':', m.text) + LEN(':'))
                 - LEN(':') - CHARINDEX(':', m.text) /*Third argument*/
                 ) AS parsed_string
FROM sys.messages AS m
WHERE m.language_id = 1033
AND   m.text LIKE N'%:%:%';

Which finally gives us what we’re asking for:

Now, this may not make you totally happy. After all, there are still leading and trailing spaces on each line.

If you want to get rid of those, you can either adjust your LEN functions, or you can call TRIM, LTRIM/RTRIM on the final result. I don’t care, really.

They’re your strings. They shouldn’t even be in the database.

Thanks for reading!

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Please Use Block Quotes For Code Comments In Your T-SQL

Posted on October 6, 2020May 16, 2022 by Erik Darling

Options!

When you’re writing code, you have some options when it comes to leaving comments:

Double dashes at the start of a line: -- Comment!
Block quotes around a section of text: /* Comment! */
No comment whatsoever:

The problem with the double-dash method is that there’s no terminator for it. If you end up pasting the query from a DMV where it’s not all nicely formatted, it’s pretty likely going to end up all on a single line.

With a long enough query, it can be a real pain to scroll across looking for comments and separating them to a new line so you can run the query.

What really sucks is that tools that can automatically format T-SQL for you can’t handle this either. If I could just right click and solve the problem, I’d be talking about that instead.

So, please: If you’re going to put comments in your code, use the block quotes.

clearly superior

Thanks for reading!

Going Further

How SQL Server Treats NULLs In Case Expressions

Posted on June 24, 2020May 14, 2022 by Erik Darling

I Forgot

Honest, I did. It happens to the best of us.

Why, just last week I forgot everything for 3 hours.

It was lovely.

These two queries look close enough , but they aren’t.

--Numero Uno
SELECT   CASE v.UserId WHEN NULL 
                       THEN 'NULL!'
                       ELSE 'NOT NULL!'
         END AS Scrubbed,
         v.VoteTypeId,
         COUNT_BIG(*) AS records
FROM     dbo.Votes AS v
GROUP BY CASE v.UserId WHEN NULL 
                       THEN 'NULL!'
                       ELSE 'NOT NULL!'
         END,
         v.VoteTypeId
ORDER BY v.VoteTypeId;

--Numero Dos
SELECT   CASE WHEN v.UserId IS NULL THEN 'NULL!'
              WHEN v.UserId IS NOT NULL THEN 'NOT NULL!'
         END AS Scrubbed,
         v.VoteTypeId,
         COUNT_BIG(*) AS records
FROM     dbo.Votes AS v
GROUP BY CASE WHEN v.UserId IS NULL THEN 'NULL!'
              WHEN v.UserId IS NOT NULL THEN 'NOT NULL!'
         END,
         v.VoteTypeId
ORDER BY v.VoteTypeId;

Results To Gridiot

The first query returns what appears to be rather odd results.

There are many NULLs in this column, and they’re exposed when we run the second query.

See them? All the NULLs? They’re here now.

1-900-MUPPET

The first query is the same thing as saying UserId = NULL.

That doesn’t get you anywhere at all in a database.

What is this, EXCEL?

To judge the NULL and the NOT NULL, you have to use IS NULL and IS NOT NULL.

But I forgot about that in CASE expressions, so I decided to write it down.

Thanks for reading!

Going Further

Forced Parameterization Doesn’t Fix SQL Injection Problems In SQL Server

Posted on May 21, 2020June 24, 2022 by Erik Darling

Short and Sweaty

If you have stored procedures that do things like this:

IF @OwnerUserId IS NOT NULL
   SET @Filter = @Filter + N' AND p.OwnerUserId = ' + RTRIM(@OwnerUserId) + @nl;
IF @CreationDate IS NOT NULL
   SET @Filter = @Filter + N' AND p.CreationDate >= ''' + RTRIM(@CreationDate) + '''' + @nl;
IF @LastActivityDate IS NOT NULL
   SET @Filter = @Filter + N' AND p.LastActivityDate < ''' + RTRIM(@LastActivityDate) + '''' + @nl;
IF @Title IS NOT NULL 
   SET @Filter = @Filter + N' AND p.Title LIKE ''' + N'%' + @Title + N'%''' + @nl;
IF @Body IS NOT NULL
   SET @Filter = @Filter + N' AND p.Body LIKE ''' + N'%' + @Body + N'%'';';    
IF @Filter IS NOT NULL
   SET @SQLString += @Filter;

PRINT @SQLString
EXEC (@SQLString);

Or even application code that builds unparameterized strings, you’ve probably already had someone steal all your company data.

Way to go.

But Seriously

I was asked recently if the forced parameterization setting could prevent SQL injection attacks.

Let’s see what happens! I’m using code from my example here.

EXEC dbo.AwesomeSearchProcedure @OwnerUserId = 35004, 
                                @Title = NULL, 
                                @CreationDate = NULL, 
                                @LastActivityDate = NULL, 
                                @Body = N''' UNION ALL SELECT t.object_id, t.name, NULL, NULL, SCHEMA_NAME(t.schema_id) FROM sys.tables AS t; --';

If we look at the printed output from the procedure, we can see all of the literal values.

SELECT TOP (1000) p.OwnerUserId, p.Title, p.CreationDate, p.LastActivityDate, p.Body
FROM dbo.Posts AS p
WHERE 1 = 1 
AND p.OwnerUserId = 35004
AND p.Body LIKE '%' UNION ALL SELECT t.object_id, t.name, NULL, NULL, SCHEMA_NAME(t.schema_id) FROM sys.tables AS t; --%';

But if we look at the query plan, we can see partial parameterization (formatted a little bit for readability)

dang

where @0 = @1 and p . OwnerUserId = @2 
and p . Body like '%' union all select t . object_id , t . name , null , null , SCHEMA_NAME ( t . schema_id ) from sys . tables as t

Slightly More Interesting

If we change the LIKE predicate on Body to an equality…

IF @Body IS NOT NULL
   SET @Filter = @Filter + N' AND p.Body = ''' + @Body + ';';

The parameterization will change a little bit, but still not fix the SQL injection attempts.

Instead of the ‘%’ literal value after the like, we get @3 — meaning this is the third literal that got parameterized.

dang

where @0 = @1 and p . OwnerUserId = @2 
and p . Body = @3 union all select t . object_id , t . name , null , null , SCHEMA_NAME ( t . schema_id ) from sys . tables as t

But the injecty part of the string is still there, and we get the full list of tables in the database back.

Double Down

If you’d like to learn how to fix tough problems like this, and make your queries stay fast, check out my advanced SQL Server training.

Thanks for reading!

Going Further

Error Handling In SQL Server Cursors

Posted on May 5, 2020May 16, 2022 by Erik Darling

Spit My Last Error

Going Further

If you wanna play with the code I used to try things out on your own, here it is:

CREATE TABLE zero_ten (id INT NOT NULL PRIMARY KEY);

INSERT dbo.zero_ten ( id )
SELECT 0 AS id
UNION ALL 
SELECT TOP (10) ROW_NUMBER() OVER(ORDER BY 1/0)
FROM sys.messages AS m

SELECT * FROM dbo.zero_ten AS zt;
RETURN;

DECLARE c CURSOR LOCAL STATIC FOR
SELECT N'DECLARE @atmosphere INT; SELECT @atmosphere = 1/' + RTRIM(zt.id)
FROM dbo.zero_ten AS zt;

DECLARE @joydivision NVARCHAR(MAX) = N'';
DECLARE @errors TABLE
(
    id INT PRIMARY KEY IDENTITY,
    error_number INT,
    error_severity INT,
    error_state INT,
    error_message NVARCHAR(MAX)
);

OPEN c;
FETCH NEXT FROM c
INTO @joydivision;

WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN TRY
        RAISERROR(@joydivision, 0, 1) WITH NOWAIT;
        EXEC sys.sp_executesql @joydivision;
        FETCH NEXT FROM c INTO @joydivision;
    END TRY
    BEGIN CATCH
        INSERT @errors ( error_number, error_severity, error_state, error_message )
        SELECT ERROR_NUMBER(), ERROR_SEVERITY(), ERROR_STATE(), ERROR_MESSAGE();
        --FETCH NEXT FROM c INTO @joydivision;
    END CATCH;
END;

SELECT *
FROM @errors AS e;

CLOSE c;
DEALLOCATE c;
GO

A Funny Thing Happened With Create Or Alter In SQL Server

Posted on March 30, 2020May 16, 2022 by Erik Darling

“Funny”

Going Further

An Unfortunate Performance Side Effect Of The OUTPUT Clause In SQL Server

Posted on January 20, 2020May 14, 2022 by Erik Darling

Put Out

I see people using OUTPUT to audit modifications from time to time, often because “triggers are bad” or “triggers are slow”.

Well, sometimes, sure. But using OUTPUT can be a downer, too.

Let’s look at how.

A Process Emerges

Say we’ve got a table that we’re using to track user high scores for their questions.

CREATE TABLE dbo.HighQuestionScores
(
    Id INT PRIMARY KEY CLUSTERED,
    DisplayName NVARCHAR(40) NOT NULL,
    Score BIGINT NOT NULL
);

To test the process, let’s put a single user in the table:

INSERT dbo.HighQuestionScores WITH (TABLOCK)
        (Id, DisplayName, Score)
SELECT u.Id, u.DisplayName, p.Score
FROM dbo.Users AS u
JOIN 
(
    SELECT p.OwnerUserId, 
	       MAX(p.Score) AS Score
    FROM dbo.Posts AS p
    WHERE p.PostTypeId = 1
    GROUP BY p.OwnerUserId
) AS p ON p.OwnerUserId = u.Id
WHERE u.Id = 22656;

To exacerbate the problem, I’m not going to create any helpful indexes here. This is a good virtual reality simulator, because I’ve seen your indexes.

Yes you. Down in front.

The relevant part of the query plan is the scan of the Posts table:

It’s parallel, and takes 1.8 seconds.

Aw, dit

Now let’s add in an OUTPUT clause.

I’m going to skip over inserting the output into any structure, because I want you to understand that the target doesn’t matter.

INSERT dbo.HighQuestionScores WITH (TABLOCK)
        (Id, DisplayName, Score)
OUTPUT Inserted.Id,
       Inserted.DisplayName,
       Inserted.Score
SELECT u.Id, u.DisplayName, p.Score
FROM dbo.Users AS u
JOIN 
(
    SELECT p.OwnerUserId, MAX(p.Score) AS Score
    FROM dbo.Posts AS p
    WHERE p.PostTypeId = 1
    GROUP BY p.OwnerUserId
) AS p ON p.OwnerUserId = u.Id
WHERE u.Id = 22656;

The relevant part of the plan now looks like this:

We’ve lost parallelism, and inspecting the properties of the Insert operator tells us why:

SQL Server Query Plan Tool Tip — Less Successful

We’ve got a Non Parallel Plan Reason. Why aren’t there any spaces? I don’t know.

Why can’t that go parallel? I also don’t know.

What About Triggers?

If we create a minimal trigger on the table, we can see if it has the same overhead.

CREATE OR ALTER TRIGGER dbo.hqs_insert ON dbo.HighQuestionScores
AFTER INSERT
AS
BEGIN
SET NOCOUNT ON;

    SELECT Inserted.Id,
           Inserted.DisplayName,
           Inserted.Score
    FROM Inserted;

END

Let’s go back to the original insert, without the output! We care about two things:

Is the parallel portion of the insert plan still there?
Is there any limitation on parallelism with the Inserted (and by extension, Deleted) virtual tables?

The answers are mostly positive, too. The insert plan can still use parallelism.

I’m not gonna post the same picture here, you can scroll back fondly.

Though the select from the Inserted table within the trigger doesn’t go parallel, it doesn’t appear to limit parallelism for the entire plan. It does appear that reads from the Inserted table can’t use parallelism (sort of like the table variable in a MSTVF).

Let’s modify the trigger slightly:

CREATE OR ALTER TRIGGER dbo.hqs_insert ON dbo.HighQuestionScores
AFTER INSERT
AS
BEGIN
SET NOCOUNT ON;

    DECLARE @Id INT
	DECLARE @DisplayName NVARCHAR(40)
	DECLARE @Score BIGINT

    SELECT @Id = Inserted.Id,
           @DisplayName = Inserted.DisplayName,
           @Score = Inserted.Score
    FROM Inserted
	JOIN dbo.Comments AS c 
	    ON c.UserId = Inserted.Id;

END

And for variety, let’s insert a lot more data into our table:

TRUNCATE TABLE dbo.HighQuestionScores;

INSERT dbo.HighQuestionScores WITH (TABLOCK)
        (Id, DisplayName, Score)
SELECT u.Id, u.DisplayName, p.Score
FROM dbo.Users AS u
JOIN 
(
    SELECT p.OwnerUserId, MAX(p.Score) AS Score
    FROM dbo.Posts AS p
    WHERE p.PostTypeId = 1
    GROUP BY p.OwnerUserId
) AS p ON p.OwnerUserId = u.Id
WHERE u.Id < 500000;

Here’s the query plan:

The read from Inserted is serial, but the remainder of the plan fully embraces parallelism like a long lost donut.

Togetherness

Given a well-tuned workload, you may not notice any particular overhead from using OUTPUT to audit certain actions.

Of course, if you’re using them alongside large inserts, and those large inserts happen to run for longer than you’d like, it might be time to see how long they take sans the OUTPUT clause. It’s entirely possible that using a trigger instead would cause fewer performance issues.

Thanks for reading!

Going Further

How To Get Greatest And Least Values In A SQL Server Query

Posted on December 11, 2019May 16, 2022 by Erik Darling

Update: Azure SQL DB And SQL Server 2022 Will Have These

So be on the lookout.

Spinning Out Of Control

It’s sorta kinda pretty crazy when every major database platform has something implemented, and SQL Server doesn’t.

Geez, even MySQL.

But a fairly common need in databases is to find the max value from two columns.

Maybe even across two tables.

Track V

For one table, it’s fairly straight forward.

SELECT     MAX(x.CombinedDate) AS greatest
FROM       dbo.Users AS u
CROSS APPLY( VALUES( u.CreationDate ), ( u.LastAccessDate )) AS x( CombinedDate );

We’re using our old friend cross apply with a values clause to create on “virtual” column from two date columns.

As far as indexing goes, I couldn’t find any performance difference between these two. They both take about 1 second.

CREATE INDEX smoochies ON dbo.Users(CreationDate, LastAccessDate);
CREATE INDEX woochies ON dbo.Users(LastAccessDate, CreationDate);

Indexing strategy will likely rely on other local factors, like any where clause filtering.

Monolith

A similar pattern will work across two tables:

SELECT     MAX(x.Score)
FROM       dbo.Posts AS p
JOIN       dbo.Comments AS c
    ON p.Id = c.PostId
CROSS APPLY( VALUES( p.Score ), ( c.Score )) AS x ( Score );

Though this is the worst possible way to write the query. It runs for around 10 seconds.

The indexes I have for this query look like so:

CREATE INDEX thicc ON dbo.Posts(Id, Score);
CREATE INDEX milky ON dbo.Comments(PostId, Score);

Reversing the key column order helps — the query runs in about 3 seconds, but I need to force index usage.

Of course, this is still the second worst way to write this query.

The best way I’ve found to express this query looks like so:

SELECT MAX(x.Score)
FROM
    (
     SELECT MAX(p.Score) AS Score
     FROM dbo.Posts AS p
    ) AS p
CROSS JOIN
    (
     SELECT MAX(c.Score) AS Score
     FROM dbo.Comments AS c
    ) AS c
CROSS APPLY( VALUES( p.Score ), ( c.Score )) AS x( Score );

The cross join here isn’t harmful because we only produce two rows.

And it finishes before we have time to move the mouse.

Likewise, the faster pattern for a single table looks like this:

SELECT MAX(x.Dates)
FROM
    (
     SELECT MAX(u.CreationDate) CreationDate
     FROM dbo.Users AS u
    ) AS uc
CROSS JOIN
    (
     SELECT MAX(u.LastAccessDate) LastAccessDate
     FROM dbo.Users AS u
    ) AS ul
CROSS APPLY (VALUES (uc.CreationDate), (ul.LastAccessDate)) AS x (Dates);

Because we’re able to index for each MAX

CREATE INDEX smoochies ON dbo.Users(CreationDate);
CREATE INDEX woochies ON dbo.Users(LastAccessDate);

Of course, not every query can be written like this, or indexed for perfectly, but it’s gruel for thought if you need specific queries like this to be as fast as possible.

Thanks for reading!

Going Further

Why Functions Are Bad in SQL Server Where Clauses

Posted on December 5, 2019May 14, 2022 by Erik Darling

Raised Right

It seems like every time I check out a server, the query plans are a nightmare. Users are freaking out, and management’s coffee is more Irish than Colombian.

Many times, the issue is that people are using presentation layer functions for relational processes. The where clause, joins, group by, and order by parts of a query.

Think about built-in string and date functions, wrapped around columns, and the problems they can cause.

These are things you should actively be targeting in existing code, and fighting to keep out of new code.

Nooptional

When you’re trying to get rid of them, remember your better options

Cleaning data on input, or via triggers: Better than wrapping everything in RTRIM/LTRIM
Using computed columns: Better than relying on runtime calculations like DATEADD/DATEDIFF
Breaking queries up: Use UNION ALL to query for either outcome (think ISNULL)
Using indexed views: If you need to calculate things in columns across tables
Creating reporting tables: Sometimes it’s easier to denormalize a bit to make writing and indexing easier
Using #temp tables: If you have data that you need to persist a calculation in and the query to generate it is complicated

Note the things I’m not suggesting here:

CTEs: Don’t materialize anything
@table variables: Cause more problems than they solve
Views: Don’t materialize unless indexed
Functions: Just no, thanks

More Work

Yes, finding and fixing this stuff is more work for you. But it’s a whole lot less work for the optimizer, and your server, when you’re done.

If that’s the kind of thing you need help with, drop me a line.

Thanks for reading!

Going Further

When SELECT * Doesn’t Matter In SQL Server Queries

Posted on November 18, 2019August 6, 2023 by Erik Darling

Pavlovian

There’s a strange response to some things in the SQL Server community that borders on religious fervor. I once worked with someone who chastised me for having SELECT * in some places in the Blitz scripts. It was odd and awkward.

Odd because this person was the most Senior DBA in the company, and awkward because they didn’t believe me that it didn’t matter in some cases.

People care about SELECT * for many valid reasons, but context is everything.

One For The Money

The first place it doesn’t matter is EXISTS. Take this index and this query:

CREATE INDEX specatular_blob ON dbo.Posts(PostTypeId, OwnerUserId);

SELECT COUNT(*) AS records
FROM   dbo.Users AS u
WHERE  EXISTS ( SELECT * 
                FROM dbo.Posts AS p 
				WHERE p.OwnerUserId = u.Id 
				AND p.PostTypeId = 2 );

The relevant part of the query plan looks like this:

We do a seek into the index we created on the two columns in our WHERE clause. We didn’t have to go back to the clustered index for everything else in the table.

That’s easy enough to prove if we only run the subquery — we have to change it a little bit, but the plan tells us what we need.

SELECT *
FROM   dbo.Posts AS p
WHERE  p.OwnerUserId = 22656
AND    p.PostTypeId = 2;

This time we do need the clustered index:

You can even change it to something that would normally throw an error:

SELECT COUNT(*) AS records
FROM   dbo.Users AS u
WHERE  EXISTS ( SELECT 1/0 
                FROM dbo.Posts AS p 
				WHERE p.OwnerUserId = u.Id 
				AND p.PostTypeId = 2 );

Two For Completeness

Another example is in derived tables, joins, and apply.

Take these two queries. The first one only selects columns in our nonclustered index (same as above).

The second one actually does a SELECT *.

/*selective*/
SELECT     u.Id,
           u.DisplayName,
		   ca.OwnerUserId, --I am only selecting columns in our index
		   ca.PostTypeId,
		   ca.Id
FROM       dbo.Users AS u
CROSS APPLY( SELECT TOP (1) * --I am select *
             FROM dbo.Posts AS p 
			 WHERE p.OwnerUserId = u.Id 
			 AND p.PostTypeId = 2 
			 ORDER BY p.OwnerUserId DESC, p.Id DESC) AS ca
WHERE U.Reputation >= 100000;

/*less so*/
SELECT     u.Id,
           u.DisplayName,
		   ca.* --I am select *
FROM       dbo.Users AS u
CROSS APPLY( SELECT TOP (1) * --I am select *
             FROM dbo.Posts AS p 
			 WHERE p.OwnerUserId = u.Id 
			 AND p.PostTypeId = 2 
			 ORDER BY p.OwnerUserId DESC, p.Id DESC) AS ca
WHERE U.Reputation >= 100000;

The first query only touches our narrow nonclustered index:

The second query does a key lookup, because we really do select everything.

Trash Pile

I know, you’ve been well-conditioned to freak out about certain things. I’m here to help.

Not every SELECT * needs to be served a stake through the heart and beheading.

Thanks for reading!