Making The Most Of Temp Tables Part 4: Batch Mode

Le Big Zoom Zoom


When you have queries that need to process a lot of data, and probably do some aggregations over that lot-of-data, batch mode is usually the thing you want.

Originally introduced to accompany columnstore indexes, it works by allowing CPUs to apply instructions to batches of up to 900 rows at a time.

It’s a great thing to have in your corner when you’re tuning queries that do a lot of work, especially if you find yourself dealing with pesky parallel exchanges.

Oh, Yeah


One way to get that to happen is to use a temp table with a columnstore index on it.

SELECT 
    v.UserId, 
    SUM(v.BountyAmount) AS SumBounty
FROM dbo.Comments AS c
JOIN dbo.Votes AS v
    ON  v.PostId = c.PostId
    AND v.UserId = c.UserId
GROUP BY v.UserId
ORDER BY SumBounty DESC;

CREATE TABLE #t(id INT, INDEX c CLUSTERED COLUMNSTORE);

SELECT 
    v.UserId, 
    SUM(v.BountyAmount) AS SumBounty
FROM dbo.Comments AS c
JOIN dbo.Votes AS v
    ON  v.PostId = c.PostId
    AND v.UserId = c.UserId
LEFT JOIN #t AS t 
    ON 1 = 0
GROUP BY v.UserId
ORDER BY SumBounty DESC;

Keep in mind, this trick won’t work if you’re on SQL Server 2019 with the memory-optimized tempdb metadata feature enabled, since you can’t create columnstore indexes on temp tables there. But aside from that, you’re free to rock and roll with it.

If you end up using this enough, you may just wanna create a real table to use, anyway.
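If you do go the permanent route, a minimal sketch might look like this. The dbo.BatchModeShortcut name is made up, and the table never needs a single row in it, since its only job is to wave a columnstore index at the optimizer.

CREATE TABLE dbo.BatchModeShortcut
(
    i int NULL,
    INDEX c CLUSTERED COLUMNSTORE
);

From there, the same LEFT JOIN ... ON 1 = 0 trick from above works against it.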

Remarkable!


If we look at the end (or beginning, depending on how you read your query plans) just to see the final times, there’s a pretty solid difference.

[SQL Server query plan: you can’t make me]

The first query takes around 10 seconds, and the second query takes around 4 seconds. That’s a pretty handsome improvement without touching anything else.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Making The Most Of Temp Tables Part 3: More Opportune Indexes To Make Queries Go Faster

I Know You


You have too many indexes on too many tables already, and the thought of adding more fills you with a dread that has a first, middle, last, and even a confirmation name.

This is another place where temp tables can save your bacon, because as soon as the query is done they basically disappear.

Forever. Goodbye.

Off to buy a pack of smokes.

That Yesterday


In yesterday’s post, we looked at how a temp table can help you materialize an expression that would otherwise be awkward to join on.

If we take that same query, we can see how using the temp table simplifies indexing.

SELECT
    p.OwnerUserId,
    SUM(p.Score) AS TotalScore,
    COUNT_BIG(*) AS records,
    CASE WHEN p.PostTypeId = 1 
         THEN p.OwnerUserId
         WHEN p.PostTypeId = 2
         THEN p.LastEditorUserId
    END AS JoinKey
INTO #Posts
FROM dbo.Posts AS p
WHERE p.PostTypeId IN (1, 2)
AND   p.Score > 100
GROUP BY CASE
             WHEN p.PostTypeId = 1 
             THEN p.OwnerUserId
             WHEN p.PostTypeId = 2 
             THEN p.LastEditorUserId
         END,
         p.OwnerUserId;

CREATE CLUSTERED INDEX c ON #Posts(JoinKey);

SELECT *
FROM #Posts AS p
WHERE EXISTS
(
    SELECT 1/0
    FROM dbo.Users AS u
    WHERE p.JoinKey = u.Id
);

Rather than having to worry about covering a bunch of columns across the where clause, join, and select list, we can just stick a clustered index on the one column we care about doing anything relational with to get the final result.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Making The Most Of SQL Server Temporary Tables Part 2: Materializing Expressions

Bad Data


A lot of the time, when I see queries written with all sorts of gymnastics in the join or where clause and ask some questions about them, people start complaining about the design of the table.

That’s fine, but when I ask about changing the design, everyone gets quiet. Normalizing tables, especially for Applications Of A Certain Age™ can be a tremendously painful project. This is why it’s worth it to get things right the first time. Simple!

Rather than make someone redesign their schema in front of me, I often reach for a temp table as a workaround.

Egg Splat


Let’s say we have a query that looks like this. Before you laugh, and you have every right to laugh, keep in mind that I see queries like this all the time.

They don’t have to be this weird to qualify. You can try this if you have functions like ISNULL, SUBSTRING, REPLACE, or whatever in joins and where clauses, too.

SELECT
    p.OwnerUserId,
    SUM(p.Score) AS TotalScore,
    COUNT_BIG(*) AS records
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON u.Id = CASE 
                   WHEN p.PostTypeId = 1 
                   THEN p.OwnerUserId
                   WHEN p.PostTypeId = 2
                   THEN p.LastEditorUserId
              END
WHERE p.PostTypeId IN (1, 2)
AND   p.Score > 100
GROUP BY p.OwnerUserId;

There’s not a great way to index for this, and sure, we could rewrite it as a UNION ALL, but then we’d have two queries to index for.
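For the curious, a rough sketch of that UNION ALL rewrite might look something like this, with each branch handling one PostTypeId:

SELECT
    x.OwnerUserId,
    SUM(x.Score) AS TotalScore,
    COUNT_BIG(*) AS records
FROM
(
    --Questions join on OwnerUserId
    SELECT
        p.OwnerUserId,
        p.Score
    FROM dbo.Users AS u
    JOIN dbo.Posts AS p
        ON u.Id = p.OwnerUserId
    WHERE p.PostTypeId = 1
    AND   p.Score > 100

    UNION ALL

    --Answers join on LastEditorUserId
    SELECT
        p.OwnerUserId,
        p.Score
    FROM dbo.Users AS u
    JOIN dbo.Posts AS p
        ON u.Id = p.LastEditorUserId
    WHERE p.PostTypeId = 2
    AND   p.Score > 100
) AS x
GROUP BY x.OwnerUserId;

One branch wants an index leading on OwnerUserId, and the other wants one leading on LastEditorUserId. That’s the two-queries-to-index-for problem.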

Sometimes getting people to add indexes is hard, too.

People are weird. All day weird.

Steak Splat


You can replace it with a query like this, which also allows you to index a single column in a temp table to do your correlation.

SELECT
    p.OwnerUserId,
    SUM(p.Score) AS TotalScore,
    COUNT_BIG(*) AS records,
    CASE WHEN p.PostTypeId = 1 
         THEN p.OwnerUserId
         WHEN p.PostTypeId = 2
         THEN p.LastEditorUserId
    END AS JoinKey
INTO #Posts
FROM dbo.Posts AS p
WHERE p.PostTypeId IN (1, 2)
AND   p.Score > 100
GROUP BY CASE
             WHEN p.PostTypeId = 1 
             THEN p.OwnerUserId
             WHEN p.PostTypeId = 2 
             THEN p.LastEditorUserId
         END,
         p.OwnerUserId;

SELECT *
FROM #Posts AS p
WHERE EXISTS
(
    SELECT 1/0
    FROM dbo.Users AS u
    WHERE p.JoinKey = u.Id
);

Remember that temp tables are like a second chance to get schema right. Don’t waste those precious chances.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Making The Most Of Temp Tables In SQL Server Part 1: Fully Parallel Inserts

Sing Along


If you have a workload that uses #temp tables to stage intermediate results, and you probably do because you’re smart, it might be worth taking advantage of being able to insert into the #temp table in parallel.

Remember that you can’t insert into @table variables in parallel, unless you’re extra sneaky. Don’t start.

If your code is already using the SELECT ... INTO #some_table pattern, you’re probably already getting parallel inserts. But if you’re following the INSERT ... SELECT ... pattern, you’re probably not, and, well, that could be holding you back.
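For reference, here are the two patterns side by side. The table and column names are just for illustration.

--Typically eligible for a parallel insert
SELECT
    p.Id,
    p.Score
INTO #fast
FROM dbo.Posts AS p;

--Typically a serial insert, without the extra help covered below
CREATE TABLE #slow
(
    Id int NOT NULL,
    Score int NOT NULL
);

INSERT #slow
    (Id, Score)
SELECT
    p.Id,
    p.Score
FROM dbo.Posts AS p;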

Pile On


Of course, there are some limitations. If your temp table has indexes, primary keys, or an identity column, you won’t get the parallel insert no matter how hard you try.

The demo code is available here if you’d like to test it out.

[SQL Server query plan: amanda lear]

The first thing to note is that inserting into an indexed temp table, parallel or not, does slow things down. If your goal is the fastest possible insert, you may want to create the index later.

No Talent


When it comes to parallel inserts, you do need the TABLOCK or TABLOCKX hint to get them, e.g. INSERT #tp WITH(TABLOCK), which is sort of annoying.

But you know. It’s the little things we do that often end up making the biggest differences. Another little thing we may need to tinker with is DOP.
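Putting both little things together, a sketch of the pattern might look like this. The #tp table and the MAXDOP number are stand-ins for whatever your situation calls for.

CREATE TABLE #tp
(
    Id int NOT NULL,
    Score int NOT NULL
);

--TABLOCK makes the insert eligible to go parallel
INSERT #tp WITH (TABLOCK)
    (Id, Score)
SELECT
    p.Id,
    p.Score
FROM dbo.Posts AS p
OPTION (MAXDOP 8); --experiment; lower DOPs may not show gains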

[SQL Server query plan: little pigs]

Here are the query plans for 3 fully parallel inserts into an empty, index-less #temp table. Note the execution times dropping as DOP increases. At DOP 4, the insert really isn’t any faster than the serial insert.

If you start experimenting with this trick and don’t see noticeable improvements at your current DOP, you may need to bump it up to see throughput increases.

Also remember that if you’re doing this with clustered columnstore indexes, it can definitely make things worse.

Page Supplier


Though the speedups above at higher DOPs mostly come from reading the Posts table more efficiently, insert speed does stay consistent throughout.

If we crank one of the queries that gets a serial insert up to DOP 12, we lose some speed when we hit the table.

[SQL Server query plan: oops]

Next time you’re tuning a query and want to drop some data into a temp table, you should experiment with this technique.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

An Edge Case When Working With Date Parameters In SQL Server

Wheeze Man


When people tell you that working with correct data types is important, it’s for a variety of very good reasons.

Not only can you avoid performance issues, but you can avoid strange query plan distractions, too.

Let’s look at an example of what happens when you use date parameters against datetime columns.

Wrong And Mean


Index from outta nowhere pow!

CREATE INDEX pe ON dbo.Posts(LastEditDate);

The important thing about the LastEditDate column in the Posts table is that it’s nullable.

Not all posts will get edited. Especially mine. They’re always correct the first time.

Basically read only, if we’re being honest about things.

Or maybe it’s about 50/50.

If your columns aren’t nullable, you’ll run into far fewer problems and ambiguities.

I’d like a new data type called ABYSS. Or maybe VOID.

The Problem: Wrong Data Type And NULL Checks


DECLARE @d date = '20170601';
DECLARE @sql nvarchar(MAX) = N'
SELECT
    COUNT_BIG(*) AS records
FROM dbo.Posts AS p
WHERE p.LastEditDate > @d
AND   p.LastEditDate IS NOT NULL;'

EXEC sp_executesql @sql, 
                   N'@d date', 
                   @d;
GO

If we pass in a parameter typed as date rather than datetime, an odd thing will happen if we add in a redundant IS NOT NULL check.

[SQL Server query plan: yortsed]

The seek predicate will only seek to the first non-NULL value, rather than immediately to the start of the range of dates we care about, which means we end up reading a lot more rows than necessary.

Note the query runtime of 743 milliseconds, and that we end up reading quite a few more rows than we return.

And here I was told Seeks are always efficient?

Solution One: Stop Checking For NULLs


If we stop checking for NULLs, we’ll get around the issue.

DECLARE @d date = '20170601';
DECLARE @sql nvarchar(MAX) = N'
SELECT
    COUNT_BIG(*) AS records
FROM dbo.Posts AS p
WHERE p.LastEditDate > @d;'

EXEC sp_executesql @sql, 
                   N'@d date', 
                   @d;
GO

The plan for this query looks a bit different, but performance is no worse for the wear.

[SQL Server query plan: still using the wrong datatype]

Note the 25 millisecond execution time. A clear improvement over the 743 milliseconds above. Though the query plan does look a bit odd.

The compute scalar gins up a date range, which is checked in the seek:

[SQL Server query plan: HELLO COMPUTER]

I wonder what Expr1002 is up to.

Solution Two: Just Use The Right Datatype To Begin With


In reality, this is what we should have done from the start, but the whole point of this here blog post is to show you what can happen when you Do The Wrong Thing™

When we use the right datatype, we get a simple plan that executes quickly, regardless of the redundant NULL check.

DECLARE @d date = '20170601';
DECLARE @sql nvarchar(MAX) = N'
SELECT
   COUNT_BIG(*) AS records
FROM dbo.Posts AS p
WHERE p.LastEditDate > @d
AND   p.LastEditDate IS NOT NULL;'

EXEC sp_executesql @sql, 
                   N'@d datetime', 
                   @d;
[SQL Server query plan: no fuss, no muss]

Here, the NULL check is a residual predicate rather than the Seek predicate, which results in a seek that really seeks instead of just meandering past some NULLs.

[SQL Server query plan: gerd jerb]

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Why You Shouldn’t Ignore Filter Operators In SQL Server Query Plans Part 2

If You Remember Part 1


We looked at a couple examples of when SQL Server might need to filter out rows later in the plan than we’d like, and why that can cause performance issues.

Now it’s time to look at a few more examples, because a lot of people find them surprising.

As much as I love surprising people, sometimes I’d much rather… not have to explain this stuff later.

Since all my showering and errands are out of the way, we should be able to get through this list uninterrupted.

Unless I get thirsty.

Max Datatypes


If we need to search a column that has a MAX datatype, or if we define a parameter as a MAX datatype and search a more sanely typed column with it, both will result in a filter later in the plan than we may care for.

SELECT 
    COUNT_BIG(*) AS records
FROM dbo.Users AS u
WHERE u.AboutMe = N'Hi';

DECLARE @Location nvarchar(MAX) = N'here';
SELECT 
    COUNT_BIG(*) AS records 
FROM dbo.Users AS u 
WHERE u.Location = @Location 
OPTION(RECOMPILE);

Even with a recompile hint!

[SQL Server query plan: opportunity knocked]

Here we can see the value of properly defining string widths! If we don’t, we may end up reading entire indexes, and doing the work to weed out rows later.

Probably something that should be avoided.

Functions


There are some built-in functions, like DATALENGTH, which can’t be pushed down when used in a where clause.

Of course, if you’re going to do this regularly, you should be using a computed column to get around the issue, but whatever!

SELECT 
    COUNT_BIG(*) AS records
FROM dbo.Users AS u
WHERE DATALENGTH(u.Location) > 0;
[SQL Server query plan: measuring up]
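If you do take the computed column route, a sketch might look like this. The column and index names are invented, and as always, test that your queries pick it up.

ALTER TABLE dbo.Users
    ADD LocationLength AS DATALENGTH(Location);

CREATE INDEX ix_location_length
    ON dbo.Users (LocationLength);

--Now the predicate lines up with an indexed expression
SELECT
    COUNT_BIG(*) AS records
FROM dbo.Users AS u
WHERE u.LocationLength > 0;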

And of course, everyone’s favorite love-to-hate, the scalar UDF.

Funny thing about these is that sometimes tiny bumps in the number of rows you’re after can make for big jumps in time.

SELECT TOP (165)
    u.Id,
    u.DisplayName
FROM dbo.Users AS u
WHERE dbo.ScalarFunction(u.Id) > 475
ORDER BY u.Id;

SELECT TOP (175)
    u.Id,
    u.DisplayName
FROM dbo.Users AS u
WHERE dbo.ScalarFunction(u.Id) > 475
ORDER BY u.Id;
[SQL Server query plan: 10 more rows, 5 more seconds]

Complexity


Sometimes people (and ORMs) will build up long parameter lists and use them to build up a long IN clause list, and sometimes even a long OR clause list.

To replicate that behavior, I’m using code I’m keeping on GitHub in order to keep this blog post a little shorter.
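The real code lives on GitHub, but for a rough sense of the shape of it, a hypothetical, simplified version might look like this. The actual procedure differs; literals stand in for parameters here, purely for brevity.

CREATE OR ALTER PROCEDURE dbo.Longingly (@loops int)
AS
BEGIN
SET NOCOUNT ON;

DECLARE @i int = 1,
        @in nvarchar(MAX) = N'',
        @sql nvarchar(MAX) = N'';

--Glue together a comma-separated list: 1, 2, 3, ...
WHILE @i <= @loops
BEGIN
    SET @in += CASE WHEN @i > 1 THEN N', ' ELSE N'' END
             + CONVERT(nvarchar(11), @i);
    SET @i += 1;
END;

SET @sql = N'
SELECT COUNT_BIG(*) AS records
FROM dbo.Users AS u
WHERE u.Id IN (' + @in + N');';

EXEC sys.sp_executesql @sql;
END;
GO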

To illustrate where things can get weird, aside from the Filter, I’m going to run this with a few different numbers of parameters.

EXEC dbo.Longingly @loops = 15;
EXEC dbo.Longingly @loops = 18;
EXEC dbo.Longingly @loops = 19;

This will generate queries with different length IN clauses:

[SQL Server missing index request: bigger than others]

Which will result in slightly different query plans:

[SQL Server query plan: THREE!]

We can see some tipping points here.

  • At 15 parameters, we get a scan with a stream aggregate
  • At 18 parameters, we get a scan with a filter
  • At 19 parameters, we get a parallel scan with a filter

Parallelism to the rescue, again, I suppose.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Why You Shouldn’t Ignore Filter Operators In SQL Server Query Plans Part 1

Source Of Frustration


Ahem.

*taps mic*

When we write queries that need to filter data, we tend to want to have that filtering happen as far over to the right in a query plan as possible. Ideally, data is filtered when we access the index.

Whether it’s a seek or a scan, or whether it has a residual predicate, and whether that’s all appropriate isn’t really the question.

In general, those outcomes are preferable to what happens when SQL Server is unable to do any of them for various reasons. The further over to the right in a query plan we can reduce the number of rows we need to contend with, the better.

There are some types of filters that contain something called a “startup expression”, which are usually helpful. This post is not about those.

Ain’t Nothin’ To Do


There are some cases when you have no choice but to rely on a Filter to remove rows, because we need to calculate some expression that we don’t currently store the answer to.

For example, having:

SELECT 
    p.OwnerUserId,
    COUNT_BIG(*) AS records
FROM dbo.Posts AS p
JOIN  dbo.Comments AS c
    ON c.PostId = p.Id
JOIN dbo.Votes AS v
    ON v.PostId = p.Id
GROUP BY p.OwnerUserId
HAVING COUNT_BIG(*) > 2147483647;

We don’t know which rows might qualify for the count filter up front, so we need to run the entire query before filtering things out:

[SQL Server query plan: this cold night]

There’s a really big arrow going into that Filter, and then nothing!

Likewise, filtering on the result of a windowing function will get you a similar execution plan.
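For example, something like this made-up query, which filters on a ROW_NUMBER, has the same problem: the predicate can’t be evaluated until the window function has been computed.

SELECT
    x.OwnerUserId,
    x.Score
FROM
(
    SELECT
        p.OwnerUserId,
        p.Score,
        ROW_NUMBER() OVER
        (
            PARTITION BY p.OwnerUserId
            ORDER BY p.Score DESC
        ) AS n
    FROM dbo.Posts AS p
) AS x
WHERE x.n = 1; --evaluated in a Filter, after the windowing function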

Of course, there’s not a lot to be done about these Filters, is there?

Unless you pre-compute things somewhere else, you have to figure them out at runtime.

Leftish Fetish


If you write yourself a left join, Filters may become more common, too.

You might do something terrible:

SELECT 
    COUNT_BIG(*) AS records
FROM dbo.Users AS u
LEFT JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id
WHERE DATEDIFF(YEAR, p.CreationDate, p.LastActivityDate) > 5;

Or you might do something that seems reasonable:

SELECT 
    COUNT_BIG(*) AS records
FROM dbo.Users AS u
LEFT JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id
WHERE p.Id IS NULL;

But what you get is disappointing!

[SQL Server query plan: not a gif]

What we care about here is that, rather than filtering rows out when we touch indexes or join the tables, we have to fully join the tables together, and then eliminate rows afterwards.

This is generally considered “less efficient” than filtering rows earlier. Remember when I said that before? It’s still true.

There are well-known solutions for both patterns, so you don’t have to feel left hanging by your left joins.

The Message


If you see Filters in query plans, they might be for a good reason, like calculating things you don’t currently know the answer to.

They might also be for bad reasons, like you writing a query in a silly way.

There are other reasons they might show up too, that we’ll talk about tomorrow.

Why tomorrow? Why not today? Because if I keep writing then I won’t take a shower and run errands for another hour and my wife will be angry.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Join Me At Data Platform Summit 2020!

The Road From Nowhere


This year, I’m teaching an 8 hour online workshop at Data Platform Summit, and I’d love it if you joined me.

Here’s what I’ll be teaching:

Class Title: The Beginner’s Guide To Advanced Performance Tuning

Abstract: You’re new to SQL Server, and your job more and more is to fix performance problems, but you don’t know where to start.

You’ve been looking at queries, and query plans, and puzzling over indexes for a year or two, but it’s still not making a lot of sense.

Beyond that, you’re not even sure how to measure if your changes are working or even the right thing to do.

In this full day performance tuning extravaganza, you’ll learn about all the most common anti-patterns in T-SQL querying and indexing, and how to spot them using execution plans. You’ll also leave knowing why they cause the problems that they do, and how you can solve them quickly and painlessly.

If you want to gain the knowledge and confidence to tune queries so they’ll never be slow again, this is the training you need.

Date: Dec 7 & 8.

Time: 12 PM to 4 PM EST

Tickets: Tickets here!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Not All SQL Server Function Rewrites Are Straightforward

And Some, Not At All


Let’s say at some point, you just didn’t know any better, and you wrote a scalar function to make some common thing you needed to do all “modular” and “portable” and stuff.

Good on you, not repeating yourself. Apparently I repeat myself for a living.

Anyway, you know what stinks? When you hit divide by zero errors. It’d be cool if math fixed that for us.

Does anyone know how I can get in touch with math?

Uncle Function


Since you’re a top programmer, you know about this sort of stuff. So you write a bang-up function to solve the problem.

Maybe it looks something like this.

CREATE OR ALTER FUNCTION dbo.safety_dance(@n1 INT, @n2 INT)
RETURNS INT
WITH SCHEMABINDING,
     RETURNS NULL ON NULL INPUT
AS
BEGIN
RETURN
(
    SELECT @n1 / NULLIF(@n2, 0)
);
END
GO

You may even be able to call it in queries about like this.

SELECT TOP (5)
    u.DisplayName,
    fudge = dbo.safety_dance(SUM(u.UpVotes), COUNT(*))
FROM dbo.Users AS u
GROUP BY u.DisplayName
ORDER BY fudge DESC;

The problem is that it makes this query take a long time.

[SQL Server query plan: you compute that scalar, sql server]

At 23 seconds, this is probably unacceptable. And this is on SQL Server 2019, too. The function inlining thing doesn’t quite help us, here.

One feature restriction is this, so we uh… Yeah.

The UDF does not contain aggregate functions being passed as parameters to a scalar UDF

But we’re probably good query tuners, and we know we can write inline functions.

Ankle Fraction


This is a simple enough function. Let’s get to it.

CREATE OR ALTER FUNCTION dbo.safety_dance_inline(@n1 INT, @n2 INT)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
    SELECT @n1 / NULLIF(@n2, 0) AS safety
);

Will it be faster?

SELECT TOP (5)
    u.DisplayName,
    fudge = (SELECT * FROM dbo.safety_dance_inline(SUM(u.UpVotes), COUNT(*))) 
FROM dbo.Users AS u
GROUP BY u.DisplayName
ORDER BY fudge DESC;

Well, yes. Mostly because it throws an error.

Msg 4101, Level 15, State 1, Line 35
Aggregates on the right side of an APPLY cannot reference columns from the left side.

Well that’s weird. Who even knows what that means? There’s no apply here.

What’s your problem, SQL Server?

Fixing It


To get around this restriction, we need to also rewrite the query. We can use either a CTE or a derived table.

--A CTE
WITH counts AS 
(
    SELECT 
        u.DisplayName, 
        SUM(Upvotes) AS Upvotes,
        COUNT(*) AS records
    FROM dbo.Users AS u
    GROUP BY u.DisplayName
) 
SELECT TOP(5) 
    c.DisplayName,
    fudge = (SELECT * FROM dbo.safety_dance_inline(c.Upvotes, c.records) AS s)
FROM counts AS c
ORDER BY fudge DESC;

--A derived table
SELECT TOP(5) 
    c.DisplayName,
    fudge = (SELECT * FROM dbo.safety_dance_inline(c.Upvotes, c.records) AS s)
FROM 
(
    SELECT 
        u.DisplayName, 
        SUM(Upvotes) AS Upvotes,
        COUNT(*) AS records
    FROM dbo.Users AS u
    GROUP BY u.DisplayName
) AS c
ORDER BY fudge DESC;


Is it faster? Heck yeah it is.

[SQL Server query plan: you’re just so parallel, baby]

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

How To Avoid SQL Injection In Dynamic SQL Queries In SQL Server

Injectables


Dynamic SQL is always a hot topic. I love using it. Got a lot of posts about it.

Recently, while answering a question about it, I got to thinking about safety when accepting table names as user input, among other things.

The code in the answer looks like this:

CREATE OR ALTER PROCEDURE dbo.SelectWhatever (@SchemaName sysname, @TableName sysname)
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;

DECLARE @SafeSchema sysname = NULL,
        @SafeTable sysname = NULL,
        @SQL NVARCHAR(MAX) = N'';

SELECT @SafeSchema = SCHEMA_NAME(t.schema_id),
       @SafeTable  = t.name
FROM sys.tables AS t
WHERE t.schema_id = SCHEMA_ID(ISNULL(@SchemaName, 'dbo'))
AND   t.name = ISNULL(@TableName, 'SomeKnownTable');

IF (@SafeSchema IS NULL)
BEGIN
    RAISERROR('Invalid schema: %s', 0, 1, @SchemaName) WITH NOWAIT;
    RETURN;
END;

IF (@SafeTable IS NULL)
BEGIN
    RAISERROR('Invalid table: %s', 0, 1, @TableName) WITH NOWAIT;
    RETURN;
END;


SET @SQL += N'
SELECT TOP (100) *
/*dbo.SelectWhatever*/
FROM ' + QUOTENAME(@SafeSchema) 
       + N'.'
       + QUOTENAME(@SafeTable)
       + N';';

RAISERROR('%s', 0, 1, @SQL) WITH NOWAIT;
EXEC sys.sp_executesql @SQL;

END;

Sure, there are other things that I could have done, like using the OBJECT_ID() and SCHEMA_ID() functions to validate existence, but I sort of like the idea of hitting the system views, because if you follow that pattern, you can expand on it when you need to accept and validate column names, too.
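For comparison, a minimal sketch of the OBJECT_ID() approach might look like this:

--Validate existence without querying sys.tables directly
IF OBJECT_ID(QUOTENAME(ISNULL(@SchemaName, N'dbo')) + N'.' + QUOTENAME(@TableName), N'U') IS NULL
BEGIN
    RAISERROR('Invalid object: %s.%s', 0, 1, @SchemaName, @TableName) WITH NOWAIT;
    RETURN;
END;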

Expansive


Yeah, I’m using some new-version-centric stuff in here, because I uh… I can. Thanks.

If you need examples of how to split strings and create CSVs, get them from the zillion other examples on the internet.

CREATE OR ALTER PROCEDURE dbo.SelectWhatever (@SchemaName sysname, @TableName sysname, @ColumnNames NVARCHAR(MAX))
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;

DECLARE @SafeSchema sysname = NULL,
        @SafeTable sysname = NULL,
        @SafeColumns NVARCHAR(MAX) = NULL,
        @SQL NVARCHAR(MAX) = N'';

SELECT @SafeSchema = SCHEMA_NAME(t.schema_id),
       @SafeTable  = t.name
FROM sys.tables AS t
WHERE t.schema_id = SCHEMA_ID(ISNULL(@SchemaName, 'dbo'))
AND   t.name = ISNULL(@TableName, 'SomeKnownTable');

SELECT @SafeColumns = STRING_AGG(QUOTENAME(c.name), ',')
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(@SafeSchema + N'.' + @SafeTable)
AND   c.name IN ( SELECT TRIM(ss.value) 
                  FROM STRING_SPLIT(@ColumnNames, ',') AS ss );


IF (@SafeSchema IS NULL)
BEGIN
    RAISERROR('Invalid schema: %s', 0, 1, @SchemaName) WITH NOWAIT;
    RETURN;
END;

IF (@SafeTable IS NULL)
BEGIN
    RAISERROR('Invalid table: %s', 0, 1, @TableName) WITH NOWAIT;
    RETURN;
END;

IF (@SafeColumns IS NULL)
BEGIN
    RAISERROR('Invalid column list: %s', 0, 1, @ColumnNames) WITH NOWAIT;
    RETURN;
END;


SET @SQL += N'
SELECT TOP (100) ' 
       + @SafeColumns
       + N'
/*dbo.SelectWhatever*/
FROM ' + QUOTENAME(@SafeSchema) 
       + N'.'
       + QUOTENAME(@SafeTable)
       + N';';

RAISERROR('%s', 0, 1, @SQL) WITH NOWAIT;
EXEC sys.sp_executesql @SQL;

END;

Normally I’d raise hell about someone using a function like STRING_SPLIT in a where clause, but for simple DMV queries you’re not likely to see a significant perf hit.

There’s a lot of stuff you’ll see in DMV queries that wouldn’t be okay in normal queries.

Some explanations


It would be nice if we had a dynamic SQL data type that did some of this stuff for us. Sort of like XML document validation with less obtuse error messages.

Until that time, which will never come, we have to do the hard work. One way to make your dynamic SQL a little bit safer is to keep user inputs as far away from the execution as you can.

In the above example, I declare a separate set of variables to hold values, and only use what a user might enter in non-dynamic SQL blocks, where they can’t do any harm.

If there’s anything goofy in them, the “@Safe” variables end up being NULL, and an error is thrown.

Also, I’m using QUOTENAME on every individual object (schema, table, and column) to cut down on any potential risk of naughty object values being stored there. If I had to do this for a database name, that’d be an easy add-on, using sys.databases.
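That add-on might look something like this, where @DatabaseName is a hypothetical extra parameter:

DECLARE @SafeDatabase sysname = NULL;

SELECT @SafeDatabase = d.name
FROM sys.databases AS d
WHERE d.name = ISNULL(@DatabaseName, DB_NAME());

IF (@SafeDatabase IS NULL)
BEGIN
    RAISERROR('Invalid database: %s', 0, 1, @DatabaseName) WITH NOWAIT;
    RETURN;
END;

--Then prefix the object with QUOTENAME(@SafeDatabase) when building @SQL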

If you’ve got to work with stringy input for dynamic SQL, this is one way to make the ordeal a bit more safe. You can also extend that to easier to locate key values, like so:

CREATE PROCEDURE dbo.SaferStringSearch (@UserEquals NVARCHAR(40) = NULL, @UserLike NVARCHAR(40))
AS
SET NOCOUNT, XACT_ABORT ON
BEGIN

CREATE TABLE #ids(id INT NOT NULL PRIMARY KEY);

INSERT #ids WITH (TABLOCK) ( id )
SELECT u.Id
FROM dbo.Users AS u
WHERE u.DisplayName = @UserEquals
UNION
SELECT u.Id
FROM dbo.Users AS u
WHERE u.DisplayName LIKE @UserLike;

DECLARE @SQL NVARCHAR(MAX) = N'
SELECT p.*
FROM dbo.Posts AS p
WHERE EXISTS
(
    SELECT 1/0
    FROM #ids AS i
    WHERE i.id = p.OwnerUserId
);
';

EXEC sp_executesql @SQL;

END;

I get that this isn’t the most necessary use of dynamic SQL in the world; it’s really just a simple way to illustrate the idea.

Stay Safe


If you’ve got to pass strings to dynamic SQL, these are some ways to make the process a bit safer.

In the normal course of things, you should parameterize as much as you can, of course.

For search arguments, that’s a no-brainer. But for objects, you can’t do that. Why? I don’t know.

I’m not a programmer, after all.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.