The Art Of The SQL Server Stored Procedure: Pagination

Video Star


A common-enough practice to limit search results is to write paginated queries. You may give users the ability to specify how many rows per page they want to see, or you may have a set number of rows per page.

But the end goal is to allow users to very quickly get a smaller number of rows returned to them. Almost no one needs to get many thousands of rows back, unless they’re planning on exporting the data.

In these four videos, which are part of my paid training, I’ll teach you how to write reliably fast paging queries in SQL Server.

If you like this sort of stuff, head to the very end of my blog post for a discount code on all of my training.

 

 

 

 

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

The Art Of The SQL Server Stored Procedure: Dynamic Searches

Convenience


Like having a built-in type to make dynamic SQL more easily managed, it would also be nice to have some mechanism to manage dynamic searches.

Of course, what I mean by dynamic searches is when you have a variety of parameters that users can potentially search on, with none or few of them being required.

Right now, of course, your first best option is writing parameterized dynamic SQL that generates a where clause specific to the set of non-null parameters being sent to the procedure.

Your second best option is to just slap a statement-level recompile hint on things.

There are of course pros and cons to each.

Dynamic SQL vs Option Recompile


For dynamic SQL, the main cons are:

  • Relying on developers to write it safely
  • Detachment from stored procedure (harder to identify in query store/plan cache)
  • Potentially managing permissions
  • Handling increasingly complex scenarios (lists of columns, custom ordering, optional joins, etc.)

The main pros are:

  • Execution plan reusability
  • Transparent optimizer predicates
  • Consistent performance for indexed searches
  • Incredible flexibility

For OPTION(RECOMPILE), there’s really not a whole lot of up and down.

  • Pro: Really simple to slap on the end of a query
  • Con: Query constantly compiles a new plan (no plan reuse, but no cache bloat either)

As long as you have query store enabled, tracking the plans over time is available to you.

I don’t see queries that take a long time to compile all too frequently, but it’s worth keeping an eye on compilation time if you’re trying to balance these options locally.

Trust me on this one.

Managing Performance


The most difficult part, once you’ve made a choice, is figuring out indexing. Assuming you want searches to be fast, this is important.

Assuming you have just one table, and just 6 optional parameters, there are over 700 potential index key column combinations to consider.

Once you consider includes, and throwing columns into the mix for sorting, the mind begins to boggle.

In many cases, nonclustered column store indexes make this a lot easier. Row store indexes are built in a way that may make having many similar ones necessary.

Key column order makes a lot of difference when speeding up searches generally. An index on columns (a, b, c) will certainly make a variety of searches fast, so long as column a is part of the predicates.

Searches that do not filter on column a will have to scan the full index. Now you’re stuck trying to figure out what the most common or important searches are.

In this video, I talk about solving indexing problems with this type of query.

Thanks for reading (and watching)!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

Happy New Year, From Darling Data

Up All Night


I don’t wanna. Neither do you. Have a good one.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

The Art Of The SQL Server Stored Procedure: Parameter Sniffing

Responsible Reading


One of the most fun parts of my job is telling people that parameter sniffing is a good thing. They’re often shocked, because the phrase has such ominous connotations to developers.

The parameters! They get sniffed! Performance! Something! Stuff!

Parameter sniffing does not mean the sky is falling. Like I’ve said before, if it were such a bad idea, it would have been abandoned.

What you have to be mindful of is how data gets distributed over time, and doing a few things to make sure the optimizer has as few reasonable plan choices as possible.

  • Indexes keys that cover all where clause predicates at minimum
  • Indexes includes that cover all select list columns in special circumstances
  • Queries written to best take advantage of those indexes (SARGability, etc.)
  • Using temporary objects to isolate critical parts of queries into manageable chunks

That will help cut down on how disastrous plan sharing is between queries that need to process very different amounts of data.

Under really crazy circumstances (and I know, I’ve linked to this before), I use dynamic SQL to fix parameter sensitivity issues.

 

SQL Server 2022 Not To The Rescue


SQL Server 2022 has a feature that’s suppose to help with parameter sensitivity issues. It’s okay sometimes, but the current set of limitations are:

  • Only equality predicate parameters are evaluated for sensitivity issues
  • Only one parameter is selected for managing sensitivity
  • Only three query plan choices are generated for different estimation buckets

I have seen it be helpful in a few cases, but there are times when it should obviously kick in, but doesn’t.

There is currently no query hint to force the feature to kick in when it doesn’t (and it should).

Just so you know how I set up for this:

ALTER DATABASE StackOverflow2013 
SET COMPATIBILITY_LEVEL = 160;

Here are my indexes:

CREATE INDEX
    beavis
ON dbo.Posts
    (OwnerUserId, PostTypeId)
WHERE
    (PostTypeId = 1)
WITH
    (MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

CREATE INDEX
    butthead
ON dbo.Votes
    (VoteTypeId, UserId, PostId)
INCLUDE
    (BountyAmount, CreationDate)
WITH
    (MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

CREATE INDEX
    stewart
ON dbo.Badges
    (UserId)
WITH
    (MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
GO

Here are the checks that validate options, etc.

SELECT
    d.name,
    d.compatibility_level,
    d.is_query_store_on
FROM sys.databases AS d
WHERE d.database_id = DB_ID();

SELECT
    dsc.configuration_id,
    dsc.name,
    dsc.value,
    dsc.is_value_default
FROM sys.database_scoped_configurations AS dsc
WHERE dsc.name = N'PARAMETER_SENSITIVE_PLAN_OPTIMIZATION';
sql server
way to be

With That Out Of The Way


Here’s the stored procedure that causes us grief.

CREATE OR ALTER PROCEDURE
   dbo.VoteSniffing
(
    @VoteTypeId integer
)
AS
BEGIN
SET XACT_ABORT, NOCOUNT ON;

    SELECT
        UserId  = 
            ISNULL(v.UserId, 0),
        Votes2013  = 
            SUM
            (
                CASE
                    WHEN 
                    (
                            v.CreationDate >= '20130101'
                        AND v.CreationDate <  '20140101'
                    )
                    THEN 1
                    ELSE 0
                END
            ),
        TotalBounty  = 
            SUM
            (
                CASE
                    WHEN v.BountyAmount IS NULL
                    THEN 0
                    ELSE 1
                END
            ),
        PostCount  = 
            COUNT(DISTINCT v.PostId),
        VoteTypeId  = 
            @VoteTypeId
    FROM dbo.Votes AS v
    WHERE v.VoteTypeId = @VoteTypeId
    AND   NOT EXISTS
    (
        SELECT
            1/0
        FROM dbo.Posts AS p
        JOIN dbo.Badges AS b
          ON b.UserId = p.OwnerUserId
        WHERE  p.OwnerUserId = v.UserId
        AND    p.PostTypeId = 1
    )
    GROUP BY 
        v.UserId
    ORDER BY
        PostCount DESC;
END;
GO

The good news is that all formatting checks pass here. Thank you, thank you, thank you. You’re far too kind.

Execution Problems


There’s only one parameter. It’s an equality predicate. And the VoteTypeId Column is highly volative.

sql server
oh dear me

You would think that with data distributions that look this way, there would be some attempt to remedy the situation by the Parameter Sensitive Plan Optimization feature.

Unfortunately, there is not.

Little Big Plan


When executed first with 4 as the VoteTypeId, the plan is very fast. Of course it is. We’re finding 733 rows to work with.

EXEC dbo.VoteSniffing
    @VoteTypeId = 4;
sql server execution plan
simple as

I have no qualms with this plan, other than the erroneous No Join Predicate warning. There is a join predicate, and you can see it in the Seek operators.

Heck, you can even see it in the query text. It’s pretty stupid.

When the plan is repeated for VoteTypeId 2, it is a disaster.

EXEC dbo.VoteSniffing
    @VoteTypeId = 2;
sql server query plan
not so hot

A Seek takes 9.5 seconds, a Nested Loops join occurs for 5 seconds, and a spill occurs for 22 seconds.

After the spill, another ~22 seconds is spent in various operators until it’s complete.

In other words, we still have our work cut out for us to tune this thing. If we go back to our list:

  • Indexes keys that cover all where clause predicates at minimum ✅
  • Indexes includes that cover all select list columns in special circumstances ✅
  • Queries written to best take advantage of those indexes (SARGability, etc.) ✅
  • Using temporary objects to isolate critical parts of queries into manageable chunks ❌

In this case, if we use a #temp table and isolate the portion of the query that evaluates a parameter (just getting data from the Votes table), we can solve 99% of the parameter sniffing problems between the majority of the VoteTypeIds.

However, VoteTypeId 2 is still a special flower. This is a good place to pause and ask ourselves if repeatedly filling a temporary object with 37 million rows is a good use of time.

This is when some techniques we’ve discussed before would come in handy:

  • Dynamic SQL to treat VoteTypeId 2 differently from the others
  • A separate stored procedure to intercept VoteTypeId 2 executions

Game Change


The reason why this is a more sensible arrangement is because when the plan is compiled initially for VoteTypeId 2, it runs just fine.

It’s also true that when VoteTypeId 4 uses the plan for VoteTypeId 2, it goes from taking 0 seconds across to taking a couple seconds. Sharing is not caring.

This is the fast plan for VoteTypeId 2. It has a lot of operators with additional startup costs that make grabbing small amounts of data slow.

sql server query plan
good for some

Parameter sensitivity is a tough thing at times. Managing multiple plans for the same query can end up being quite complicated.

There are absolutely times when you’re going to have to throw in the towel and use a statement-level recompile hint to make things function appropriately.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

The Art Of The SQL Server Stored Procedure: Temporary Objects

And Then You Were Gone


I probably overuse temp tables. I’m the first one to admit it. But often when I choose to use one, it’s a choice between:

  • Spending a long time trying to business understand logic for a company I don’t work for
  • Tinkering with query syntax and using really weird tricks and hints to get different plan shapes
  • Waiting a long time to create or tweak indexes on huge tables (usually on “dev” hardware)

And you know what? If I had to work anywhere full time, and not balance performance tuning requests from a bunch of people, I just might spend more time on those things.

Don’t tell anyone, but sometimes if I’m really busy, I have no problem waiting a long time for an index change to go through, so I can look at something else for a while.

In general, temp tables offer a convenient way to focus on one problem part of the query, without the noise of a bazillion other joins and applys and subqueries and everything else going on.

They also offer a familiar paradigm to anyone looking at the work later, that doesn’t involve SELECT TOP (2147483647) UNION ALL SELECT TOP (1) type syntax, which can really raise some eyebrows on unfamiliar faces.

A Recent Example


I’m using Stack Overflow in my query, but a recent client engagement had a problem query that looked about the same.

WITH
    top5 AS
(
    SELECT
        p.*,
        n = 
            ROW_NUMBER() OVER
            (
                PARTITION BY
                    p.OwnerUserId
                ORDER BY
                    p.Score DESC
            )
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = 22656
    AND   p.PostTypeId = 1
)
SELECT
    u.DisplayName,
    t1.Title,
    t2.Title,
    t3.Title,
    t4.Title,
    t5.Title
FROM dbo.Users AS u
LEFT JOIN top5 AS t1
  ON  t1.OwnerUserId = u.Id
  AND t1.n = 1
LEFT JOIN top5 AS t2
  ON  t2.OwnerUserId = u.Id
  AND t2.n = 2
LEFT JOIN top5 AS t3
  ON  t3.OwnerUserId = u.Id
  AND t3.n = 3
LEFT JOIN top5 AS t4
  ON  t4.OwnerUserId = u.Id
  AND t4.n = 4
LEFT JOIN top5 AS t5
  ON  t5.OwnerUserId = u.Id
  AND t5.n = 5
WHERE t1.OwnerUserId IS NOT NULL;

The problems with this contain multitudes, but let’s let the query plan do the talking:

sql server query plan
oh, that.

In the words of Drake: “Take Care”

If you’re the type of developer who insists that common table expressions are some sort of magic potion, I do hope that you this is your hangover.

Practical Magic


Temporary tables are the cure for common table expression hangovers, for the most part.

Let’s see how we do with one.

WITH
    top5 AS
(
    SELECT
        p.*,
        n = 
            ROW_NUMBER() OVER
            (
                PARTITION BY
                    p.OwnerUserId
                ORDER BY
                    p.Score DESC
            )
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = 22656
    AND   p.PostTypeId = 1
)
SELECT
    t.OwnerUserId,
    t.Title,
    n
INTO #top5
FROM top5 AS t
WHERE t.n <= 5;

SELECT
    u.DisplayName,
    t1.Title,
    t2.Title,
    t3.Title,
    t4.Title,
    t5.Title
FROM dbo.Users AS u
LEFT JOIN #top5 AS t1
  ON  t1.OwnerUserId = u.Id
  AND t1.n = 1
LEFT JOIN #top5 AS t2
  ON  t2.OwnerUserId = u.Id
  AND t2.n = 2
LEFT JOIN #top5 AS t3
  ON  t3.OwnerUserId = u.Id
  AND t3.n = 3
LEFT JOIN #top5 AS t4
  ON  t4.OwnerUserId = u.Id
  AND t4.n = 4
LEFT JOIN #top5 AS t5
  ON  t5.OwnerUserId = u.Id
  AND t5.n = 5
WHERE t1.OwnerUserId IS NOT NULL;

Here’s the new query plan:

sql server query plan
hi mom

Without changing indexes, we get our query from 26 seconds to 1.5 seconds.

Really puts the “common” in “common table expressions”, doesn’t it?

Argue


While the “should I use common table expressions or temp tables?” question is largely a settled one for me, one thing that always comes up in seemingly chaotic practice is temp tables vs. table variables.

Table variables certainly have their place in the world, though often not without compromises or alternatives.

Many developers will use table variables for the purpose of passing them directly to another stored procedure. That’s fine and all, but you can just use #temp tables and reference them directly in another stored procedure, too.

Other times, which on gets used seems to be a question of moods and fancies of the developer currently working on something.

Some time ago I talked about when table variables are most useful in SQL Server practice.

And I stand by the majority of what’s discussed in here still. Hopefully you find it enlightening, even if you’ve already seen it.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

The Art Of The SQL Server Stored Procedure: Dynamic SQL

Time Well Spent


I’ve spent a lot of time on this blog telling you different reasons why you should, when you should, and how you should use dynamic SQL.

In this post, I’m going to go into some of the finer points of how I approach dynamic SQL to avoid issues — not performance issues — more procedural issues.

Things you should use to protect yourself from wonky object names, string truncation, object identification, and more.

There are many things that I wish Microsoft would do with dynamic SQL to make working with it easier and safer. For one, there should be a specific data type for it with built-in safe guards against SQL injection. Sort of like a template that would make parameter substitution a lot easier, and allow queries in strings to be written without a ton of escape quotes so that they’d be easier to debug and troubleshoot.

It’s a dream that will never come true, but hey we got ledger tables that will eat up plenty of disk space in Azure, so that’s just great.

Quote And Quote Alike


The first and most obvious rule I have with dynamic SQL is that if I need to accept object names, I’m using QUOTENAME.

That goes for any object. Server, database, schema, table, view, procedure, function, you get the idea.

Why? Because developers don’t always do nice things when they name things. The most common one that I see is putting spaces in names, but over the years I’ve run into unprintable characters, brackets, symbols like @, #, $, and &, and… unicode characters.

If you look in any of the stored procedures I write, you’ll see how heavily I use QUOTENAME to save myself headaches later.

Now, you could write your own brackets into strings, but they don’t offer the same level of protection against SQL injection.

One thing I will say is smart to do is to keep two copies of anything you’re quoting in the name of. One to use in the dynamic SQL, and one to use for other information you may need to retrieve about the thing you’ve quoted.

As an example, if you accept a database name as a parameter, and then you add quotes to it, it’ll be awfully hard to find any information about it in sys.databases, because database_name will not equal [database_name]

There you go. Your mind is blown.

Always Unicode


Whenever I see parameters or variables created to hold dynamic SQL with varchar (and not nvarchar) as the type, I get real nervous.

Why? Because I know some lazy bones is about to concatenate a bunch of user input into their strings and then:

EXEC (@sql);

And that is unsafe, my friends. Very unsafe. This where unfunny people will tell an unfunny joke about Bobby Tables.

When you use sp_executesql to write and run the safe, parameterized dynamic SQL that you should, it expects unicode inputs for both the query that you execute.

DECLARE
    @s varchar(MAX) = 'SELECT d.* FROM sys.databases AS d WHERE d.database_id = @d;',
    @p varchar(MAX) = '@d integer',
    @d integer = 1;

EXEC sys.sp_executesql
    @s,
    @p,
    @d;

This will fail with the error:

Msg 214, Level 16, State 2, Procedure sys.sp_executesql, Line 1 [Batch Start Line 0]

Procedure expects parameter ‘@statement’ of type ‘ntext/nchar/nvarchar’.

And then with the error

Msg 214, Level 16, State 3, Procedure sys.sp_executesql, Line 1 [Batch Start Line 0]

Procedure expects parameter ‘@params’ of type ‘ntext/nchar/nvarchar’.

Because we did not follow instructions.

This, however, will run wonderfully, and keep us safe from unfunny jokes about Bobby Tables.

DECLARE
    @s nvarchar(MAX) = N'SELECT d.* FROM sys.databases AS d WHERE d.database_id = @d;',
    @p nvarchar(MAX) = N'@d integer',
    @d integer = 1;

EXEC sys.sp_executesql
    @s,
    @p,
    @d;

And besides, it’s good form. You never know when something unicodey may sneak in to your string, and you don’t want to end up with question marks.

Concatenation Street


One big problem, and one that can happen suddenly, out of nowhere, without any rhyme or reason, and only on one server, and when you test it on another server it won’t happen, is string truncation because of implicit conversion.

See, when you put strings together, even if you declare the base parameter or variable to hold the string as a max data type, shorter strings will somehow cause SQL Server to decide that the result will be… something else.

Even more annoying is that even if you put an outer CONVERT(nvarchar(max), everything) around the whole ordeal, one string concatenation inside will still mess the whole works.

If you look at the dynamic SQL that I write, you’ll notice that certain shorter strings are surrounded with converts to a max type.

This is why you’ll see code that looks like this:

+
CONVERT
(
    nvarchar(MAX),
    CASE @new
         WHEN 1
         THEN
N'
qsp.plan_forcing_type_desc,
qsp.force_failure_count,
qsp.last_force_failure_reason_desc,
w.top_waits,'
         ELSE
N''
    END
)
+

This is also why I go heavy on having debugging to print string out: Because you can usually tell by which part of the string gets cut off where the implicit conversion happened.

Formation


Formatting dynamic SQL is especially important. Remember when I blogged about formatting T-SQL?

That goes doubly-double for dynamic SQL. I’ll often write and format the query first, and then make it dynamic so that SQL Prompt can work it’s 80% magic.

That dynamic SQL I showed you up there? I’d never put that out into the world for people to deal with. Rude.

It would look way more like this:

DECLARE
    @s nvarchar(MAX) = N'
SELECT 
/*I came from Erik Darling''s awesome stored procedure*/
    d.* 
FROM sys.databases AS d 
WHERE d.database_id = @d;
',
    @p nvarchar(MAX) = N'@d integer',
    @d integer = 1;

RAISERROR(@s, 0, 1) WITH NOWAIT;

EXEC sys.sp_executesql
    @s,
    @p,
    @d;
  1. It’s formatted nicely
  2. It’s aligned so that it pretty prints in RAISERROR
  3. There are leading and trailing new lines so it separates nicely from other returned messaged
  4. There’s a comment to tell me where it came from

Now if only there were a real dynamic SQL type so I didn’t have to use two apostrophes in there.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

The Art Of The SQL Server Stored Procedure: Wrapper Stored Procedures

Happenstance


Wrapper stored procedures are useful for things like:

  • Transforming declared local variables into parameters
  • Preventing code from compiling when it isn’t used
  • Generating different query plans to deal with parameter sniffing

The upside of using this over dynamic SQL is that you have a convenient object name attached to the code.

The downside is that if stored procedures share code logic, you now have more to maintain. Likewise, if you have many IF...ELSE branches, you’ll have many more stored procedures to dig around to (though they’ll be a lot easier to identify in the plan cache and query store).

There are other handy things about stored procedures that dynamic SQL makes iffy, like permissions, and making sure developers don’t write the kind of code that makes headlines.

Stored procedures also aren’t a good use case for all the “kitchen sink” queries that use a lot of optional parameters. Dynamic SQL is king here, because creating a stored procedure for every permutation of parameter combinations is a bigger no than well done steak.

To learn how to deal with those, watch these videos.

 

Thanks for reading (and watching)!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

Merry Christmas, From Darling Data

Nope, Not Today


Today you do other things. Go. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

The Art Of The SQL Server Stored Procedure: Local Variables

Nein Nein Nein


Just like you can’t fix a broken bone with good vibes, you can’t fix parameter sniffing with idiocy. More on that later, though.

Like many other things we’ve discussed thus far, local variables are a convenience to you that have behavior many people are still shocked by.

You, my dear and constant reader, may not be shocked, but the nice people who pay me money to fix things seem quite astounded by what happens when you invoke local variables.

So I find myself in a difficult position: do I dredge up more red meat for the millions of die-hard SQL Server performance nuts who come here for the strange and outlandish, or produce evergreen content for people who pay my substantial bar tabs.

Coin, tossed.

Corn, popped.

Greenery


Local variables present an attractive proposition for many reasons. For instance, you can:

  • Set a constant date/time variable
  • Assign values from complex subqueries to a single variable
  • Increment values in a loop to batch modifications

And here’s the thing: I’m with you on the need and convenience for all those things. It’s just how you usually end up using them that I disagree with.

Many developers are under the impression that parameter sniffing is a bad thing; it is not. If it were, modern database systems would have thrown the whole idea out ages ago.

Constantly generating execution plans is not a good use of your CPU’s brain power. The real problem is parameter sensitivity.

I’m going to say this as emphatically as I can for those of you who call local variables and using optimize for unknown as a best practice:

Local variables use an estimate derived from the total rows in the table multiplied by the assumed uniqueness of a column’s data. More often than not, that is a very small number.

If you have the kind of incredibly skewed data that is sensitive to parameters being sniffed, and plans being generated based on those initial estimates, it is highly unlikely that plans derived from fuzzy math will be suitable for general execution.

There may be exceptions to this, of course.

Every database is subject to many local factors that make arrangements outside the norm being sensible.

Drudgery


What you need to do as a developer is test these assumptions of yours. I spend a lot of my time fixing performance problems arising developers not testing assumptions.

A common feedback loop occurs when testing code against very small data sets that don’t expose the types of performance problems that arise from larger ones, or testing against data sets that may be similar in terms of volume, but not in terms of distribution.

I am being charitable here, though. Most code is only tested for result correctness. That’s a fine starting point.

After all, most developers are paid to develop, and are not performance tuning experts. It would be nice if performance tuning were part of the development process, but… If you’re testing against wack data, it’s impossible.

Timelines are short, demands are substantial, and expectations are based mostly around time to deliver.

Software development against databases is rarely done by developers who are knowledgable about databases, and more often than not by developers who are fluent in the stack used for the application.

Everything that happens between front and back may as well be taking place in Narnia or Oz or On A Planet Far, Far Away.

This section ended up being way more philosophical than I had intended, but it’s Côte-Rôtie season.

Let’s move on!

Assumptive


When you want to test whether or not something you’re working on performs best, you need to understand which options are available to you.

There are many ways to pet a dog when working with SQL Server. Never assume the way you’re doing it is the best and always the best.

One assumption you should absolutely never make is that the way you see everyone around you doing something is the best way to do something.

That’s how NOLOCK turned into an epidemic. And table variables. And Common Table Expressions. And Scalar UDFs. And Multi-Statement UDFs. And sticking ISNULL in every join and where clause.

And, of course, local variables.

One way to test your assumptions about things is to create a temporary stored procedure and test things in different ways.

CREATE OR ALTER PROCEDURE
    #p
(
    @ParentId integer
)
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE
        @ParentIdInner integer = @ParentId;

    SELECT
        c = COUNT_BIG(*)
    FROM dbo.Posts AS p
    WHERE ParentId = 0;

    SELECT
        c = COUNT_BIG(*)
    FROM dbo.Posts AS p
    WHERE ParentId = @ParentId;

    SELECT
        c = COUNT_BIG(*)
    FROM dbo.Posts AS p
    WHERE ParentId = @ParentIdInner;
END;
GO

CREATE INDEX 
    p ON 
dbo.Posts
    (ParentId) 
WITH
    (SORT_IN_TEMPDB = ON, DATA_COMPRESSION  = PAGE);

EXEC dbo.#p 
    @ParentId = 0;

We’re using:

  1. A literal value
  2. An actual parameter
  3. A declared variable

With an example this simple, the local variable won’t:

  • Slow things down
  • Change the query plan
  • Prevent a seek

But it will throw cardinality estimation in the toilet and flush twice.

grimy

These are the kinds of things you need to test and look at when you’re figuring out the best way to pet SQL Server and call it a very good dog.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

The Art Of The SQL Server Stored Procedure: Conditional Logic

Other Thing


There are two forms of conditional logic that I often have to fix in stored procedures:

  1. Branching to run different queries at different times
  2. Complicated join and where clause logic

The problems with both are similar in terms of performance. You see, when smart people tell you that SQL is a declarative language, and not a procedural language, they’re usually trying to get you to stop using cursors.

And that’s not always wrong or bad advice, trust me. But it also applies here.

When you’re developing stored procedures, the thing you need to understand is that SQL Server builds query plans for everything in them first time on the first compile, and then after any causes of a recompile.

It does not compile for just the branch of logic that gets explored on compilation. No no. That would be too procedural. Procedural we are not.

There are two exceptions to this rule:

  1. When the branches execute dynamic SQL
  2. When the branches execute stored procedures

If this sounds familiar to you, you’ve probably hear me talk about parameter sniffing, local variables, SARGability, and… well, more things dealing with SQL Server performance.

Hm.

Problem 1: IF Branching


Like I mentioned above, the only way to get if branching to only compile plans for explored branches, is to tuck them away.

Probably the easiest way to demonstrate this is to create a stored procedure with logical branching that accesses an object that doesn’t even pretend to exist.

CREATE OR ALTER PROCEDURE
    dbo.i_live
(
    @decider bit = NULL
)
AS
BEGIN
    SET NOCOUNT, XACT_ABORT ON;
    
    IF @decider = 'true'
    BEGIN
        SELECT
            dp.*
        FROM dbo.DinnerPlans AS dp;
    END;
    
    IF @decider = 'false'
    BEGIN
        SELECT
            dp.*
        FROM dbo.DinnerPlans AS dp WITH(INDEX = 2);
    END;
    
    IF @decider IS NULL
    BEGIN
        SELECT
            result = 'please make a decision.'
    END;
END;
GO 

EXEC dbo.i_live 
    @decider = 'true';

If you run this, you’ll get an error saying the index doesn’t exist, even though the code branch doesn’t run.

Where things get even weirder, but is well besides the point of the post, if you execute a store procedure that references a table that doesn’t exist, but not in the branch that executes, no plan will be cached for it.

You can see a stupid demo of that here. It’s probably not something you’ll run into a whole lot, but it’s probably worth noting.

If you’d like to read more about the performance problems IF branching doesn’t solve, you click on these links:

Problem 2: Conditional Logic In Your Predicates


You may love a big, messy, sloppy, join or where clause, but SQL Server’s query optimizer hates it. This may be disappointing to hear, but query optimization is a really hard job.

Feeding in a bunch of runtime complexity and expecting consistently good results is a shamefully naive approach to query writing.

The query optimizer is quite good at applying its craft to a variety of queries. At the base of things, though, it is a computer program written by people. When you think carefully about the goal of a generalized query optimizer, it has to:

  • Come up very quickly with a good enough execution plan
  • Regardless of the surrounding hardware
  • Respecting the logic of the query
  • Within the confines of available indexes, constraints, hints, and settings

As you add complexity to queries, various things become far harder to forecast and plan for in a generalized way.

Think of it like planning a car trip. The more stops you add, the harder it is to find the fastest route. Then throw in all the unexpecteds — traffic, construction, weather, people randomly gluing themselves to the road, breakdowns — and what have you got?

Chaos. Pure chaos.

While the idealism of writing everything in one big query seems attractive to SQL developers — stacking common table expressions, nesting view upon view and subquery upon subquery, and adding in all the OR logic one can possible surmise — it only opens the optimizer up to error, mis-estimates, and even missed opportunities.

The reality is that query optimizers across all database platforms have plenty of issues, blind spots, and shortcomings. Sometimes you need to write queries in a way that is less convenient to you in order to avoid them.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.