Software Vendor Mistakes With SQL Server: Not Enforcing Code Consistency

Practice Makes Perfect


Often when working with clients, the coding practices read like geological strata. You can tell when something was written, and by whom, based on the practices used.

This is bad news, though, because it speaks to lone-wolf style development. That might be okay if all your SQL developers are well-trained, etc. But that’s mostly not the case.

It’s especially important for junior developers to have code reviewed by others so they don’t introduce bad/worst practices into production code, and for legacy code to get reviewed to make sure it’s up to current best practices.

Otherwise, you’re always going to have problems. Here’s where I see folks making mistakes, mostly because they don’t have any internal “style guide” for queries.

I don’t mean formatting — we’ll talk about that tomorrow — I mean the choices you make while you’re writing queries.

The Style Council


Consider just some of the options you have when writing queries:

  • Temp Table or Table Variable
  • Common Table Expression or Derived Table
  • SELECT INTO or INSERT SELECT
  • Inline Table Valued Function or View
  • ISNULL or COALESCE
  • Cursor or Set-Based Code

For all of these options, there are good use cases, and bad ones.

  • Most of the time you want to use #temp tables, but code executed at high frequency could benefit from @table variables
  • Common Table Expressions can be re-used, but the entire query inside them is executed every time they’re referenced
  • SELECT INTO is convenient and may get a fully parallel insert without a tablock hint, but may also incorrectly interpret certain attributes like data type or length
  • Views may be more intuitive to write, but inline table valued functions can be more useful for parameterized queries
  • ISNULL is easier to spell, but it only takes two arguments, and there are differences in how the two interpret data types (see the quick demo after this list)
  • Often, set-based code will be more efficient, but there are many valid uses for cursors
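
To make one of those ISNULL versus COALESCE differences concrete, here’s a minimal sketch. The variable name is made up for the demo, but the behavior is standard: ISNULL takes its return type from the first argument, while COALESCE uses data type precedence across all of its arguments.

DECLARE
    @short_string varchar(5) = NULL;

SELECT
    /*Typed as varchar(5), so the replacement value gets silently truncated to 'trunc'*/
    i = ISNULL(@short_string, 'truncated'),
    /*Typed by data type precedence, so the full replacement value survives*/
    c = COALESCE(@short_string, 'truncated');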

Futuristic


If your application is using newer versions of SQL Server, are you taking advantage of functions like STRING_SPLIT and STRING_AGG?

Or are you still using older methods to split strings or aggregate strings?
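
For a quick sketch of what the newer built-ins look like, assuming you’re on SQL Server 2016+ for STRING_SPLIT and 2017+ for STRING_AGG:

/*Splitting: one call instead of the old XML or while-loop splitters*/
SELECT
    ss.value
FROM STRING_SPLIT(N'shoes,socks,shirts', N',') AS ss;

/*Aggregating: one call instead of the old FOR XML PATH('') + STUFF() pattern*/
SELECT
    names = STRING_AGG(t.name, N', ')
FROM sys.tables AS t;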

You could extend this to a few other things, too, like the additional programmability features added to SQL Server 2016 starting with SP1.

There’s a lot to consider and keep up with in SQL Server. Having internal documentation about this stuff is the key to making sure your code:

  • Aligns with current best practices
  • Comments thoroughly on deviations
  • Takes advantage of new paradigms and patterns in SQL Server

If that’s the kind of thing you need help with, hit the link below to set up a free sales call to discuss your needs.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Not Using Batch Mode

Just Add Batch Mode


There are a few ways to get Batch Mode involved in a query:

  • Be on SQL Server 2019+ Enterprise Edition and hope Batch Mode On Rowstore heuristics think your query is eligible
  • Be on any supported version of SQL Server Enterprise Edition and…
    • Left join to an empty table with a column store index on it on 1 = 0
    • Create an empty filtered nonclustered columnstore index on a table in your query (sketched just below)
    • Create an actual columnstore index on your table
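
Here’s a minimal sketch of the filtered columnstore trick from the list above. The index name is arbitrary, and the contradictory filter means the index never actually stores rows; it exists purely to make Batch Mode available to the optimizer.

CREATE NONCLUSTERED COLUMNSTORE INDEX batch_mode_helper
    ON dbo.Posts
(
    Id
)
WHERE Id = -1
AND   Id = -2;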

Since I’ve used the prior tricks in many posts many times, I’m going to actually create an index this time:

CREATE NONCLUSTERED COLUMNSTORE INDEX magick 
    ON dbo.Posts
(
    OwnerUserId, PostTypeId, Score
);

And now our query plan looks like this:

[Query plan image: smush]

When Should I Use Batch Mode?


Batch Mode can be really powerful under the right conditions, especially when you have a column store index as a data source for your query.

Here are some generic guidelines for when you should try things out:

  • Column store indexes
    • Big tables, over a couple million rows or so
    • Tables in data warehouses
    • Tables that support custom user searches
    • That wouldn’t otherwise fit in the buffer pool uncompressed
  • Queries
    • That process millions of rows
    • With multiple DISTINCT aggregates
    • That aggregate large numbers of rows
    • That are generated from custom user searches
  • Query plans
    • That have a tough time figuring out the right memory grant and would benefit from Batch Mode Memory Grant Feedback
    • That may be parameter-sensitive and benefit from Adaptive Joins
    • That use windowing functions and may benefit from Window Aggregate operators

Even if the end query result is not millions of rows, Batch Mode can be useful to get you to your result faster.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Not Indexing For Windowing Functions

Gretzky


Windowing functions solve a lot of interesting problems, and allow for some useful query rewrites at times.

The problem is that you’ve been sold a bill of goods by people using preposterously small databases. Today’s laptop hardware swallows little sample databases whole, and doesn’t expose the problems that data-to-hardware mismatches do.

Here’s a somewhat extreme example, by way of my home server, which is a Lenovo P17 that I bumped up until I hit around the $5000 mark.

To put that in perspective, most cloud cost estimators will put you at around $5000 a month for similar hardware. I get it — no one pays list — but you get the idea.

But I said all that to say this: you take all those fabulous lessons about amazing window functions from the blog post to work with you and try to apply them to your data and… gee, we’ve been waiting an awful long time here, haven’t we?

Truth Poster


I’m not gonna do that to you. After all we’ve been through together! How could you ever think that?

First, let’s talk about some things that will screw windowing function performance up:

  • Having columns from different tables in the partition by or order by clause
  • Not specifying ROWS instead of the default RANGE, to avoid an on-disk spool (example after this list)
  • Applying system or user defined functions to columns in the partition by or order by clause
  • Selecting unnecessary columns along with the windowing function (this makes indexing more difficult; we’ll talk about why in a minute)
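
To make the ROWS versus RANGE item concrete, here’s a minimal sketch of a running total, assuming the CreationDate column from the same Stack Overflow Posts table used in the demos below. With an ORDER BY and no frame specified, you get the default RANGE frame and its on-disk spool; spelling out ROWS keeps things in memory.

SELECT
    p.OwnerUserId,
    total_score = 
        SUM(p.Score) OVER
        (
            PARTITION BY
                p.OwnerUserId
            ORDER BY
                p.CreationDate
            ROWS UNBOUNDED PRECEDING /*omit this line and you get the default RANGE frame*/
        )
FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656;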

If you’re doing any of those things, it’s gonna be a rough ride indexing your way out of performance problems.

Let’s look at why!

Queryable


We’re going to use this handsome devil as an example.

I’m using a little trick here to return no rows to SSMS, while still making SQL Server process the entire ugly part of the query first.

That’s the point of filtering to where the row number equals zero. No time to wait for all that.

WITH
    p AS
(
    SELECT
        p.OwnerUserId,
        p.PostTypeId,
        p.Score,
        x = 
            ROW_NUMBER() OVER
            (
                PARTITION BY
                    p.OwnerUserId,
                    p.PostTypeId
                ORDER BY
                    p.Score DESC
            
            )
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = 22656
)
SELECT
    p.*
FROM p
WHERE x = 0;

Here’s the query plan without a useful index, just to set the stage for why we want to tune this, and what we want to fix:

[Query plan image: dreamers]

To paraphrase:

  • The clustered index of the Posts table doesn’t fit in the 96GB of RAM I have assigned to this VM, so we do ~25 seconds of reading from disk
  • Since we don’t have an index that presents data in the order requested by the query, SQL Server has to get it lined up for us

All worthy reasons to tune things, I’d say. Our goal is to give SQL Server an object that fits in memory to read from, and that also stores data in the order we need for the windowing function.

What’s The Wrong Index?


Precision is the name of the game, here. These indexes won’t help us.

With partition by columns out of order:

CREATE INDEX p ON
    dbo.Posts
(
    PostTypeId,
    OwnerUserId,
    Score DESC
);

With Score in ascending order:

CREATE INDEX p ON
    dbo.Posts
(
    OwnerUserId,
    PostTypeId,
    Score
);

While it’s possible that swapping PostTypeId and OwnerUserId would yield the same results, it’s better to tailor the index to the query as it’s written if we’re creating a new one.

If we already had one of those indexes in place, I might consider testing a change to the query instead. Looking at these, though, you can start to get a sense of how writing a windowing function that spans columns in multiple tables will cause problems with ordering.

Now, either one of those indexes would at least give the query a smaller object to initially access data from, which will improve performance but still not get us where we want to be.

Here’s the query plan shape for either of those indexes:

[Query plan image: younger]

What’s The Right Index?


Finally, this index will help us:

CREATE INDEX p ON
    dbo.Posts
(
    OwnerUserId,
    PostTypeId,
    Score DESC
);

This arranges data just-so, and our query happily uses it:

[Query plan image: loud noises]

It’s difficult to argue with the results here! This finishes in 25 milliseconds at DOP 1, compared to ~550ms at DOP 8 with the other index.

You Don’t Write Queries Like That, Though


When I have to fix these performance issues for clients, I usually have to do one of the following things:

  • Rewrite the query to use a smaller select list
  • Create a wider index if I can’t rewrite the query (but I’m not doing this here, because it’s a really bad idea for a table as wide as the Posts table)
  • Get Batch Mode involved (on Enterprise Edition, anyway. Don’t bother with it on Standard Edition.)

Let’s switch the query up to something like this:

WITH
    p AS
(
    SELECT
        p.*,
        x = 
            ROW_NUMBER() OVER
            (
                PARTITION BY
                    p.OwnerUserId,
                    p.PostTypeId
                ORDER BY
                    p.Score DESC
            
            )
    FROM dbo.Posts AS p
    WHERE p.PostTypeId = 1
)
SELECT
    p.*
FROM p
WHERE x = 0;

When we select everything from the Posts table and use a less selective predicate, the optimizer stops choosing our index because it estimates the cost of doing a bunch of lookups higher than just scanning the clustered index.

[Query plan image: silencers]

A couple things happen, too:

  • Selecting all the columns asks for a huge memory grant, at 17GB
  • We spill, and spill a ton. We spend nearly a full minute spilling to disk

 

Rewriting The Query


We can use the primary key of the table to create a more favorable set of conditions for our query to run by doing this:

WITH
    p AS
(
    SELECT
        p.Id,
        x = 
            ROW_NUMBER() OVER
            (
                PARTITION BY
                    p.OwnerUserId,
                    p.PostTypeId
                ORDER BY
                    p.Score DESC
            
            )
    FROM dbo.Posts AS p
    WHERE p.PostTypeId = 1
)
SELECT
    p2.*
FROM p
LEFT JOIN dbo.Posts AS p2
    ON p.Id = p2.Id
WHERE x = 0;

It’s a common enough pattern, but it pays off. You may not like it, but this is what peak performance looks like, etc.

[Query plan image: beef kitchen]

Future Days


We’ll cover that in tomorrow’s post, since I think it deserves a good bit of attention.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Dealing With Bad Parameter Sniffing

Bad Wrap


When people hear the words “parameter sniffing”, there’s almost a universally bad reaction. They lose their minds and start raving about how to “fix” it.

  • Recompiling everything
  • Using optimize for unknown hints
  • Using local variables (sketched below)
  • Clearing the plan cache
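
As a quick illustration of what that local variable “fix” usually looks like (procedure and variable names are made up for this sketch): copying the parameter into a local variable hides its value from the optimizer, so you trade a sniffed estimate for a blended guess off the density vector, which is its own kind of problem.

CREATE OR ALTER PROCEDURE
    dbo.local_variable_trick
(
    @Reputation int
)
AS
BEGIN
    SET NOCOUNT, XACT_ABORT ON;

    /*The "fix": the optimizer can't sniff @r, so it guesses from the density vector*/
    DECLARE
        @r int = @Reputation;

    SELECT
        c = COUNT_BIG(*)
    FROM dbo.Users AS u
    WHERE u.Reputation = @r;
END;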

In today’s post, I’m gonna show you two videos from my paid training:

  • Intro to parameter sniffing
  • Parameter sniffing recap

Those explain why parameter sniffing is so tough to deal with, and why all the stuff up there in that list isn’t really the greatest idea.

There’s a whole bunch of stuff in between those two videos where I’ll teach you specifics about fixing parameter sniffing problems.

If that’s the content you’re after, hit the link at the very end of the post for 75% off my entire training catalog.

Intro To Parameter Sniffing


Parameter Sniffing Recap


Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Writing And Optimizing Paging Queries

Paging Doctor Darling


Words I’ll never hear, at least in reference to me. I’m sure there’s a Darling out there who is a doctor. A real doctor, too. Not one of those “I went to college for 7 extra years” doctors who don’t know anything about the human body.

But I digress! Today we’re gonna learn about paging queries. Why? Because I see people screw them up all the gosh darn ding-a-ling time.

Far and away, I think the worst problem starts at the beginning — the SELECT list — where people try to get every single possible column. There’s a much better query pattern available to you, if you’re open to writing a little bit more code.
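
The general shape of that pattern, sketched here with the Posts table and OFFSET/FETCH rather than as the exact demo from the video, is to page through just the key column first, and only then join back for the wide columns:

WITH
    keys AS
(
    SELECT
        p.Id
    FROM dbo.Posts AS p
    ORDER BY
        p.Id
    OFFSET 10000 ROWS
    FETCH NEXT 100 ROWS ONLY
)
SELECT
    p2.*
FROM keys
JOIN dbo.Posts AS p2
    ON keys.Id = p2.Id
ORDER BY
    p2.Id;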

Unfortunately, I haven’t seen any ORMs able to handle this method natively. Hard to believe developers developed the same performance-killing method in C# that they did in T-SQL.

Okay, maybe not so hard to believe. But follow along to see a much better way of doing it in the video below. It’s just a small part of my paid training, and if you like what you see there’s a link at the end of the post for 75% off the whole package.

Optimizing Paging Queries Video


Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Handling Optional Parameters

Optionally Yours


You’ve got tables. So many tables. And columns. So many columns.

Users — demanding as they are — might wanna see, search, and order by all sorts of things in those many columns in those many tables. Writing a query to do that is hard at first, especially if you’re afraid of dynamic SQL.

Over the years, developers have come up with all sorts of “workarounds” for handling optional search parameters. The problem is that none of them work, whether it’s:

  • column = @parameter or @parameter is null
  • column = isnull(@parameter, column)
  • column = coalesce(@parameter, column, ‘magic’)

Or any variation thereof. Doing this can (and will!) screw up query performance in all sorts of ways that sneak up on you.

  • Bad cardinality estimates
  • Scanning instead of Seeking in indexes
  • Using the “wrong” indexes

In the video below, I’ll show you how to use dynamic SQL the right way to handle optional parameter search scenarios. This video is a small part of my paid training. If you like what you see, there’s a link for 75% off the entire package at the bottom of the post.
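
As a rough sketch of the general shape (the parameter names and predicates here are just for illustration, not the full treatment from the video): only append the predicates you actually need, and keep everything parameterized through sp_executesql.

CREATE OR ALTER PROCEDURE
    dbo.optional_search
(
    @OwnerUserId int = NULL,
    @Score int = NULL
)
AS
BEGIN
    SET NOCOUNT, XACT_ABORT ON;

    DECLARE
        @sql nvarchar(MAX) = N'
    SELECT
        p.Id,
        p.Score
    FROM dbo.Posts AS p
    WHERE 1 = 1';

    /*Only add predicates for parameters that were actually supplied*/
    IF @OwnerUserId IS NOT NULL
    BEGIN
        SELECT @sql += N'
    AND p.OwnerUserId = @OwnerUserId';
    END;

    IF @Score IS NOT NULL
    BEGIN
        SELECT @sql += N'
    AND p.Score >= @Score';
    END;

    /*Pass every parameter; any that aren't referenced in the final string are simply ignored*/
    EXEC sys.sp_executesql
        @sql,
        N'@OwnerUserId int, @Score int',
        @OwnerUserId,
        @Score;
END;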

Optional Parameters Video


Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Writing Unsafe Dynamic SQL

SQL Injection


The first thing that comes to mind when you mention dynamic SQL is SQL injection. That makes sense! It’s a scary thing that has messed up a lot of people, places, and things.

Imagine how scary it would be to wake up one day and find there had been a data breach, a defacement, a server takeover, or a ransomware attack.

All of these things are possible if you write the bad kind of dynamic SQL, or think that typing square brackets will save you.

If I had my way, there would be way more invested in making safe dynamic SQL easy to write, debug, and maintain. But I don’t have my way, and Microsoft is more focused on defrauding you with shoddy cloud performance than anything else.

To define the problem:

  • You write code that accepts user input
  • The input is a string with a non-arbitrary length
  • Which is concatenated into another string for execution
  • And not parameterized, properly quoted, etc.

Building A String


This can be done in the application, in SQL Server, or a mix of the two.

Since I’m not an application developer, I’m going to scope this to how it looks in a T-SQL stored procedure.

CREATE OR ALTER PROCEDURE
    dbo.haxer
(
    @injectable nvarchar(100)
)
AS
BEGIN
   SET NOCOUNT, XACT_ABORT ON;

/*Our placeholder variable for the string we'll build*/
DECLARE    
    @sql nvarchar(MAX) = N'';

/*The start of our query-building*/
SELECT
    @sql = N'
SELECT
    H.*
FROM dbo.hax AS H
WHERE 1 = 1
';

/*
This is where things first go wrong,
concatenating input into the string
*/
IF @injectable IS NOT NULL
BEGIN
    SELECT
        @sql += N'AND H.user_name = ''' + @injectable + ''';';
END;

PRINT @sql;

/*
This is where things continue to go wrong, 
using EXEC against the variable
*/
EXEC(@sql);

END;
GO 

There’s not much complicated here. The comments tell the story:

  • Concatenating the @injectable parameter into the string means it can bring anything the user wants into your query
  • Using EXEC directly against the @sql variable means it can’t be parameterized

Before we can run this, though, we need a table and some dummy data.

Dummy Data And String Things


Here’s a simple table, with a single row in it. I don’t think this needs much explanation.

CREATE TABLE
    dbo.hax
(
    id int PRIMARY KEY,
    user_name nvarchar(100),
    points bigint,
    location nvarchar(250),
    creation_date datetime,
    last_login datetime
);

INSERT 
    dbo.hax
(
    id,
    user_name,
    points,
    location,
    creation_date,
    last_login
)
VALUES
(
    1,   
    N'Rick Ross',
    400000000,
    'Port Of Miami',
    '20060808',
    '20211210' 
);

As a note here, this is another good reason to reasonably scope string column lengths. The shorter they are, the shorter search parameters are, and that limits the amount of potentially malicious code that can be stuffed into them. A 10-byte length string is a lot harder to cause problems with than a string with a 50-byte length, and so on. I’ve oversized a couple columns here as examples.

Is there any value in a 100 character user name, or a 250 character location? I sort of doubt it.

Executed Normally


When we execute the stored procedure as-is, searching for our lone user, we get a single result back:

EXEC dbo.haxer
    @injectable = N'Rick Ross';

The current query result isn’t important, but the result of the PRINT statement is.

SELECT
    H.*
FROM dbo.hax AS H
WHERE 1 = 1
AND H.user_name = 'Rick Ross';

There are two issues here, because of the lack of parameterization:

  • It can lead to generating multiple execution plans for the same query
  • Users can substitute the legitimate search with a malicious query

Unfortunately, neither the Optimize For Ad Hoc Workloads nor the Forced Parameterization settings can protect you from SQL injection. Forced Parameterization can help you with the first problem, though.

Executed Abnormally


When we execute the stored procedure with a slightly different value for the parameter, we get into trouble:

EXEC dbo.haxer
    @injectable = N'%'' UNION ALL SELECT object_id,name,schema_id,type_desc,create_date,modify_date FROM sys.tables --';

We’re not trying to do anything too crazy here, like drop a table or a database, change a setting, or do anything else that would require a lot of control over the server, database, or schema. We’re querying a pretty standard system view.

Now, I’m not big on security, permissions, or any of that stuff. It has never captured my fancy, so I don’t have any great advice about it. I will make a few points though:

  • Most applications have a single login
  • A lot of the time that login is given sa-level permissions
  • Even when it’s not, that login may need to do routine database things:
    • Check if databases, tables, columns, or indexes exist (especially during upgrades)
    • Create or drop all of those things
    • Execute database maintenance procedures
    • Mess with settings or Agent jobs

Without really tight control of a whole bunch of permissions, you’re gonna end up with applications that can do a ton of crazy stuff, including getting a list of every table in the database from this query.

SELECT
    H.*
FROM dbo.hax AS H
WHERE 1 = 1
AND H.user_name = '%' UNION ALL SELECT object_id,name,schema_id,type_desc,create_date,modify_date FROM sys.tables--';

All of this fits well within the 100 characters available in the search parameter, and the entire query is executed as one.

From here, a mean person could take one of the table names and list out its columns (using the sys.columns view), and then use that to grab data they wouldn’t normally be allowed to see.

This is how user data ends up on the dark web. Passwords, credit cards, social security numbers, phone numbers, addresses, childhood fears. You name it.

Protecting Yourself


Sure, you can steam yourself up and encrypt the crap out of everything, not store certain personal or processing details in the database, or a number of other things.

But before you go and dump out the bag of golden hammers, let’s start by eliminating the risks of dynamic SQL. We’re gonna need to rewrite our stored procedure a bit to do that.

CREATE OR ALTER PROCEDURE
    dbo.no_hax
(
    @injectable nvarchar(100)
)
AS
BEGIN
   SET NOCOUNT, XACT_ABORT ON;

/*Our placeholder variable for the string we'll build*/
DECLARE    
    @sql nvarchar(MAX) = N'';

/*The start of our query-building*/
SELECT
    @sql = N'
SELECT
    H.*
FROM dbo.hax AS H
WHERE 1 = 1
';

/*
This changed: the parameter is embedded in the string
instead of being concatenated into it
*/
IF @injectable IS NOT NULL
BEGIN
    SELECT
        @sql += N'AND H.user_name = @injectable;';
END;

PRINT @sql;

/*
This also changed: we're using EXEC against
the system stored procedure sp_executesql,
and feeding it parameters 
*/
EXEC sys.sp_executesql
    @sql,
    N'@injectable nvarchar(100)',
    @injectable;

END;
GO

In this version, here’s what we’re doing different:

  • The parameter is inside of the string, instead of concatenated in from outside
  • We’re using the system stored procedure sp_executesql to execute the string we’ve built up

It’s important to note that just using sp_executesql won’t protect you from SQL injection attacks unless the strings are parameterized like they are here.

Any Execution


No matter how we execute this, the query that gets generated and executed will look the same.

EXEC dbo.no_hax
    @injectable = N'Rick Ross';

EXEC dbo.no_hax
    @injectable = N'%'' UNION ALL SELECT object_id,name,schema_id,type_desc,create_date,modify_date FROM sys.tables--';

SELECT
    H.*
FROM dbo.hax AS H
WHERE 1 = 1
AND H.user_name = @injectable;

The important thing here is that the second execution does not result in successful SQL injection, but the first query returns correct results.

The second query returns no results this time, because the entire search string is parameterized, and no names match the supplied value.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: IF Branching In Stored Procedures

What Goes Wrong


I still see people falling into this trap when writing stored procedures. What no one seems to realize, until it’s too late and data grows past toy-database size, is that SQL Server’s query optimizer doesn’t respect IF branches as boundaries in the way you’d expect.

Sure, the control-flow logic boundaries are respected, but the big surprise to most people comes from how the optimizer treats query plans in IF branches: It compiles them all no matter what.

In the video below, I’ll show you exactly what that means, and how it can really screw up performance. This video is a small part of my paid training offering, and if you enjoy it, there’s a link at the end of the post to get 75% off.
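
As a bare-bones sketch of the pattern (the procedure and parameter names are made up here), both branches live in one procedure, so both queries get compiled together using whatever parameter value shows up first, even for the branch that doesn’t run:

CREATE OR ALTER PROCEDURE
    dbo.branchy
(
    @Reputation int
)
AS
BEGIN
    SET NOCOUNT, XACT_ABORT ON;

    IF @Reputation >= 100000
    BEGIN
        SELECT
            u.Id,
            u.Reputation
        FROM dbo.Users AS u
        WHERE u.Reputation >= @Reputation;
    END
    ELSE
    BEGIN
        SELECT
            c = COUNT_BIG(*)
        FROM dbo.Users AS u
        WHERE u.Reputation >= @Reputation;
    END;
END;

One common mitigation is to move each branch’s query into its own stored procedure and call those from the outer procedure, so each query gets compiled with parameter values that actually apply to it.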

IF Branching Video!


Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Using CASE Expressions In JOIN Or WHERE Clauses

Just In Case


There are some circumstances where you can use CASE expressions in JOIN or WHERE clauses in a non-harmful way. In general it’s something you should avoid, though, especially if you’re not sure how to tell whether it’s harmful or not.

This is one of those unfortunate query patterns that leads to you getting called nights and weekends because the server blew up. I’ve seen it do everything from generating a reliably bad query plan to adding just enough cardinality uncertainty that slight plan variations felt like parameter sniffing, even with no parameters involved!

Uncertainty is something we’ve covered quite a bit in this series, because if you don’t know what you want, SQL Server’s optimizer won’t either.

Up The Bomb


Let’s start with a reasonable index on the Posts table:

CREATE INDEX p ON
    dbo.Posts
(
    PostTypeId,
    OwnerUserId
);

We don’t need a very complicated query to make things bad, even with that stunningly perfect index in place.

SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON 1 = CASE     
               WHEN p.PostTypeId = 1
               AND  p.OwnerUserId = u.Id
               THEN 1
               WHEN p.PostTypeId = 2
               AND  p.OwnerUserId = u.AccountId
               THEN 1
               ELSE -1
           END
WHERE u.Reputation > 500000;

This returns a count of 3,374, and runs for about a minute total.

Naughty Query Plan


The plan for this may look innocent enough, but it’s one of those cases where a Lazy Table Spool is a warning sign.

[Query plan image: unhappy ending]

Operator times in query plans are generally a blessing, because they show you where execution time ratchets up. Unfortunately, it’s not always clear what you have to do to fix the problems they show you.

I suppose that’s what posts like this are for, eh?

Oh, what a feeling.

Rewriting The Query


A Useful Rewrite© of this query looks something like this, at least if you’re a fancy-pants blogger who cares a lot about formatting.

SELECT
    c = COUNT_BIG(*)
FROM
(
    SELECT
        u.Id,
        u.AccountId
    FROM dbo.Users AS u
    WHERE u.Reputation > 500000 
) AS u
CROSS APPLY
(

    SELECT
        p.*
    FROM dbo.Posts AS p
    WHERE p.PostTypeId = 1
    AND   p.OwnerUserId = u.Id

    UNION ALL

    SELECT
        p.*
    FROM dbo.Posts AS p
    WHERE p.PostTypeId = 2
    AND   p.OwnerUserId = u.AccountId
) AS c;

I’m using a derived table here because if I used a Common Table Expression, you’d think they have some ridiculous magic powers that they really don’t, and one wouldn’t make the query any more readable.

Caring about formatting makes queries more readable.

Good game.

Nice Query Plan


You’ll notice that this plan no longer features a Table Spool, no longer runs for over a minute, and makes me happy.

SQL Server Query Plan
push it to the limit

I’m not a math major, but 181 milliseconds seems like a great improvement over 60 seconds.

Suede Shoes


This is another case where more typing for us results in a faster query. Perhaps there’s some wisdom to learning how to clearly express oneself before starting a career talking to databases.

Database Communications Major, or something. Just don’t buy it from Sally Struthers.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Using Left Joins To Find Rows That Don’t Exist

Natural Language


In SQL Server’s T-SQL, and in the more general ANSI-Standard SQL, you’re supposed to write queries in a way that mimics how you’d ask the same question — just don’t call it a query — in English. Because of that, there are some ways to phrase a query that are more natural than others.

Some are more intuitive once you get them down, and others can bewitch you for decades. For example, I’m still not always awesome at reasoning out INTERSECT and EXCEPT queries, or even thinking of them first when writing a query where they’d be useful.

Maybe someday.

Dirty Pig Latin


Bad advice exists in many places in the world. I don’t mean to single out the internet, though it certainly has made the publishing and proliferation of bad advice much more accessible.

I do a lot of reading about databases in general, and SQL Server specifically, to see what other folks are out there writing about and teaching others. One of my favorite arrondissements of that world is the stuff that attracts beginners.

After all, that’s when you develop habits, good or bad. Much of this series focuses on the bad habits learned in that time, and how they muck up performance.

One of those bad habits I see over and over again is using LEFT JOINs to find rows that don’t exist. The reason I focus on these is because of the effect they have on query plans, due to the way that queries have to be logically processed.

Let’s look at that, first.

Who’s Not On First?


A quite common looking query for doing this (if you want to do it wrong) would look something like this:

SELECT
   c = COUNT_BIG(*)
FROM dbo.Users AS U
LEFT JOIN dbo.Comments AS C
    ON  C.UserId = U.Id
    AND C.Score > 0
WHERE C.Id IS NULL;

The important part of the query plan is right around here:

[Query plan image: slug]

If you’re looking extra closely, without any leading or prompting by me whatsoever, you’ll notice that only after the join operation brings the two tables we’re querying (Users and Comments) together, expressed as a left outer join of course, do we filter out rows where the Id column in Comments is NULL.

The problem is that all this is after the join, and in some scenarios this is far less efficient. Both from the perspective that you have to join many more matching rows together, and from the perspective that the optimizer can sometimes have a weird time ordering outer joins, especially when there are a lot of them.

Note that, for various reasons, this query runs for around 4 seconds total.

Who Doesn’t Exist On First?


An often better way of expressing this sort of query is using the — wait for it — natural expression of the Structured Query Language.

SELECT
   c = COUNT_BIG(*)
FROM dbo.Users AS U
WHERE NOT EXISTS
      (
          SELECT
              1/0
          FROM dbo.Comments AS C
          WHERE  C.UserId = U.Id
          AND    C.Score > 0
      );

This query is logically equivalent to the last one. It may even be arguably better looking. More elegant, as the developers say when they want to describe well-formatted code that no one really understands.

The query plan looks like this now:

[Query plan image: but better]

Now we get this crazy little thing called an Anti Semi Join. That means rows are filtered out at the join rather than at some point later on in an explicit Filter operator.

To highlight things a little further, look at the actual number of rows that pass through the filter in the original query and the join in the second query:

[Query plan image: a number!]

See there? The filter in the LEFT JOIN query reduces the working row set to the same number that the NOT EXISTS query reaches at the join.

In most cases, you’re better off writing queries this way. It may depend a bit on available indexes, batch mode, and server settings. But writing a blog post that takes every single one of those things into account would likely leave you bored out of your gourd.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.