SARGability Week: What Is It, Anyway?

Sake Of


SARGability is the in-club way of saying that a search predicate can be used to seek through the keys of an index.

Some things that mess it up are:

  • function(column) = something
  • column + column = something
  • column + value = something
  • value + column = something
  • column = @something or @something IS NULL
  • column like '%something'
  • column = case when …
  • value = case when column…
  • Mismatching data types
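
To make a couple of those concrete, here’s a minimal sketch, using a hypothetical dbo.t table, showing non-SARGable predicates next to rewrites that can actually seek:

/* Non-SARGable: the function hides the column from the index */
SELECT t.id FROM dbo.t AS t
WHERE YEAR(t.some_date) = 2021;

/* SARGable: a range over the bare column can seek */
SELECT t.id FROM dbo.t AS t
WHERE t.some_date >= '20210101'
AND   t.some_date <  '20220101';

/* Non-SARGable: math on the column */
SELECT t.id FROM dbo.t AS t
WHERE t.amount + 10 = 100;

/* SARGable: do the math on the other side of the comparison */
SELECT t.id FROM dbo.t AS t
WHERE t.amount = 100 - 10;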

Yes, this has all been written about quite a bit — here and elsewhere — but it’s a query pattern that I still spend a lot of time fixing.

So here we are. More writing about it.

If you’re sick of hearing about it, stop doing it.

Symptomatic


So let’s say we’ve got this table:

CREATE TABLE
    dbo.sargability
(
    id int PRIMARY KEY,
    start_date date,
    end_date date
);

Right now, the only index on this table is on the id column. Since it’s the clustered index (by default, because it’s the primary key), it also “includes” the start_date and end_date columns. That cuts both ways: any nonclustered index we create will have the id column automatically added to it. In the case of a non-unique index, it goes in the key; in the case of a unique index, it goes in the “includes”.
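
As a quick sketch of that last point (shown against a hypothetical dbo.t table, so the demo table stays index-free for what follows):

/* Non-unique: the clustering key (id) is silently added to the index key */
CREATE INDEX nu ON dbo.t (some_date);

/* Unique: the clustering key is only added at the leaf level, like an include */
CREATE UNIQUE INDEX u ON dbo.t (some_date);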

I once had someone go back and forth with me quite a bit, insisting that the clustered index didn’t have all of the table’s columns in it. But you know, if we run this query, we only touch the clustered index:

SELECT
    s.start_date,
    s.end_date
FROM dbo.sargability AS s;

Slippery


This means two things. First, as long as we avoid the lapses in judgement listed up above, we can seek to a single value or a range of values in the id column, because the index has put that data in order (ascending, in this case).

Second, start_date and end_date are not in a searchable order, so any query that attempts to search or filter on them will have to scan the index (unless we also search or filter on the id column).

SELECT
    s.start_date,
    s.end_date
FROM dbo.sargability AS s
WHERE s.id = 1;

SELECT
    s.start_date,
    s.end_date
FROM dbo.sargability AS s
WHERE s.start_date = '20210808';

SELECT
    s.start_date,
    s.end_date
FROM dbo.sargability AS s
WHERE s.end_date = '20210808';

Recreation


Even though the equality predicates on start_date and end_date are perfectly SARGable, there’s no index they can use to seek to those values. They only exist in the clustered index, which is ordered by the id column, so they’re not in any helpful order.

The fact that both of these queries end up scanning the clustered index may leave you under the impression that the ISNULL version is an acceptable practice.

SELECT
    s.start_date,
    s.end_date
FROM dbo.sargability AS s
WHERE s.end_date = '20210808';

SELECT
    s.start_date,
    s.end_date
FROM dbo.sargability AS s
WHERE ISNULL(s.end_date, '19000101') = '20210808';

But with an index on the column, the problem becomes more apparent: the “good” query seeks, and the “bad” query scans.

CREATE INDEX s ON dbo.sargability(end_date) INCLUDE(start_date);
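
For what it’s worth, this particular ISNULL isn’t even doing anything: a NULL end_date becomes '19000101', which can never equal '20210808', so the predicate boils down to the bare column comparison. For cases where the search value really could match the fallback, a hedged sketch of the rewrite (with a hypothetical @search variable) looks like this:

DECLARE @search date = '20210808';

/* Instead of: WHERE ISNULL(s.end_date, '19000101') = @search */
SELECT
    s.start_date,
    s.end_date
FROM dbo.sargability AS s
WHERE s.end_date = @search
OR    (s.end_date IS NULL AND @search = '19000101');

Whether the OR gets you a seek depends on the plan you end up with; adding OPTION (RECOMPILE) lets the optimizer embed the variable’s value and prune the dead branch.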

Evidence


For the rest of the week, we’re going to look at various ways to fix non-SARGable queries with things like computed columns, temp tables, index key column order, and dynamic SQL.

These are the approaches I normally take in my query tuning work, so hopefully others will find them helpful.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

On Entity Framework And SQL Server

Specificity


I know there are other ORMs in the world, but since my working life is spent fixing SQL Server performance, I end up seeing Entity Framework the most.

This isn’t going to be an Entity Framework bash-fest (that’s what Twitter is for), it’s more a critique of the way developers blindly trust this abstraction layer to always do the smart and sensible thing.

After all, I’ve seen applications use Entity Framework quite successfully. Part of what made them successful was the developers being comfortable with the database, figuring out how what they do in code gets translated into queries, and noticing when that ends up going terribly wrong.

One of the big lightbulb moments they had was realizing that maybe doing it all in one big query isn’t always the best option.

Until It Doesn’t


On small databases, perhaps with uncomplicated schema and application requirements, you can get by without giving much of a care about these things.

However, as databases grow up, more tables get added, and all that, you need to start paying more attention. I’ve said it before: abstraction isn’t magic, and that holds true for Entity Framework, too.

I totally understand: you might not know anything about databases, might not have any interest in learning more about databases, and your job is to focus on the code to develop new features. Because of that, you put your faith and trust into Entity Framework to do things the best, most correct way.

That isn’t always what happens, though. It might work, but it might not work well. Not being an Entity Framework expert, I’m not always sure if the disconnect is in the way Entity Framework is designed, or in the way the developer used it.

Funny Valentine


For transactional queries, you’ll generally be okay so long as you obey common database decency rules around table and index design.

Where things usually get overly ambitious is in reporting and dashboard-population queries. They often get translated poorly, with dozens of left joins and nested derived tables.

And look, no, I don’t expect someone coding those types of queries in Entity Framework to be able to write better T-SQL queries. They probably have even less training and experience there. I’m basically repeating myself: abstraction isn’t magic.

If you’re going to work heavily with Entity Framework code that hits SQL Server, you need to:

  • Get into databases
  • Get someone who’s into databases

You need someone who can get in there, find problem queries, review indexes, and help track down which sections of the code generate them.

Whether that means replacing portions of the application with stored procedures, or writing custom SQL that takes advantage of a different approach to accessing the data, you need someone with the skills to write that T-SQL well. Otherwise, you’ll just end up with the same problem in a different way.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Finding Queries With Multiple Plans In Query Store

Cached Out


There are lots of examples of how to do this with the plan cache, but, well, the plan cache can be an awfully unstable place.

Query Store is a bit more historically reliable, so we can use some of the same tricks to track them down there, too.

SELECT
    x.total_query_plans,
    qsq.query_hash,
    qsp.query_plan_hash,
    query_plan =
        TRY_CONVERT
        (
            xml,
            qsp.query_plan
        )
FROM 
(
    /* Query hashes with more than one distinct plan */
    SELECT
        qsq.query_hash,
        distinct_query_plans = 
            COUNT_BIG(DISTINCT qsp.query_plan_hash),
        total_query_plans = 
            COUNT_BIG(qsp.query_plan_hash)
    FROM sys.query_store_query AS qsq
    JOIN sys.query_store_plan AS qsp
        ON qsq.query_id = qsp.query_id
    GROUP BY 
        qsq.query_hash
    HAVING  COUNT_BIG(DISTINCT qsp.query_plan_hash) > 1
    AND     COUNT_BIG(DISTINCT qsp.query_plan_hash)
                <= COUNT_BIG(qsp.query_plan_hash)
) AS x
/* Go get the queries attached to each hash... */
CROSS APPLY
(
    SELECT TOP (x.total_query_plans)
        qsq.*
    FROM sys.query_store_query AS qsq
    WHERE x.query_hash = qsq.query_hash
) AS qsq
/* ...and the plans attached to each query */
CROSS APPLY
(
    SELECT
        qsp.*
    FROM sys.query_store_plan AS qsp
    WHERE qsp.query_id = qsq.query_id
) AS qsp
ORDER BY 
    x.total_query_plans DESC;

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Finding Single Use Queries In Query Store

Caching The Line


Single-use plans get looked at a lot in the plan cache to gauge how effective plan caching is. The main problem is that with high enough churn, you might not catch all the queries involved in the problem. Here are a couple of ways to look at the same thing in Query Store.

WITH x AS
(
    SELECT
        single_use_queries = 
            SUM
            (
                CASE 
                    WHEN qsrs.count_executions = 1 
                    THEN 1 
                    ELSE 0 
                END
            ),
        total_queries = 
            COUNT_BIG(*)
    FROM sys.query_store_runtime_stats AS qsrs
)
SELECT
    x.*,
    percent_single_use_plans = 
        CONVERT
        (
            decimal(5,2),
            single_use_queries / 
                (
                    1. * 
                    NULLIF
                    (
                        x.total_queries, 
                        0
                    )
                ) * 100.
        )
FROM x;

SELECT
    qsqt.query_sql_text
FROM sys.query_store_query_text AS qsqt
WHERE EXISTS
      (
          SELECT 
              1/0
          FROM sys.query_store_query AS qsq
          JOIN sys.query_store_plan AS qsp
              ON qsq.query_id = qsp.query_id
          JOIN sys.query_store_runtime_stats AS qsrs
              ON qsp.plan_id = qsrs.plan_id
          WHERE qsqt.query_text_id = qsq.query_text_id
          AND   qsrs.count_executions = 1
      );

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Things SQL Server vNext Should Address: Unify Paid Editions

Cloudy With A Chance


The more I work with Standard Edition, the more frustrated I get, and the more I have to tell people about the cost difference between it and Enterprise Edition, the more people start asking me about Postgres.

I wish that were a joke. Or that I knew Postgres better. Or that I knew the PIVOT syntax.

(That was a terrible joke)

Mold And Musted


I’ve written about my Standard Edition annoyances in the past.

I’ve thought that offering something between Standard and Enterprise Edition, or add-ons depending on what you’re after, would be a good move.

For example, let’s say you want to unlock the memory limit and performance features, or you want the full Availability Group experience: you could buy them for some SA-like tax. But that just… makes licensing more complicated, and it’s already bad enough.

One install, one code base, one set of features, no documentation bedazzled with asterisks.

Perhaps best of all, everyone can stop complaining that Developer Edition is misleading because you can’t turn off Enterprise Edition features.

And you could better line the bits up with what’s in Azure SQL DB and Managed Instances.

Priceline


I have no idea how to handle the pricing here. Perhaps that could line up better with the Azure offerings, too.

At any rate, something here has to give. Standard Edition is entirely uncompetitive in too many ways, and its price is too far from Enterprise Edition’s to make a realistic comparison. That $5,000 jump per core is quite a jaw-dropper.

One option might be to make Express Edition the new Standard Edition, keeping it free and giving it the limitations that Standard Edition currently has.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Dealing With SQL Server Performance Problems From Unparameterized IN Clauses From Entity Framework

Seriously


If you’re using Entity Framework, and sending in queries that build up IN clauses, they won’t end up getting parameterized.

Even Forced Parameterization won’t help you if you’re sending in other parameters. One limitation is that it doesn’t kick in for partially parameterized queries.

Even if they did get parameterized, well, what’s one funny thing about IN clauses? You don’t know how many values will be in them.

You’d end up with a different plan for every variation in the number of values, though at least you’d get plan reuse within those groups.
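
To picture it, here’s a rough sketch against the Stack Overflow Users table (the parameter names are hypothetical). Every distinct list of literals is a brand new query text to cache and compile, and even the parameterized shape only gets reused by queries with the same number of values:

/* Unparameterized: each of these is its own plan cache entry */
SELECT u.Id FROM dbo.Users AS u WHERE u.Id IN (1, 2, 3);
SELECT u.Id FROM dbo.Users AS u WHERE u.Id IN (1, 2, 3, 4);

/* Parameterized: reusable, but only for exactly three values */
DECLARE @p0 integer = 1, @p1 integer = 2, @p2 integer = 3;
SELECT u.Id FROM dbo.Users AS u WHERE u.Id IN (@p0, @p1, @p2);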

Griper


Like most people, I use Twitter to gripe and brag. So here we are.

Yes, I’m surprised that a code base that has been around for as long as Entity Framework hasn’t already dealt with this problem. I’ve said it before: someone ought to introduce the Entity Framework team to the SQL Server team.

But thankfully, some people use Twitter to be helpful, like ErikEJ (b|t|g), who has a solution that works with EF Core.

Alt Code


My friend Josh (b|t) helped me out with some code that works in non-Core versions of Entity Framework too.

using (var context = new StackOverflowContext())
{
    context.Database.Log = Console.WriteLine;

    // http://www.albahari.com/nutshell/predicatebuilder.aspx
    var predicate = PredicateBuilder.False<User>();

    // userIds is the caller's list of ids; always build 100 predicates,
    // padding with the last value so the query text (and plan) stays stable
    for (int i = 0; i < 100; i++)
    {
        var value = userIds[i >= userIds.Count - 1 ? userIds.Count - 1 : i];
        predicate = predicate.Or(u => u.Id == value);
    }

    var users = context.Users
        .AsExpandable() // http://www.albahari.com/nutshell/linqkit.aspx
        .Where(predicate)
        .ToList();
}

This is helpful when you have an upper limit on the number of values that could end up in your IN clause. It’s cool because you’ll always generate 100 parameters (matching the loop above), padding out the list with the last value. That means you get one query plan regardless of how many values actually end up in there.

I do not suggest setting this to an arbitrarily high number as a catch-all. You will not be happy.

If you don’t have a known number, you can use this:

using (var context = new StackOverflowContext())
{
    // http://www.albahari.com/nutshell/predicatebuilder.aspx
    var predicate = PredicateBuilder.False<User>();

    // One predicate per value: still parameterized, but the plan
    // varies with the number of values in userIds
    foreach (var id in userIds)
    {
        predicate = predicate.Or(u => u.Id == id);
    }

    var users = context.Users
        .AsExpandable() // http://www.albahari.com/nutshell/linqkit.aspx
        .Where(predicate)
        .ToList();
}

And of course, even though it’s probably better than no parameterization at all, you will still get different query plans for different numbers of values.

And did I mention that Josh has a Great Series Of Posts™ on bad EF practices?

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

How Many Threads Can One Query Use In SQL Server?

Overlook


Something that many people overlook about parallel plans is that, while they’re limited to DOP active CPUs at any given moment, they can suck up way more threads than that.

How many more depends on how many branches can be active at the same time.

Plus the coordinator thread.

Big Query


In order to test these limits out, we need a rather large query.

Since I’m lazy, I’m using a bit of dynamic SQL to build the query I want. While figuring out how many joins I could gin up before things went amok, I learned some fun things.

For example, if I used Hash Joins, I’d get an error that the Query Processor ran out of stack space, and if I used Nested Loops joins I wouldn’t get the requisite parallel exchanges necessary to have multiple parallel zones.

And if I don’t use a force order hint, I’ll end up spending a really long time waiting for a query plan to compile. It wasn’t a good time.

There’s also a tipping point with the number of joins, where if I go over a certain number, my query’s DOP gets downgraded to one.


Outcoming


After finding my sweet spot at 316 joins, I still had to tinker with DOP a little.

On my home server, I have 706 worker threads available.
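
If you want to check that number on your own server, the scheduler DMVs will get you close (this is the simple version of the math):

/* The total worker thread cap for the instance */
SELECT max_workers = osi.max_workers_count
FROM sys.dm_os_sys_info AS osi;

/* Workers currently attached to user-visible schedulers */
SELECT current_workers = SUM(os.current_workers_count)
FROM sys.dm_os_schedulers AS os
WHERE os.status = N'VISIBLE ONLINE';

/* Available is roughly the difference between the two */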

With my 316 join query, I was able to reserve 634 worker threads.


But that’s where I topped out. Past that, the query would get downgraded to DOP 1 and only ask for one thread.

Qurious


I’m not sure if there’s some built-in cap on how many threads a query can use, or if the limit is global before downgrades start happening.

What I found more interesting was that even though the query reserves 634 workers, those workers weren’t immediately subtracted from available workers on the server.

Technically, reserved threads don’t exist yet. They haven’t been created or attached to a task. All the reservation does is prevent other queries from reserving threads beyond the configured limit.

For example, if I run two copies of the big query at the same time, one gets downgraded to DOP 1. It can’t run at its hinted DOP of 2 and reserve another 634 threads on top of what the first query already reserved, and DOP 1 is the next step down from there.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Some Updates to sp_QuickieStore and sp_PressureDetector

In Trunks


I’ve made a few changes and improvements to these scripts. Here’s the changelog:

sp_PressureDetector:

  • Improved introduction
  • Added help section
  • Changed column order to put DOP information next to thread usage information
  • Added reserved threads to the CPU overview (2016+)

sp_QuickieStore:

  • Added a filter to the “maintenance” section to screen out automatic statistics updates (SELECT StatMan…)
  • A new function!

You can grab updated versions and get support over on GitHub.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Things SQL Server vNext Should Address: Adaptive DOP

Speed Demon


Intelligent Query Processing (IQP) is quite a neat set of features. It allows the SQL Server Engine some flexibility in the plans that get used and re-used.

One in-flight example of IQP is the Adaptive Join, where a choice between Hash and Nested Loops Joins can be made at runtime based on a row threshold.

I think that threshold should apply to the choice between serial and parallel plans, too.

Riddled


Right now, SQL Server can downgrade DOP when a server is under CPU pressure. I have a demo of that in this video about sp_PressureDetector.

The query plan will still look like it’s going parallel, but in reality it will only be running on a single thread.

Here’s the thing: I think that should happen more often, and I think it should be based on the same row thresholds that are used for Adaptive Joins.

If a query starts running and

  • It’s a serial plan, but way more rows start getting processed, DOP should scale up
  • It’s a parallel plan, but way fewer rows get processed, DOP should scale down

Perhaps the first point could be addressed more aggressively than the second, because it’s far more likely to cause a performance issue, but hey.

Think big.

Ghost Whopper


Queries that process lots of rows are typically the ones that benefit from going parallel.

Eight threads dealing with a million rows apiece will go a lot better than one thread dealing with eight million rows on its own.

This is another important piece of the parameter sniffing puzzle, too. Often I’ll be able to tune queries and indexes so that the same general plan shape is used, but the key difference is a parallel plan still being much better for a large data set.

Right now, I’m a little stuck optimizing for the large value, or using dynamic SQL to get different query plans.
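
The dynamic SQL version of that isn’t fancy. Here’s a minimal sketch, assuming a hypothetical dbo.t table and a filter column where a quick up-front count is affordable; the trailing comment changes the query text, so each branch caches its own plan:

DECLARE
    @filter integer = 1,
    @rows bigint,
    @sql nvarchar(MAX) = N'
SELECT
    t.id
FROM dbo.t AS t
WHERE t.some_col = @filter';

/* Cheap-ish check on how many rows this parameter touches */
SELECT @rows = COUNT_BIG(*)
FROM dbo.t AS t
WHERE t.some_col = @filter;

IF @rows >= 1000000
    SET @sql += N' OPTION(MAXDOP 8); /*big*/';
ELSE
    SET @sql += N' OPTION(MAXDOP 1); /*small*/';

EXEC sys.sp_executesql
    @sql,
    N'@filter integer',
    @filter;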

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Why You Should Stop Looking At Query Costs In SQL Server

Spare No Expense


Over the years, there have been a lot of requests to get sp_BlitzCache to sort results by query cost. I understand why. It’s assumed that the optimizer is never wrong and that cost is directly associated with poor performance.

There are also rather misguided efforts to figure out parallelism settings based on plan costs. The main problem there is that if you currently have a lot of parallel queries, all that means is that the estimated cost of the serial plan was higher than your current Cost Threshold For Parallelism setting, and the cost of the parallel plan was lower than the cost of the serial plan.

If you increase Cost Threshold For Parallelism, you may very well still end up with a parallel plan, because the serial version was still more expensive. If you eventually change Cost Threshold For Parallelism to the point where some queries are no longer eligible for parallelism, you may eventually find yourself unhappy with the performance of the serial version of the query plan.

Albeit with less overall wait time on CX* doodads.

Next you’ll be complaining about all the SOS_SCHEDULER_YIELD waits you’ve got.
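
If you’re curious where those knobs currently sit on your server, they’re right there in sys.configurations:

SELECT
    c.name,
    c.value_in_use
FROM sys.configurations AS c
WHERE c.name IN
(
    N'cost threshold for parallelism',
    N'max degree of parallelism'
);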

Insteads


Rather than look at estimated metrics, you should be looking at how queries actually perform. For most servers I look at, that means looking at queries with high average CPU time, and large memory grants. Those metrics typically represent tunable aspects of the query.

In other cases, you might look at wait stats to direct the type of queries you want to go after. Reads, writes, and executions are also valuable metrics at times.

One danger of looking at totals rather than averages is that you may find things that do a little bit of something a whole lot of times, and there’s no real way to tune the small bit of activity they generate other than to run the query less.
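
As a starting point for that kind of triage, here’s a hedged sketch against Query Store. Note that avg_cpu_time is in microseconds, and avg_query_max_used_memory is in 8KB pages:

SELECT TOP (10)
    qsqt.query_sql_text,
    qsrs.avg_cpu_time,
    qsrs.avg_query_max_used_memory,
    qsrs.count_executions
FROM sys.query_store_runtime_stats AS qsrs
JOIN sys.query_store_plan AS qsp
    ON qsrs.plan_id = qsp.plan_id
JOIN sys.query_store_query AS qsq
    ON qsp.query_id = qsq.query_id
JOIN sys.query_store_query_text AS qsqt
    ON qsq.query_text_id = qsqt.query_text_id
ORDER BY
    qsrs.avg_cpu_time DESC;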

What’s A Cost For?


In general, I only tend to look at costs to figure out plan choices within a query, or when comparing two different plans for “the same” query.

This is where experimenting with hints to change the plan shapes and choices can show you why you got the plan you did, and what you might have to do to get the plan you want naturally.

Let’s say you want to figure out why you got a specific join type. You hint the type of join you want, and there’s a missing index request now. Adding the index gets you the plan shape you want without the hint. Everyone lived happily ever after.

Until the index got fragmented ???

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.