Erik Darling, Author at Darling Data

Common SQL Server Consulting Advice: Adding Indexes To Replace Multiple Indexes

Posted on April 19, 2022May 16, 2022 by Erik Darling

Buffet Tables

If you’ve ever spent any time reading about relational database theory, you’ve probably read all sorts of things about normal forms and all that good stuff. You’ve also probably immediately forgotten all of it.

At least that’s what I’d guess is what happened based on what I see when I look at database designs out there. Tables are often dozens of columns wide, with all sorts of things in there that should be in their own table.

Columns with a common prefix, like Customer_Name, Customer_Address, etc.
Columns with a common suffix, like PhoneNumber_1, PhoneNumber_2, etc.

Even fairly-well designed tables often get somewhat overloaded with information, especially when years of additional development requirements lead to a choice between adding one more column to a table and normalizing several column out to a new table.

Sound familiar?

Let’s move on.

Indexing

The more columns you have in a table, the more potential column combinations there are for indexes. Much like columns, indexes tend to get added following the path of least resistance.

Very rarely does someone consider current indexes when deciding to add an index. They might be doing something like:

Evaluating missing index requests in a query plan
Evaluating missing index requests in the DMVs
Running DTA like a darn fool and saying yes to everything

I’ll often run into tables that have indexes like these:

CREATE INDEX p1 ON dbo.Posts(OwnerUserId);
CREATE INDEX p2 ON dbo.Posts(OwnerUserId, PostTypeId);
CREATE INDEX p3 ON dbo.Posts(OwnerUserId, CreationDate, LastActivityDate);
CREATE INDEX p4 ON dbo.Posts(OwnerUserId, PostTypeId, CreationDate);
CREATE INDEX p5 ON dbo.Posts(OwnerUserId, PostTypeId, Score);
CREATE INDEX p6 ON dbo.Posts(CreationDate, LastActivityDate, Score);

I’m leaving off includes here, because you can combine those in any order without disturbing much.

Key columns are a different story, of course. Order matters quite a bit, because via the magic of co-dependency key column order defines how queries can efficiently access data.

Merging

Looking at this list of indexes, we know a couple things immediately:

The index on just OwnerUserId isn’t of any obvious value, and we can get rid of it
The index that leads on CreationDate serves a totally different type of query, and we should leave it alone for now

Assuming all of these indexes have a not-crappy read to write ratio, what do we do next?

Make sure none of them are unique
Make sure none of them are filtered

These indexes all have multiple key columns that sort of line up:

CREATE INDEX p2 ON dbo.Posts(OwnerUserId, PostTypeId); 
CREATE INDEX p3 ON dbo.Posts(OwnerUserId, CreationDate, LastActivityDate); 
CREATE INDEX p4 ON dbo.Posts(OwnerUserId, PostTypeId, CreationDate); 
CREATE INDEX p5 ON dbo.Posts(OwnerUserId, PostTypeId, Score);

Of these, three of them are on OwnerUserId and PostTypeId, and one isn’t. We can throw that one out.

CREATE INDEX p2 ON dbo.Posts(OwnerUserId, PostTypeId); 
CREATE INDEX p4 ON dbo.Posts(OwnerUserId, PostTypeId, CreationDate); 
CREATE INDEX p5 ON dbo.Posts(OwnerUserId, PostTypeId, Score);

Okay, now what? Just looking at the definitions here, we can safely get rid of the first index, because any query that uses it can safely use the other two indexes.

But for the second two, we can’t do much. It’s not necessarily safe to add the third key column in either one as a fourth key column.

We have to look at how they’re used, first. If one index is unused or rarely used, we could make that change.

Things might get more difficult to figure out if we have another index like this:

CREATE INDEX p4 ON dbo.Posts(OwnerUserId, PostTypeId, CreationDate); 
CREATE INDEX p5 ON dbo.Posts(OwnerUserId, PostTypeId, Score);
CREATE INDEX p7 ON dbo.Posts(OwnerUserId, PostTypeId, CreationDate, LastActivityDate);

Sure, it means we could get rid of the index that ends with CreationDate, but it makes the picture a little less clear about how we should deal with Score.

Ending

To be safe, I’d probably stop after just getting rid of the ones where the key columns are a match as far as order goes, and are a subset of wider indexes.

CREATE INDEX p1 ON dbo.Posts(OwnerUserId); --Drop this one
CREATE INDEX p2 ON dbo.Posts(OwnerUserId, PostTypeId); --Drop this one
CREATE INDEX p3 ON dbo.Posts(OwnerUserId, CreationDate, LastActivityDate); --Keep this one
CREATE INDEX p4 ON dbo.Posts(OwnerUserId, PostTypeId, CreationDate); --Drop this one
CREATE INDEX p5 ON dbo.Posts(OwnerUserId, PostTypeId, Score); --Keep this one 
CREATE INDEX p6 ON dbo.Posts(CreationDate, LastActivityDate, Score); --Keep this one
CREATE INDEX p7 ON dbo.Posts(OwnerUserId, PostTypeId, CreationDate, LastActivityDate); --Keep this one

After that, I’d probably look at usage stats, to see if any of them just aren’t holding up their end of the bargain.

Thanks for reading!

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Common SQL Server Consulting Advice: Adding And Removing Indexes

Posted on April 18, 2022April 21, 2022 by Erik Darling

A Good Thing

Since you’re here reading this, I’m going to assume you know what food is. Further, I’m going to assume that you also know what salt is, because salt plays a pretty big part in how good or bad food tastes.

Skimp on salt, and you’ll see a lot of disappointed faces
Go overboard on salt, and you’ll see a lot of sour faces

This is a lot like what you’ll see if you do either one of those things with indexes in your database, unless you have Special Medical Exemptions©.

If it’s a data warehouse, you’ll probably want fewer indexes, but you’ll also probably want to use column store indexes
If it’s normal throughput OLTP, you’ll probably want more indexes than a data warehouse, but with narrower key column choices to support seek and lookup plans
If it’s OLTP + Reporting (or analytics or whatever you wanna call it) you’re going to need to make a lot of tough indexing choices for them to live together
If it’s high throughput OLTP, you’ll probably want even fewer overall indexes, and you may need to look into in-memory options for the busiest tables

With all that in mind, there might be some local factors that make some if this stuff not align particularly well with your workload. Things like version, edition, query patterns, and hardware might be holding you back. And of course, being at the mercy of a terrible vendor application would also likely put a damper on many of these plans.

If you want a quick and easy way to find indexes to both remove and add, go ahead and grab sp_BlitzIndex.

Practically Common

In my consulting work, the most common type of workload I see is OLTP with reporting on top of it, followed by analytics workloads. Pure OLTP workloads are pretty rare.

The index strategy for these workloads might look something like this:

Narrow key indexes with very few includes or OLTP queries
Fairly narrow key indexes with wider includes for reporting queries, or…
Nonclustered column store indexes for reporting queries

It’s always tricky figuring out what the best set of indexes to have around for your workload is. There are so many missing index requests, new queries, users complaining, and new tables getting added that it’s hard not to feel like you’re drowning.

You can’t have the perfect index for every query, and you shouldn’t try to. You’d end up with the sour-salt face kind of database. But what I need you to do is make a deal with yourself: Any time you want to add an index, you have to look for an index to get rid of.

Why?

Because without fail, I see these patterns all the time:

Indexes with a single key column along wider indexes that lead with the same column
Indexes with the exact same key columns and key column order (sometimes one has more key columns at the end)
Indexes that are totally unused by read queries but get modified a lot
Indexes with only a few reads and a ton of writes

It’s all very bothersome clutter, and you’ve got to clear these out before you can add in more.

Commonly Practical

There are many good reasons to avoid over-indexing. Everyone says “writes get slow”, but that’s a shabby meme. Your writes are slow because your storage is crappy and possibly because of your poorly thought out Availability Group.

Why having an excessive number of indexes (again, barring Special Medical Exemptions©) can be bad for your database performance:

Locking:
- More objects to lock means more locking generally
- More objects to lock means a greater chance of lock escalation attempts
Buffer pool:
- More indexes means more objects in the buffer pool
- Even indexes unused by reads need to be modified when the table changes
Transaction logging:
- More indexes to modify means more to keep track of in the transaction log
- Even databases in simple recovery log the same amount 99% of the time

So the next time you want to go add an index to a table, I want you to try a few things:

Look for indexes on the table that are totally unused
Look for indexes on the table with overlapping key columns to merge
Look for indexes that could be tweaked rather than creating a whole new one

To do that, sp_BlitzIndex run like so:

EXEC dbo.sp_BlitzIndex
    @DatabaseName = N'YourDatabase',
    @SchemaName = N'ProbablyDBO',  
    @TableName = N'YourIndexingCatastrophe';

Thanks for reading!

Going Further

Common SQL Server Consulting Advice: Setting Target Recovery Interval

Posted on April 15, 2022May 16, 2022 by Erik Darling

Everywhere You Look

Finding out about this was probably like when some probably guy probably in a wig or something discovered atoms and was like “ohh wowwe therre are’st atommes ev’rywhislt I doth looketh, happie dais!”.

If that’s not historically accurate, I apologize. I’m a child of divorce and public school systems.

If you want a quick and dirty way to see if your server is having a tough time with checkpoint activity, you can run this to search your most recent error log for relevant messages.

If you recently restarted your server, you may need to search additional error logs
If you cycle your error logs frequently, you may need to search additional error logs
If your server is associated with Cloud Nonsense© these commands may not work

CREATE TABLE 
    #ErikLog
(
    LogDate datetime,
    ProcessInfo varchar(50),
    Text varchar(MAX)
);

INSERT 
    #ErikLog WITH(TABLOCK)
(
    LogDate,
    ProcessInfo,
    Text
)
EXEC sys.xp_readerrorlog 
    0, --Just the first log file
    1, --SQL Server error log
    N'FlushCache'; --Search string

INSERT 
    #ErikLog WITH(TABLOCK)
(
    LogDate,
    ProcessInfo,
    Text
)
EXEC sys.xp_readerrorlog 
    0, --Just the first log file
    1, --SQL Server error log
    N'saturation'; --Search string

SELECT
    el.*
FROM #ErikLog AS el
ORDER BY el.LogDate;

DROP TABLE IF EXISTS
    #ErikLog;

You can adjust the first parameter there to look back at older log files, too, if you need to.

So what happened? Was it everything you imagined?

Did You Find Anything?

If you got some hits searching through the error log, it may be a sign that you need to make a change.

Who can change this? You!
What change? Target Recovery Interval!
Where do I change it? At the database level!
When should I change it? Whenever, my friend!
Why database level and not at the server level? If your databases are in an AG or something, you want the change to follow them to other servers!
How do I change it? Like so!

DECLARE 
    @sql nvarchar(max) = N'';
 
SELECT 
    @sql += 
        N'ALTER DATABASE ' + 
        QUOTENAME(d.name) + 
        N' SET TARGET_RECOVERY_TIME = 60 SECONDS;' + 
        NCHAR(13)
FROM sys.databases AS d 
WHERE d.target_recovery_time_in_seconds <> 60
AND   d.database_id > 4 
AND   d.state = 0 
AND   d.is_in_standby = 0 
AND   d.is_read_only = 0;
 
SELECT 
    cmd = @sql;

Why Are We Doing This?

Long checkpoints can cause all sorts of weirdball issues while they’re happening, and they don’t just happen when you run a checkpoint command.

Log and differential backups cause them, as well as internal stuff around recovery time.

The good news is that this is the default for databases creating on SQL Server 2016 or better. The bad news is that most of you out there started creating your databases way before that, and you’ve just been restoring them to newer versions without checking out any of the hot new gadgets and gizmos available.

Not that I blame you. Stuff like this doesn’t exactly make headlines, and you’re busy with important stuff, like hemming and hawing over fill factor.

Thanks for reading!

Going Further

Common SQL Server Consulting Advice: Setting Instant File Initialization

Posted on April 14, 2022April 3, 2025 by Erik Darling

Did You Forget?

This is another one of those settings that’s so important, Microsoft put a switch for it in the installer. And yet, I end up looking at servers where it’s not enabled.

When I ask why, I usually expect to hear some 2003 wisdom about security. But that doesn’t happen; it’s just blank stares.

How do you know if this is enabled?

SELECT  
    dss.servicename,
    dss.service_account,
    dss.instant_file_initialization_enabled 
FROM sys.dm_server_services AS dss;

Simple as that.

Why Enable Instant File Initialization?

Turning this on allows SQL Server data files to grow without writing a bunch of zeroes out to disk. That can be useful in a number of ways.

Just picture SQL Server giving disk space a big ol’ hug and Windows not even getting jealous.

But Erik, you said data files, what about all those important transaction log files?

Sorry, there’s no helping them.

Put them on the fastest storage you can get and pray.

Enabling Instant File Initialization

You can use this command from dbatools to enable IFI. If you’re not comfortable with PowerShell, you can get it done this way:

If for some reason you didn’t turn this on when you installed SQL Server, just take the service account you got from up yonder, and type “secpol.msc” into the search bar.

After that, click on “User Rights Assignment” and look for the “Perform Volume Maintenance Tasks” entry in the Policy tab.

From there, it’s a matter of adding the service account here. If you’re using the default SQL Server service account, “NT Service\MSSQLSERVER…” it will likely already have it.

But you know, just in case. G’head and click that.

Thanks for reading!

SQL Server 2019 And Up

If you’re using a somewhat modern version of SQL Server, you can use SQL Server Configuration Manager to add this setting in.

Going Further

Common SQL Server Consulting Advice: Setting Lock Pages In Memory

Posted on April 13, 2022April 3, 2025 by Erik Darling

Locked Up

It seems lately that every time I get someone to enable lock pages in memory on a server with a lot of memory, things change for the better.

Not because SQL Server is some how holding onto memory better, but because it gets direct access to physical memory without having to muck about with a bunch of virtual memory first. SQL Servers with large amounts of memory can especially benefit from this setting, because it allows them to access memory via a different API. The easy way to think of it is that SQL Server will get direct access to physical memory, instead of virtual memory.

Allocates physical memory pages to be mapped and unmapped within any Address Windowing Extensions (AWE) region of a specified process.

The AllocateUserPhysicalPages function is used to allocate physical memory that can later be mapped within the virtual address space of the process. The SeLockMemoryPrivilege privilege must be enabled in the caller’s token or the function will fail with ERROR_PRIVILEGE_NOT_HELD.

I generally advise people with good chunks of memory to enable this setting. There are very few good reasons not to on big boxes, and that’s why it should be called out in the installation process. Enabling it later means rebooting, and that sucks.

Let’s figure out if we’re using that setting!

SELECT
    locked_page_allocations_gb = 
        CONVERT
        (
            decimal
            (38, 2), 
            dopm.locked_page_allocations_kb / 1024. / 1024.
        ),
    large_page_allocations_gb = 
        CONVERT
        (
            decimal
            (38, 2), 
            dopm.large_page_allocations_kb / 1024. / 1024.
        ),
    memory_model = 
        (
            SELECT 
                dosi.sql_memory_model_desc 
            FROM sys.dm_os_sys_info AS dosi
        )
FROM sys.dm_os_process_memory AS dopm;

This will tell you how many gigs of pages and large pages are currently locked in memory, along with the memory model in use for your server.

Most common are:

Conventional (boo, probably)
Lock Pages (yay, definitely)

You may see large pages if you use lock pages in memory alongside trace flag 834, but that’s less common.

In my experience, you’ll see large page allocations even without that trace flag if column store indexes are in use. There may be other factors, but that’s what I’ve noticed.

Locked Down

If you’re into PowerShell, you can use this command from dbatools to set “LPIM”.

I usually go in the ol’ fashioned way.

The first thing you need to do is get the login that SQL Server uses. You can get this from:

SQL Server Configuration Manager
The Services control panel
The below query

SELECT  
    dss.servicename,
    dss.service_account
FROM sys.dm_server_services AS dss;

I don’t care which one you use. I think the control panel route is more reliable, because like 9000% of the time when I search the *&^(*&% search bar for configuration manager it can’t find it, and then when it does you get some snap error.

So, services it is. Copy that haughty chieftain down somewhere. Then type “secpol.msc” into the search bar, because somehow Windows can always find that.

Click on “User Right Assignment”, Then find the “Lock Pages In Memory” entry in the Policy list, and pop your SQL Server login in there.

Cool. Great. You’re done, right?

Wrong. You need to recycle SQL Server services for it to kick in.

But you can do that whenever, right?

SQL Server 2019 And Up?

If you’re using a somewhat modern version of SQL Server, you can use SQL Server Configuration Manager to add this setting in.

Thanks for reading!

Going Further

Common SQL Server Consulting Advice: Setting Min Server Memory And Max Server Memory

Posted on April 12, 2022May 16, 2022 by Erik Darling

The Max For The Minimum

I’ve written several hundred thousand times about how important memory is to SQL Server, and to most databases in general. But once you have it, how do you make the most of it?

Well, there’ll be a few posts in this series that cover different ways of doing that, but we’re gonna start with some basic ones that I see folks tinker with in all the wrong ways.

If you don’t know how to change these settings, here are some example commands. Note that the max server memory command is supplied with -1, which will throw an error.

Why? Because this is highly dependent on Local Factors©

EXEC sys.sp_configure
    @configname = 'show advanced options',
    @configvalue = 1;
RECONFIGURE;

EXEC sys.sp_configure
    @configname = 'min server memory (MB)',
    @configvalue = 0;
RECONFIGURE;

EXEC sys.sp_configure
    @configname = 'max server memory (MB)',
    @configvalue = -1;
RECONFIGURE;

Let’s figure out those local factors!

First, let’s get a stupid query to see what you have things set to:

SELECT
    x.name, 
    x.value_gb, 
    x.value_in_use_gb, 
    x.total_physical_memory_gb,
    percent_of_total_memory = 
        (x.value_in_use_gb / x.total_physical_memory_gb) * 100
FROM 
    (
    SELECT
        c.name,
        value_gb = 
            CONVERT
            (
                bigint,
                c.value
            ) / 1024,
        value_in_use_gb = 
            CONVERT
            (
                bigint,
                c.value_in_use
            ) / 1024,
        dosm.total_physical_memory_gb     
    FROM sys.configurations AS c
    CROSS JOIN
    (
        SELECT 
            total_physical_memory_gb = 
                CEILING
                (
                    dosm.total_physical_memory_kb / 1024. / 1024.
                )
        FROM sys.dm_os_sys_memory AS dosm
    ) dosm
    WHERE c.name IN 
    (
        N'min server memory (MB)',
        N'max server memory (MB)'
    )
) AS x;

This will tell you what you have min and max server memory set to, what the total physical memory in the server is, and what percent of the total memory each setting is.

Now that you have those numbers in mind, let’s talk about them.

Min Server Memory

This section can be nice and short: If you have this set to a number other than zero (which is the default, don’t play with me on this one), someone either changed the wrong setting, or took all their goofy pills.

Or both. One thing leads to another, as they say.

There are some weird cases where you might see 16, which shows up on some servers running 64bit versions of SQL Server.

If you see an 8 here, that means you’ve temporarily spawned in the wrong timeline and you need to speak to your handler about being sent back to when 32bit software was acceptable.

The worst case here is having min server memory set equal to max server memory, which prevents SQL Server from using dynamic memory, and can cause all sorts of weird issues. Bottom line: don’t do it.

If the same value is specified for both min server memory and max server memory, then once the memory allocated to the SQL Server Database Engine reaches that value, the SQL Server Database Engine stops dynamically freeing and acquiring memory for the buffer pool.

If you have weird stuff going on with your server, like the plan cache throwing up constantly, queries that are struggling with memory grants, or other oddities, you should check to make sure min server memory is not set to max server memory.

Max Server Memory

Okay, let’s make this creakin’ easy, too. If the percent_of_total_memory column is over 100% then we’re back to the goofy pills theory of database administration.

If it’s a number between 70 and 90%, you’re probably in okay shape. Any lower, and you’re probably doing something asinine like stacking SQL Server instances and we’re back to the goofy pills theory of database administration.

But look, there’s all sorts of stuff you have to think about when you set max server memory in some situations:

Is SSAS, SSIS, or SSRS installed?
Are you running a workload with column store and/or batch mode components?

I’ve never opened up any SS*S component, so I’m not going to tell you how much memory you should set aside for them. I also don’t want to get in the habit of giving the wrong advice, either. You should install these things on their own server where they can use their own local resources and be much less of a bother to your production workload.

The middle point is becoming more and more of an issue though. Column store indexes and Batch Mode query execution are… different about memory.

You’ll either need to over-provision memory for those workloads, or set max server memory to a lower number than you’d would usually to accommodate them.

Pacer

That’s recap backwards, in case you were wondering.

Hopefully this post gives you a useful query to look at memory settings, and how they compare to the memory your SQL Server has in it.

Tomorrow, we’ll talk about Lock Pages In Memory. It’s one of my favorite settings.

Thanks for reading!

Going Further

Common SQL Server Consulting Advice: Setting MAXDOP And Cost Threshold For Parallelism

Posted on April 11, 2022May 16, 2022 by Erik Darling

Dirty Work

There are three classes of administrator when it comes to parallelism settings

Never changed the defaults
Changed the defaults to something wacky
Changed the defaults to something safe

The beginning of this post is for the first couple of genus of administrator.

Even though Microsoft added a MAXDOP calculation to the installer starting with SQL Server 2016, I’ve seen cases where the recommended setting was wrong because hardware changed after the initial config.

As an example: Your VM admin created the smallest possible machine to get things set up for you, and added CPU and memory later on, and you ended up with MAXDOP set to 1.

The installer still doesn’t let you change Cost Threshold For Parallelism at setup, which is plum bizarre. I get why making a recommendation here is hard (and I’ll talk about that later on), but you should at least give folks the option.

I want to add this here, before we get too far along: The point of adjusting these values is not “because there’s too much parallelism” or “because there’s too much CX… waits”. The goal is to increase concurrency by only allowing queries that benefit from parallelism to use it, because parallel queries can reserve many CPU threads (DOP * parallel branches), and if you throw enough of them at a server you can end up running out of worker threads, and hitting THREADPOOL waits.

Some folks do Good Enough© with the stock advice of a max degree of parallelism of 4 to 8, and cost threshold for parallelism of 50. Others have a harder time finding a sweet spot that properly balances performance and concurrency.

But you need to start somewhere.

If you truly have no idea how to change these settings, review these example commands, but don’t just blindly run them.

EXEC sys.sp_configure
    @configname = 'show advanced options',
    @configvalue = 1;
RECONFIGURE;

EXEC sys.sp_configure
    @configname = 'cost threshold for parallelism',
    @configvalue = 50;
RECONFIGURE;

EXEC sys.sp_configure
    @configname = 'max degree of parallelism',
    @configvalue = 8;
RECONFIGURE;

Read the rest of the post first.

Evaluating

The easy way to see your settings and if they’re relatively sane is like so:

SELECT
    c.name,
    c.value,
    c.value_in_use,
    c.description,
    socket_count = 
        CASE c.name
             WHEN N'max degree of parallelism'
             THEN osi.socket_count
             ELSE NULL
        END, 
    cores_per_socket = 
        CASE c.name
             WHEN N'max degree of parallelism'
             THEN osi.cores_per_socket
             ELSE NULL
        END
FROM sys.configurations AS c
CROSS JOIN
(
    SELECT 
        osi.socket_count,
        osi.cores_per_socket
    FROM sys.dm_os_sys_info AS osi
) AS osi
WHERE c.name IN 
(
    N'cost threshold for parallelism',
    N'max degree of parallelism'
);

Whenever I query sys.configurations, I get the value and value in use, because sometimes folks forget to run RECONFIGURE; after they change something.

But back to the point! Whatcha got here?

Is Cost Threshold For Parallelism a number higher than 5 and lower than 200?
Is Max Degree Of Parallelism an even number between 4 and 8?

If so, you probably don’t need to make any changes here, unless something else is going on.

And hey, if you’ve got one of those really awful vendor products that tells you to set MAXDOP to one, you should be using database scoped configurations for that, these days.

For most folks, this will get you to an okay place. You may have some other considerations based on the number of cores and/or sockets, and that can get pretty confusing.

Changing these setting will clear out the plan cache, so you should be a little careful when you do it, but you don’t need to restart or anything.

Thinking Harder About Cost Threshold For Parallelism

Many folks who end up here are already pretty sophisticated when it comes to SQL Server, and I’d hate to leave them hanging.

Let’s talk more about each setting, and how they interact with your server and queries.

First up, alphabetically, is cost threshold for parallelism. Right now, you can only set it at the server level. You can technically remove it from the equation by:

Setting MAXDOP to 1
Using undocumented trace flag 8649, or…
Using undocumented query hint ENABLE_PARALLEL_PLAN_PREFERENCE to force a parallel plan

It would be neat if you could set it at the database level, like MAXDOP, but whatever.

One thing that comes up consistently when talking about this setting is that folks often see parallel queries with costs lower than the cost threshold for parallelism. All that means is that the serial query plan broke the threshold, and the parallel plan was chosen because it was cheaper. Cheaper can be lower than cost threshold for parallelism.

When a query executes, assuming there isn’t some parallelism-inhibiting factor in place (scalar UDFs, table variable inserts, etc.), it has to “cost” more than this threshold for a parallel plan to get considered. If it does, and the optimizer can come up with a parallel plan that’s cheaper than the serial plan, then you can pretty well count on it being chosen.

This is where things get tricky! Right off the bat!

See, query costs are all estimates, and there’s no actual counterpart to them in your query plans. If you’re anything like me, you’ve probably seen some low “cost” queries run for a really long time.

Many things contribute to pinning this setting down being difficult:

Incorrect assumptions about too much of a CX wait (this goes for MAXDOP too)
Untuned workloads having consistently high cost queries
Wasting time looking at the plan cache for average costs
Changing the setting not having an appreciable effect
Misunderstanding what costs actually mean

What does it all mean for you? Well, usually increasing the setting from the default — which is 5 — to a meaningfully higher number, will reduce the total number of queries that are eligible for a parallel plan.

This doesn’t mean performance is doomed for those queries, heck, they may not have needed a parallel plan to begin with. The optimizer is pretty good at costing queries generally. But it’s not perfect, and that’s where generalized workload settings need to be overridden or changed.

Having fewer queries be eligible for gobbling up a whole bunch of CPU threads is one way of increasing overall concurrency. Individual query performance may change for better or worse, and you may need to raise or lower the number you picked here, but that’s part of the process.

Thinking Harder About Max Degree Of Parallelism

The second option you have to address concurrency and performance issues stemming from parallelism is setting an upper limit on the number of cores, and threads per parallel branch that a parallel query is allowed to engage and acquire.

In a way, this setting is a little easier to settle on than Cost Threshold for Parallelism, but there are some things you should be aware of:

The parallel version of a plan isn’t just the serial version with parallelism; you can end up with a very different set of operators
The optimizer may choose a serial plan if DOP is set too low for a parallel plan to make sense
Whatever you set DOP to, parallel plans will use at least one extra coordinator thread to manage DOP threads
Leaving this set to zero is only really dangerous on servers with > 8 cores
Setting this to an odd number is generally not a good global setting, though it can be useful in edge cases for a single query

There have been times when I’ve suggested MAXDOP of 4 on servers with a single 8 core processor, but that was mostly a temporary fix until some more meaningful tuning work could be done.

It’s okay to use these settings as band aids, just don’t let them be habit-forming.

How can you tell if a parallel query plan is effective, and how can you decide if you need higher or lower DOP? Great question(s)! Look at you. I’m so proud.

If your parallel and serial plans have roughly equivalent CPU and duration, the parallel plan isn’t better
If your parallel plan isn’t anywhere near DOP faster than your serial query, the parallel plan probably isn’t better
If your parallel plan hits exchange spills or deadlocks, buckle up, you’re in for a fun day (of MAXDOP 1 hints)
If your parallel plan gets slower as you touch more data, you might want to try hinting a higher DOP

I know, it sounds kooky, but this is all stuff I have to think about and mess with when I’m tuning queries for clients. Many times, it seems like an attractive concept to force a parallel plan on a slow query, but the end result isn’t necessarily better.

It’s sort of like when I expect a query to use a nested loops join, but it doesn’t. If you hint the query to do the loop join, you might get a missing index request that lead the optimizer to think that a different join type was cheaper. Or you might just find out the optimizer was wrong. Again.

Getting back to the main point, though: parallelism is a very effective equalizer for the right kind of queries. It just doesn’t necessarily help every query.

Slug Away

To summarize and recap a little bit, here. Most SQL Server users out there will do fine following basic advice about cost threshold for parallelism and max degree of parallelism.

There will be outliers, edge cases, and times when you need to work outside of those settings for particular queries to run optimally. Heck, if a client tells me they’re running a data warehouse, the last thing I’m going to get hung up on is cost threshold for parallelism.

If you change those settings away from the defaults, and you’re still for some reason dissatisfied with the amount of CX_ waits on your server, try focusing on the queries that are going parallel and tuning them to reduce their estimated costs to lower numbers.

Sometimes those waits really are your fault.

Thanks for reading!

Going Further

Working With A SQL Server Consultant Checklist: Patches, Patches, Patches

Posted on April 8, 2022May 16, 2022 by Erik Darling

Whacked

It’s obvious to most that you should keep some track of SQL Server patches. You don’t need to install them all the hot minute that they drop, but staying within range of current is pretty important.

If you have an extra 5-10 minutes a month, read the Cumulative Update notes that Microsoft adds along with releases and see if any of the issues sound like ones you’re hitting.

I’m not gonna pretend that the release notes are getting better — in fact they’ve really gone downhill over the last few years — but you can scan through them and get a reasonable idea of what’s in the update as far as fixes and improvements go.

What no reasonable consultant wants to see is a SQL Server that has been up and unpatched for a year. That messes you up in two ways:

Your problems could be solved by patching SQL Server
Your server metrics are all gonna be cumulative since your last patch

That makes any recent problems really tough to makes heads or tails out of, without a monitoring tool that breaks things down by time.

Remember when I said monitoring tools are important?

They still are.

More Than One

Monitoring tools are also useful for patching, not because they help you monitor for updates or tell you when things are out of date, but because a good monitoring tool will help you baseline to see if there’s any difference after patching.

That can help you just as much when things go well as when things go poorly. Even if it wasn’t the patch — maybe a bad plan got in the cache after you rebooted or something — you have a much earlier warning system.

But what’s really important is having that reservoir of information about your SQL Server.

This is where I get selfish.

If you patch SQL Server, you have to restart it. If you restart it, all the good stuff goes bye-bye, then we might have to delay working together for metrics to build back up.

So pretty and please, patch that thing regularly.

And get a monitoring tool.

For both of us.

Not Just SQL Server

I’ve seen lots of folks using really old versions of SSMS for no good reason. You should keep that up to date, too. Just don’t forget to skip the Azure Data Studio install, if you don’t need it.

There have been a lot of important improvements and fixes to SSMS, especially since version 18. If you’re not running at least that, go grab it.

Especially if we’re going to be working together, because I’ll stop and make you get it if you don’t already have it.

And get a monitoring tool.

For both of us.

Thanks for reading!

Going Further

Working With A SQL Server Consultant Checklist: Tools Of The Trade

Posted on April 7, 2022May 16, 2022 by Erik Darling

Accustomed

There are thousands of scripts out there on the internet that tell you different things about SQL Server.

Some are packaged nicely into a stored procedure, and others… aren’t. I like to make things as easy as possible for folks to take and run, without worrying about database context or SQL Server version/edition compatibility.

In shops with DBAs on staff, I get nervous when I see (or don’t see) certain things:

sp_WhoIsActive isn’t installed
sp_Blitz scripts are years out of date
They’re using maintenance plans instead of Ola’s scripts

You don’t have to look every day, but you should check in every three months or so to see if your favorite troubleshooting scripts have any updates to them.

Thankfully, GitHub makes this really easy.

Repo City

A lot of folks went and embraced GitHub as an easy way to release up to date scripts to the masses, with the ability for them to open issues for new features, bugs, and Q&A.

sp_WhoIsActive: Get tons of detailed information about queries currently executing
First Responder Kit: Basically anything you need to do a health and performance check of your servers
Ola Hallengren’s Maintenance Solution: Backups, CheckDB, Index and Stats maintenance, highly configurable
sp_HumanEvents: Easily set up and analyze Extended Events to troubleshoot common issues like blocking, compilations, query performance, and more
sp_PressureDetector: Detect memory and CPU pressure on SQL Server really fast
sp_QuickieStore: Get all the important stuff out of Query Store with the press of a button

With the exception of Ola’s scripts, I contribute code to all of the procedures listed above, and I use them in just about every engagement.

In tomorrow’s post, I’m gonna cover a few other things you should do to get ready for someone to look at your SQL server.

Thanks for reading!

Going Further

Working With A SQL Server Consultant Checklist: Monitoring Tools And Query Store

Posted on April 6, 2022May 16, 2022 by Erik Darling

Forensics

I get called in to help with a lot of interesting performance problems. I’m gonna say something that sounds harsh right up front: very few people are prepared for sudden issues.

After the fact (and usually after a reboot that clears out all the useful information) people start doing all sorts of things, like logging sp_WhoIsActive to a table.

Consider a couple things for a second though (aside from the fact that rebooting clears out all your important server metrics)

Nothing captures blocking by default (you need the blocked process report for this)
The system health session keeps limited information in it about things like deadlocks
SQL Server doesn’t time-slice wait stats or any other metrics
The plan cache is usually highly volatile for a number of reasons

With SQL Server, you need to prepare today for emergencies tomorrow.

Query Store

One way you can get some of this stuff is by enabling Query Store on your important databases.

It’s not perfect, and there’s certainly a risk of introducing unacceptable observer overheard in environments with a really high rate of transactions.

But it’s a whole lot better than the plan cache for looking back in time at query performance issues, and starting with SQL Server 2017 we started getting high-level waits stats too.

It’s not a replacement for a monitoring tool, and there’s enough functionality missing that I wrote a stored procedure called sp_QuickieStore to search through it, but it’s better than nothing.

The odds of relevant query plans from a performance meltdown being in the plan cache several days later is nearly zero. People just do too many bad things that make it fill up or clear out:

They don’t parameterize queries
They turn on optimize for ad hoc workloads
They don’t give SQL Server enough memory
They stack dozens of databases or schema

SQL Server 2019 added a bunch of switches you can tweak to reduce the overhead and the queries you collect in Query Store, which is super helpful.

Monitoring Tools

I hear a lot of people say they have SQL Server monitoring tools, but they either:

Are really crappy and impossible to use
Don’t really monitor SQL Server well

That doesn’t even begin to account for end-user issues, like:

Broken monitoring tool installs not collecting data
Software being way out of date, or…
Them not not knowing what to look at because they’ve never opened it

But if you have a good monitoring tool collecting a variety of important SQL Server metrics, you have a much better chance of being able to root cause performance issues:

Blocking
Deadlocks
Wait Stats
Query Performance
Disk Stuff
Memory Stuff
CPU Stuff

You get the idea — your monitoring tool should collect a wide enough variety of metrics that common problems don’t totally escape it — but probably not show you every single meaningless metric that you can possibly collect.

Easy Street

As a consultant, I love having these things to comb through to either troubleshoot a past performance issue, or just find candidate queries to tune for clients.

Without them, it makes my job a lot harder — or even impossible sometimes, when data is gone for good — and I have to spend a lot of time waiting for an issue to crop up again to see what was involved, or do a bunch of generalized performance tuning in order to hopefully avoid future issues.

In tomorrow’s post, I’m going to talk about some of the common tools and scripts that I use while consulting (though not how to use them). If you’re talking to anyone in the SQL Server consulting space, these are good tools to have available.

Thanks for reading!