Common SQL Server Consulting Advice: Adding Computed Columns

Active Blooper


Remember yesterday? Yeah, me either. But I do have access to yesterday’s blog post, so I can at least remember that.

What a post that was.

We talked about filtered indexes, some of the need-to-know points, when to use them, and then a sad shortcoming.

Today we’re going to talk about how to overcome that shortcoming, but… there’s stuff you need to know about these things, too.

We’re gonna start off with some Deja Vu!

First, there are some session-level settings that need to be appropriately applied for them to be considered by the optimizer. This is especially important if you’re putting any logic into a SQL Server Agent job, because it uses the wrong settings for some reason.

Here are the correct settings:

  • QUOTED_IDENTIFIER ON
  • ANSI_NULLS ON
  • ANSI_PADDING ON
  • ANSI_WARNINGS ON
  • ARITHABORT ON
  • CONCAT_NULL_YIELDS_NULL ON
  • NUMERIC_ROUNDABORT OFF

Second, computed columns are sort of like regular columns: you can only search them efficiently if you index them.

This may come as a surprise to you, but indexes put data in order so that it’s easier to find things in them.

The second thing you should know about the second thing here is that you don’t need to persist computed columns to add an index to them, or to get statistics generated for the computed values (but there are some rules we’ll talk about later).

For example, let’s say you do this:

ALTER TABLE dbo.Users ADD TotalVotes AS (UpVotes + DownVotes);
CREATE INDEX u ON dbo.Users (TotalVotes) WITH(SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

The index gets created just fine. This is incredibly handy if you need to add a computed column to a large table, because there won’t be any blocking while adding the column. The index is another matter, depending on whether you’re using Enterprise Edition.

Third, SQL Server is picky about them, kind of. The problem is a part of the query optimization process called expression matching that… matches… expressions.

For example, these two queries both have expressions in them that normally wouldn’t be SARGable — meaning you couldn’t search a normal index on (UpVotes, DownVotes) efficiently.

But because we have an indexed computed column, one of them gets a magic power, and the other one doesn’t.

Because it’s backwards.

SELECT c = COUNT_BIG(*) FROM dbo.Users AS u WHERE (u.UpVotes + u.DownVotes) > 1000;
SELECT c = COUNT_BIG(*) FROM dbo.Users AS u WHERE (u.DownVotes + u.UpVotes) > 1000;
[Screenshot: SQL Server query plan, captioned “connection”]

See what happens when you confuse SQL Server?

If you have full control of the code, it’s probably safer to reference the computed column directly rather than rely on expression matching, but expression matching can be really useful when you can’t change the code.
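
If you do have control of the code, a minimal sketch of the direct approach (assuming the TotalVotes computed column and index from above are in place):

SELECT c = COUNT_BIG(*) FROM dbo.Users AS u WHERE u.TotalVotes > 1000;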

Fourth, don’t you ever ever never ever ever stick a scalar UDF in a computed column or check constraint. Let’s see what happens:

CREATE FUNCTION dbo.suck(@Upvotes int, @Downvotes int)
RETURNS int
WITH SCHEMABINDING, RETURNS NULL ON NULL INPUT
AS
BEGIN
    RETURN (SELECT @Upvotes + @Downvotes);
END;
GO 

ALTER TABLE dbo.Users ADD TotalVotes AS dbo.suck(UpVotes, DownVotes);
CREATE INDEX u ON dbo.Users (TotalVotes) WITH(SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

SELECT c = COUNT_BIG(*) FROM dbo.Users AS u WHERE (u.DownVotes + u.UpVotes) > 1000;

Remember that this is the query that has things backwards and doesn’t use the index on our computed column, but look what happened to the query plan:

[Screenshot: SQL Server query plan, captioned “me without makeup”]

Querying a completely different index results in a plan that SQL Server can’t parallelize because of the function.

Things that won’t fix this:

  • SQL Server 2019 scalar UDF inlining
  • Persisting the computed column
  • Using a different kind of T-SQL function; you can’t use them in computed columns

Things that might fix this:

  • Nothing. That’s the whole list.

Fifth: Column store indexes are weird with them. There’s an odd bit of a matrix, too.

  • Anything before SQL Server 2017, no dice
  • Any nonclustered columnstore index through SQL Server 2019, no dice
  • For 2017 and 2019, you can create a clustered columnstore index on a table with a computed column, as long as it’s not persisted

--Works
CREATE TABLE clustered_columnstore (id int, some_date datetime, next_date datetime, diff_date AS DATEDIFF(MILLISECOND, some_date, next_date));
CREATE CLUSTERED COLUMNSTORE INDEX c ON dbo.clustered_columnstore;

--Doesn't work
CREATE TABLE nonclustered_columnstore (id int, some_date datetime, next_date datetime, diff_date AS DATEDIFF(MILLISECOND, some_date, next_date));
CREATE NONCLUSTERED COLUMNSTORE INDEX n ON dbo.nonclustered_columnstore(id, some_date, next_date, diff_date);

--Clean!
DROP TABLE dbo.clustered_columnstore, dbo.nonclustered_columnstore;

--Doesn't work, but throws a misleading error
CREATE TABLE clustered_columnstore (id int, some_date datetime, next_date datetime, diff_date AS DATEDIFF(MILLISECOND, some_date, next_date) PERSISTED);
CREATE CLUSTERED COLUMNSTORE INDEX c ON dbo.clustered_columnstore;

--Still doesn't work
CREATE TABLE nonclustered_columnstore (id int, some_date datetime, next_date datetime, diff_date AS DATEDIFF(MILLISECOND, some_date, next_date) PERSISTED);
CREATE NONCLUSTERED COLUMNSTORE INDEX n ON dbo.nonclustered_columnstore(id, some_date, next_date, diff_date);

--Clean!
DROP TABLE dbo.clustered_columnstore, dbo.nonclustered_columnstore;

General Uses


The most general use for computed columns is to materialize an expression that a query has to filter on, but that wouldn’t otherwise be able to take advantage of an index to locate rows efficiently, like the UpVotes and DownVotes example above.

Even with an index on (UpVotes, DownVotes), nothing in the index keeps track of what the two values added together would be.

SQL Server has to do that math every time the query runs and then filter on the result. Sometimes those expressions can be pushed to an index scan, and other times they need a Filter operator later in the plan.

Consider a query that inadvisably does one of these things:

  • function(column) = something
  • column + column = something
  • column + value = something
  • value + column = something
  • column = case when …
  • value = case when column…
  • convert_implicit(column) = something

As long as all values are known ahead of time — meaning they’re not a parameter, variable, or runtime constant like GETDATE() — you can create computed columns that you can index and make searches really fast.

Take this query and index as an example:

SELECT c = COUNT_BIG(*) FROM dbo.Posts AS p WHERE DATEDIFF(YEAR, p.CreationDate, p.LastActivityDate) > 9;
CREATE INDEX p ON dbo.Posts(CreationDate, LastActivityDate) WITH(SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

The best we can do is still to read every row via a scan:

[Screenshot: SQL Server query plan, captioned “still not good”]

But we can fix that by computing and indexing:

ALTER TABLE dbo.Posts ADD ComputedDiff AS DATEDIFF(YEAR, CreationDate, LastActivityDate);
CREATE INDEX p ON dbo.Posts(ComputedDiff) WITH(SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE, DROP_EXISTING = ON);

And now our query is much faster, without needing to go parallel, or more parallel, to get there:

[Screenshot: SQL Server query plan, captioned “improvement!”]

SQL Server barely needs to flinch to finish that query, and we get an actually good estimate to boot.

Crappy Limitations


While many computed columns can be created, not all can be indexed. For example, something like this would be lovely to have and to have indexed:

ALTER TABLE dbo.Users ADD RecentUsers AS DATEDIFF(DAY, LastAccessDate, SYSDATETIME());
CREATE INDEX u ON dbo.Users (RecentUsers);

While the column creation does succeed, the index creation fails:

Msg 2729, Level 16, State 1, Line 177

Column ‘RecentUsers’ in table ‘dbo.Users’ cannot be used in an index or statistics or as a partition key because it is non-deterministic.

You also can’t reach out to other tables:

ALTER TABLE dbo.Users ADD HasABadge AS CASE WHEN EXISTS (SELECT 1/0 FROM dbo.Badges AS b WHERE b.UserId = Id) THEN 1 ELSE 0 END;

SQL Server doesn’t like that:

Msg 1046, Level 15, State 1, Line 183

Subqueries are not allowed in this context. Only scalar expressions are allowed.

There are others, but these are the most common disappointments I come across.

Some of the things that computed columns fall flat with are things we can remedy with indexed views, but boy howdy are there a lot of gotchas.

We’ll talk about those tomorrow!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Common SQL Server Consulting Advice: Adding Filtered Indexes

Home Star


For all the good indexes do, sometimes you just don’t need them to cover all of the data in a table for various reasons.

There are a number of fundamentals about filtered indexes that you need to understand (that any good consultant will tell you about), but I wanna cover them here just in case you had an unfortunate run-in with a less-than-stellar consultant.

First, there are some session-level settings that need to be appropriately applied for them to be considered by the optimizer. This is especially important if you’re putting any logic into a SQL Server Agent job, because it uses the wrong settings for some reason.

Here are the correct settings:

  • QUOTED_IDENTIFIER ON
  • ANSI_NULLS ON
  • ANSI_PADDING ON
  • ANSI_WARNINGS ON
  • ARITHABORT ON
  • CONCAT_NULL_YIELDS_NULL ON
  • NUMERIC_ROUNDABORT OFF

Second, you need to be careful about parameterized queries. If your filtered index and query look like this:

CREATE INDEX u ON dbo.Users(Reputation) WHERE Reputation > 100000;

SELECT c = COUNT_BIG(*) FROM dbo.Users AS u WHERE u.Reputation > @Reputation;

The optimizer will not use your filtered index, because it has to pick a safe and cache-able plan that works for any input to the @Reputation parameter (this goes for local variables, too).

To get around this, you can use:

  • Recompile hint
  • Literal value
  • Potentially unsafe dynamic SQL
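
For example, the recompile route is the least invasive of the three. A minimal sketch, using the same query and filtered index from above:

SELECT c = COUNT_BIG(*) FROM dbo.Users AS u WHERE u.Reputation > @Reputation OPTION(RECOMPILE);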

Third, the columns in your filter expression need to be somewhere in your index definition (key or include) to help the optimizer choose your index in some situations.

Let’s say you have a filtered index that looks like this:

CREATE INDEX u ON dbo.Users(DisplayName) WHERE Reputation > 100000;

As things stand, all the optimizer knows is that the index is filtered to Reputation values over 100k. If you need to search within that range, like 100k-200k, or >= 500k, it has to get those values from somewhere, and it has the same options as it does for other types of non-covering indexes:

  • Ignore the index you thoughtfully created and use another index
  • Use a key lookup to go back to the clustered index to filter specific values
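
One way to avoid both of those outcomes is to put the filtered column somewhere in the index definition. An include is usually enough; here’s a sketch based on the index above:

CREATE INDEX u ON dbo.Users(DisplayName) INCLUDE(Reputation) WHERE Reputation > 100000 WITH(DROP_EXISTING = ON, SORT_IN_TEMPDB = ON);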

General Uses


The most common uses that I see are:

  • Indexing for soft deletes
  • Indexing unique values and avoiding NULLs
  • Indexing for hot data
  • Indexing skewed data to create a targeted histogram
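
Here are a couple of sketches of the first two, using made-up table and column names rather than anything from the Stack Overflow database:

--Soft deletes: only index the rows queries still care about
CREATE INDEX o ON dbo.Orders(CustomerId, OrderDate) WHERE IsDeleted = 0;

--Enforce uniqueness while still allowing more than one NULL
CREATE UNIQUE INDEX c ON dbo.Customers(Email) WHERE Email IS NOT NULL;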

There are others, but one thing to consider when creating filtered indexes is how much of your data will be excluded by the definition.

If more than half of your data is going to end up in there, you may want to think hard about what you’re trying to accomplish.

Another potential use that I sometimes find for them is using the filter to avoid needing an overly-wide index key.

Let’s say you have a super important query that looks like this:

SELECT
    u.DisplayName,
    TotalScore = 
        SUM(p.Score)
FROM dbo.Posts AS p
JOIN dbo.Users AS u
    ON p.OwnerUserId = u.Id
WHERE p.PostTypeId = 1
AND   p.AcceptedAnswerId > 0
AND   p.ClosedDate IS NULL
AND   p.Score > @Score
AND   p.OwnerUserId = @OwnerUserId
ORDER BY TotalScore DESC;

You’re looking at indexing for one join clause predicate and four additional where clause predicates. Do you really want five key columns in your index? No?

How about this?

CREATE INDEX 
    p 
ON dbo.Posts
    (OwnerUserId, Score)
INCLUDE
    (PostTypeId, AcceptedAnswerId, ClosedDate)
WHERE
    (PostTypeId = 1 AND AcceptedAnswerId > 0 AND ClosedDate IS NULL)
WITH
    (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

This will allow your query to pre-seek to the literal value predicates, and then evaluate the parameterized predicates in the key of the index.

Crappy Limitations


There are lots of things you might want to filter an index on, like an expression. But you can’t.

I think one of the best possible use cases for a filtered index that is currently not possible is to isolate recent data. For example, I’d love to be able to create a filtered index like this:

CREATE INDEX u ON dbo.Users(DisplayName) WHERE CreationDate > DATEADD(DAY, -30, CONVERT(date, SYSDATETIME()));

So I could just isolate data in the table that was added in the last 30 days. This would have a ton of applications!

But then the index would have to be self-updating, churning data in and out on its own.

For something like that, you’ll need a computed column. But even indexing those can be tricky, so we’ll talk about those tomorrow.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Common SQL Server Consulting Advice: Adding Indexes To Replace Multiple Indexes

Buffet Tables


If you’ve ever spent any time reading about relational database theory, you’ve probably read all sorts of things about normal forms and all that good stuff. You’ve also probably immediately forgotten all of it.

At least that’s what I’d guess happened, based on what I see when I look at database designs out there. Tables are often dozens of columns wide, with all sorts of things in there that should be in their own table.

  • Columns with a common prefix, like Customer_Name, Customer_Address, etc.
  • Columns with a common suffix, like PhoneNumber_1, PhoneNumber_2, etc.

Even fairly-well designed tables often get somewhat overloaded with information, especially when years of additional development requirements lead to a choice between adding one more column to a table and normalizing several columns out to a new table.

Sound familiar?

Let’s move on.

Indexing


The more columns you have in a table, the more potential column combinations there are for indexes. Much like columns, indexes tend to get added following the path of least resistance.

Very rarely does someone consider current indexes when deciding to add an index. They might be doing something like:

  • Evaluating missing index requests in a query plan
  • Evaluating missing index requests in the DMVs
  • Running DTA like a darn fool and saying yes to everything

I’ll often run into tables that have indexes like these:

CREATE INDEX p1 ON dbo.Posts(OwnerUserId);
CREATE INDEX p2 ON dbo.Posts(OwnerUserId, PostTypeId);
CREATE INDEX p3 ON dbo.Posts(OwnerUserId, CreationDate, LastActivityDate);
CREATE INDEX p4 ON dbo.Posts(OwnerUserId, PostTypeId, CreationDate);
CREATE INDEX p5 ON dbo.Posts(OwnerUserId, PostTypeId, Score);
CREATE INDEX p6 ON dbo.Posts(CreationDate, LastActivityDate, Score);

I’m leaving off includes here, because you can combine those in any order without disturbing much.

Key columns are a different story, of course. Order matters quite a bit, because, via the magic of co-dependency, key column order defines how queries can efficiently access data.

Merging


Looking at this list of indexes, we know a couple things immediately:

  • The index on just OwnerUserId isn’t of any obvious value, and we can get rid of it
  • The index that leads on CreationDate serves a totally different type of query, and we should leave it alone for now

Assuming all of these indexes have a not-crappy read to write ratio, what do we do next?

  • Make sure none of them are unique
  • Make sure none of them are filtered

These indexes all have multiple key columns that sort of line up:

CREATE INDEX p2 ON dbo.Posts(OwnerUserId, PostTypeId); 
CREATE INDEX p3 ON dbo.Posts(OwnerUserId, CreationDate, LastActivityDate); 
CREATE INDEX p4 ON dbo.Posts(OwnerUserId, PostTypeId, CreationDate); 
CREATE INDEX p5 ON dbo.Posts(OwnerUserId, PostTypeId, Score);

Of these, three lead with OwnerUserId and PostTypeId, and one doesn’t. We can set that one aside.

CREATE INDEX p2 ON dbo.Posts(OwnerUserId, PostTypeId); 
CREATE INDEX p4 ON dbo.Posts(OwnerUserId, PostTypeId, CreationDate); 
CREATE INDEX p5 ON dbo.Posts(OwnerUserId, PostTypeId, Score);

Okay, now what? Just looking at the definitions here, we can safely get rid of the first index, because any query that uses it can safely use the other two indexes.

But for the second two, we can’t do much. It’s not necessarily safe to add the third key column in either one as a fourth key column.

We have to look at how they’re used, first. If one index is unused or rarely used, we could make that change.
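
A quick way to check is sys.dm_db_index_usage_stats, keeping in mind that its numbers reset when SQL Server restarts. A minimal sketch for the Posts table:

SELECT
    index_name = i.name,
    us.user_seeks,
    us.user_scans,
    us.user_lookups,
    us.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
    ON  us.object_id = i.object_id
    AND us.index_id = i.index_id
    AND us.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID(N'dbo.Posts');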

Things might get more difficult to figure out if we have another index like this:

CREATE INDEX p4 ON dbo.Posts(OwnerUserId, PostTypeId, CreationDate); 
CREATE INDEX p5 ON dbo.Posts(OwnerUserId, PostTypeId, Score);
CREATE INDEX p7 ON dbo.Posts(OwnerUserId, PostTypeId, CreationDate, LastActivityDate);

Sure, it means we could get rid of the index that ends with CreationDate, but it makes the picture a little less clear about how we should deal with Score.

Ending


To be safe, I’d probably stop after just getting rid of the ones where the key columns are a match as far as order goes, and are a subset of wider indexes.

CREATE INDEX p1 ON dbo.Posts(OwnerUserId); --Drop this one
CREATE INDEX p2 ON dbo.Posts(OwnerUserId, PostTypeId); --Drop this one
CREATE INDEX p3 ON dbo.Posts(OwnerUserId, CreationDate, LastActivityDate); --Keep this one
CREATE INDEX p4 ON dbo.Posts(OwnerUserId, PostTypeId, CreationDate); --Drop this one
CREATE INDEX p5 ON dbo.Posts(OwnerUserId, PostTypeId, Score); --Keep this one 
CREATE INDEX p6 ON dbo.Posts(CreationDate, LastActivityDate, Score); --Keep this one
CREATE INDEX p7 ON dbo.Posts(OwnerUserId, PostTypeId, CreationDate, LastActivityDate); --Keep this one

After that, I’d probably look at usage stats, to see if any of them just aren’t holding up their end of the bargain.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Common SQL Server Consulting Advice: Adding And Removing Indexes

A Good Thing


Since you’re here reading this, I’m going to assume you know what food is. Further, I’m going to assume that you also know what salt is, because salt plays a pretty big part in how good or bad food tastes.

  • Skimp on salt, and you’ll see a lot of disappointed faces
  • Go overboard on salt, and you’ll see a lot of sour faces

This is a lot like what you’ll see if you do either one of those things with indexes in your database, unless you have Special Medical Exemptions©.

  • If it’s a data warehouse, you’ll probably want fewer indexes, but you’ll also probably want to use column store indexes
  • If it’s normal throughput OLTP, you’ll probably want more indexes than a data warehouse, but with narrower key column choices to support seek and lookup plans
  • If it’s OLTP + Reporting (or analytics or whatever you wanna call it) you’re going to need to make a lot of tough indexing choices for them to live together
  • If it’s high throughput OLTP, you’ll probably want even fewer overall indexes, and you may need to look into in-memory options for the busiest tables

With all that in mind, there might be some local factors that make some of this stuff not align particularly well with your workload. Things like version, edition, query patterns, and hardware might be holding you back. And of course, being at the mercy of a terrible vendor application would also likely put a damper on many of these plans.

If you want a quick and easy way to find indexes to both remove and add, go ahead and grab sp_BlitzIndex.

Practically Common


In my consulting work, the most common type of workload I see is OLTP with reporting on top of it, followed by analytics workloads. Pure OLTP workloads are pretty rare.

The index strategy for these workloads might look something like this:

  • Narrow key indexes with very few includes for OLTP queries
  • Fairly narrow key indexes with wider includes for reporting queries, or…
  • Nonclustered column store indexes for reporting queries

It’s always tricky figuring out what the best set of indexes to have around for your workload is. There are so many missing index requests, new queries, users complaining, and new tables getting added that it’s hard not to feel like you’re drowning.

You can’t have the perfect index for every query, and you shouldn’t try to. You’d end up with the sour-salt face kind of database. But what I need you to do is make a deal with yourself: Any time you want to add an index, you have to look for an index to get rid of.

Why?

Because without fail, I see these patterns all the time:

  • Indexes with a single key column alongside wider indexes that lead with the same column
  • Indexes with the exact same key columns and key column order (sometimes one has more key columns at the end)
  • Indexes that are totally unused by read queries but get modified a lot
  • Indexes with only a few reads and a ton of writes

It’s all very bothersome clutter, and you’ve got to clear these out before you can add in more.

Commonly Practical


There are many good reasons to avoid over-indexing. Everyone says “writes get slow”, but that’s a shabby meme. Your writes are slow because your storage is crappy and possibly because of your poorly thought out Availability Group.

Here’s why having an excessive number of indexes (again, barring Special Medical Exemptions©) can be bad for your database performance:

  • Locking:
    • More objects to lock means more locking generally
    • More objects to lock means a greater chance of lock escalation attempts
  • Buffer pool:
    • More indexes means more objects in the buffer pool
    • Even indexes unused by reads need to be modified when the table changes
  • Transaction logging:
    • More indexes to modify means more to keep track of in the transaction log
    • Even databases in simple recovery log the same amount 99% of the time

So the next time you want to go add an index to a table, I want you to try a few things:

  • Look for indexes on the table that are totally unused
  • Look for indexes on the table with overlapping key columns to merge
  • Look for indexes that could be tweaked rather than creating a whole new one

To do that, run sp_BlitzIndex like so:

EXEC dbo.sp_BlitzIndex
    @DatabaseName = N'YourDatabase',
    @SchemaName = N'ProbablyDBO',  
    @TableName = N'YourIndexingCatastrophe';

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Common SQL Server Consulting Advice: Setting Target Recovery Interval

Everywhere You Look


Finding out about this was probably like when some probably guy probably in a wig or something discovered atoms and was like “ohh wowwe therre are’st atommes ev’rywhislt I doth looketh, happie dais!”.

If that’s not historically accurate, I apologize. I’m a child of divorce and public school systems.

If you want a quick and dirty way to see if your server is having a tough time with checkpoint activity, you can run this to search your most recent error log for relevant messages.

  • If you recently restarted your server, you may need to search additional error logs
  • If you cycle your error logs frequently, you may need to search additional error logs
  • If your server is associated with Cloud Nonsense©, these commands may not work

CREATE TABLE 
    #ErikLog
(
    LogDate datetime,
    ProcessInfo varchar(50),
    Text varchar(MAX)
);

INSERT 
    #ErikLog WITH(TABLOCK)
(
    LogDate,
    ProcessInfo,
    Text
)
EXEC sys.xp_readerrorlog 
    0, --Just the first log file
    1, --SQL Server error log
    N'FlushCache'; --Search string

INSERT 
    #ErikLog WITH(TABLOCK)
(
    LogDate,
    ProcessInfo,
    Text
)
EXEC sys.xp_readerrorlog 
    0, --Just the first log file
    1, --SQL Server error log
    N'saturation'; --Search string

SELECT
    el.*
FROM #ErikLog AS el
ORDER BY el.LogDate;

DROP TABLE IF EXISTS
    #ErikLog;

You can adjust the first parameter there to look back at older log files, too, if you need to.
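
For example, searching the next-oldest log file just means bumping that first parameter from 0 to 1:

EXEC sys.xp_readerrorlog 
    1, --The next-oldest log file
    1, --SQL Server error log
    N'FlushCache'; --Search string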

So what happened? Was it everything you imagined?

Did You Find Anything?


If you got some hits searching through the error log, it may be a sign that you need to make a change.

  • Who can change this? You!
  • What change? Target Recovery Interval!
  • Where do I change it? At the database level!
  • When should I change it? Whenever, my friend!
  • Why database level and not at the server level? If your databases are in an AG or something, you want the change to follow them to other servers!
  • How do I change it? Like so!

DECLARE 
    @sql nvarchar(max) = N'';
 
SELECT 
    @sql += 
        N'ALTER DATABASE ' + 
        QUOTENAME(d.name) + 
        N' SET TARGET_RECOVERY_TIME = 60 SECONDS;' + 
        NCHAR(13)
FROM sys.databases AS d 
WHERE d.target_recovery_time_in_seconds <> 60
AND   d.database_id > 4 
AND   d.state = 0 
AND   d.is_in_standby = 0 
AND   d.is_read_only = 0;
 
SELECT 
    cmd = @sql;
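
Note that the script above only builds and prints the commands; nothing changes until you review the output and run it yourself. For a single database, the generated command looks like this (the database name here is just a stand-in):

ALTER DATABASE [YourDatabase] SET TARGET_RECOVERY_TIME = 60 SECONDS;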

Why Are We Doing This?


Long checkpoints can cause all sorts of weirdball issues while they’re happening, and they don’t just happen when you run a checkpoint command.

Log and differential backups cause them, as well as internal stuff around recovery time.

The good news is that this is the default for databases created on SQL Server 2016 or better. The bad news is that most of you out there started creating your databases way before that, and you’ve just been restoring them to newer versions without checking out any of the hot new gadgets and gizmos available.

Not that I blame you. Stuff like this doesn’t exactly make headlines, and you’re busy with important stuff, like hemming and hawing over fill factor.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Common SQL Server Consulting Advice: Setting Instant File Initialization

Did You Forget?


This is another one of those settings that’s so important, Microsoft put a switch for it in the installer. And yet, I end up looking at servers where it’s not enabled.

When I ask why, I usually expect to hear some 2003 wisdom about security. But that doesn’t happen; it’s just blank stares.

How do you know if this is enabled?

SELECT  
    dss.servicename,
    dss.service_account,
    dss.instant_file_initialization_enabled 
FROM sys.dm_server_services AS dss;

Simple as that.

Why Enable Instant File Initialization?


Turning this on allows SQL Server data files to grow without writing a bunch of zeroes out to disk. That can be useful in a number of ways.

Just picture SQL Server giving disk space a big ol’ hug and Windows not even getting jealous.

But Erik, you said data files, what about all those important transaction log files?

Sorry, there’s no helping them.

Put them on the fastest storage you can get and pray.

Enabling Instant File Initialization


You can use dbatools to enable IFI from PowerShell. If you’re not comfortable with PowerShell, you can get it done this way:

If for some reason you didn’t turn this on when you installed SQL Server, just take the service account you got from up yonder, and type “secpol.msc” into the search bar.

[Screenshot: Local Security Policy, captioned “tell somebody”]

After that, click on “User Rights Assignment” and look for the “Perform Volume Maintenance Tasks” entry in the Policy tab.

[Screenshot: Local Security Policy, captioned “no mistake”]

From there, it’s a matter of adding the service account here. If you’re using the default SQL Server service account, “NT Service\MSSQLSERVER…” it will likely already have it.

But you know, just in case. G’head and click that.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Common SQL Server Consulting Advice: Setting Lock Pages In Memory

Locked Up


It seems lately that every time I get someone to enable lock pages in memory on a server with a lot of memory, things change for the better.

Not because SQL Server is somehow holding onto memory better, but because it gets direct access to physical memory without having to muck about with a bunch of virtual memory first. SQL Servers with large amounts of memory can especially benefit from this setting, because it lets them allocate memory through a different API (AllocateUserPhysicalPages). From the Windows documentation:

Allocates physical memory pages to be mapped and unmapped within any Address Windowing Extensions (AWE) region of a specified process.

The AllocateUserPhysicalPages function is used to allocate physical memory that can later be mapped within the virtual address space of the process. The SeLockMemoryPrivilege privilege must be enabled in the caller’s token or the function will fail with ERROR_PRIVILEGE_NOT_HELD.

I generally advise people with good chunks of memory to enable this setting. There are very few good reasons not to on big boxes, and that’s why it should be called out in the installation process. Enabling it later means rebooting, and that sucks.

Let’s figure out if we’re using that setting!

SELECT
    locked_page_allocations_gb = 
        CONVERT
        (
            decimal
            (38, 2), 
            dopm.locked_page_allocations_kb / 1024. / 1024.
        ),
    large_page_allocations_gb = 
        CONVERT
        (
            decimal
            (38, 2), 
            dopm.large_page_allocations_kb / 1024. / 1024.
        ),
    memory_model = 
        (
            SELECT 
                dosi.sql_memory_model_desc 
            FROM sys.dm_os_sys_info AS dosi
        )
FROM sys.dm_os_process_memory AS dopm;

This will tell you how many gigs of pages and large pages are currently locked in memory, along with the memory model in use for your server.

Most common are:

  • Conventional (boo, probably)
  • Lock Pages (yay, definitely)

You may see large pages if you use lock pages in memory alongside trace flag 834, but that’s less common.

In my experience, you’ll see large page allocations even without that trace flag if column store indexes are in use. There may be other factors, but that’s what I’ve noticed.

Locked Down


If you’re into PowerShell, you can use dbatools to set “LPIM”.

I usually go in the ol’ fashioned way.

The first thing you need to do is get the login that SQL Server uses. You can get this from:

  • SQL Server Configuration Manager
  • The Services control panel
  • The below query

SELECT  
    dss.servicename,
    dss.service_account
FROM sys.dm_server_services AS dss;

I don’t care which one you use. I think the control panel route is more reliable, because like 9000% of the time when I search the *&^(*&% search bar for configuration manager it can’t find it, and then when it does you get some snap error.

So, services it is. Copy that haughty chieftain down somewhere. Then type “secpol.msc” into the search bar, because somehow Windows can always find that.

[Screenshot: Local Security Policy, captioned “tell somebody”]

Click on “User Rights Assignment”, then find the “Lock Pages In Memory” entry in the Policy list, and pop your SQL Server login in there.

[Screenshot: Local Security Policy, captioned “ride with”]

Cool. Great. You’re done, right?

Wrong. You need to recycle SQL Server services for it to kick in.

But you can do that whenever, right?

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Common SQL Server Consulting Advice: Setting Min Server Memory And Max Server Memory

The Max For The Minimum


I’ve written several hundred thousand times about how important memory is to SQL Server, and to most databases in general. But once you have it, how do you make the most of it?

Well, there’ll be a few posts in this series that cover different ways of doing that, but we’re gonna start with some basic ones that I see folks tinker with in all the wrong ways.

If you don’t know how to change these settings, here are some example commands. Note that the max server memory command is supplied with -1, which will throw an error.

Why? Because this is highly dependent on Local Factors©

EXEC sys.sp_configure
    @configname = 'show advanced options',
    @configvalue = 1;
RECONFIGURE;

EXEC sys.sp_configure
    @configname = 'min server memory (MB)',
    @configvalue = 0;
RECONFIGURE;

EXEC sys.sp_configure
    @configname = 'max server memory (MB)',
    @configvalue = -1;
RECONFIGURE;

Let’s figure out those local factors!

First, let’s get a stupid query to see what you have things set to:

SELECT
    x.name, 
    x.value_gb, 
    x.value_in_use_gb, 
    x.total_physical_memory_gb,
    percent_of_total_memory = 
        (x.value_in_use_gb / x.total_physical_memory_gb) * 100
FROM 
    (
    SELECT
        c.name,
        value_gb = 
            CONVERT
            (
                bigint,
                c.value
            ) / 1024,
        value_in_use_gb = 
            CONVERT
            (
                bigint,
                c.value_in_use
            ) / 1024,
        dosm.total_physical_memory_gb     
    FROM sys.configurations AS c
    CROSS JOIN
    (
        SELECT 
            total_physical_memory_gb = 
                CEILING
                (
                    dosm.total_physical_memory_kb / 1024. / 1024.
                )
        FROM sys.dm_os_sys_memory AS dosm
    ) dosm
    WHERE c.name IN 
    (
        N'min server memory (MB)',
        N'max server memory (MB)'
    )
) AS x;

This will tell you what you have min and max server memory set to, what the total physical memory in the server is, and what percent of the total memory each setting is.

Now that you have those numbers in mind, let’s talk about them.

Min Server Memory


This section can be nice and short: If you have this set to a number other than zero (which is the default, don’t play with me on this one), someone either changed the wrong setting, or took all their goofy pills.

Or both. One thing leads to another, as they say.

There are some weird cases where you might see 16, which shows up on some servers running 64bit versions of SQL Server.

If you see an 8 here, that means you’ve temporarily spawned in the wrong timeline and you need to speak to your handler about being sent back to when 32bit software was acceptable.

The worst case here is having min server memory set equal to max server memory, which prevents SQL Server from using dynamic memory, and can cause all sorts of weird issues. Bottom line: don’t do it.

If the same value is specified for both min server memory and max server memory, then once the memory allocated to the SQL Server Database Engine reaches that value, the SQL Server Database Engine stops dynamically freeing and acquiring memory for the buffer pool.

If you have weird stuff going on with your server, like the plan cache throwing up constantly, queries that are struggling with memory grants, or other oddities, you should check to make sure min server memory is not set to max server memory.

Max Server Memory


Okay, let’s make this creakin’ easy, too. If the percent_of_total_memory column is over 100% then we’re back to the goofy pills theory of database administration.

If it’s a number between 70 and 90%, you’re probably in okay shape. Any lower, and you’re probably doing something asinine like stacking SQL Server instances and we’re back to the goofy pills theory of database administration.

But look, there’s all sorts of stuff you have to think about when you set max server memory in some situations:

  • Is SSAS, SSIS, or SSRS installed?
  • Are you running a workload with column store and/or batch mode components?

I’ve never opened up any SS*S component, so I’m not going to tell you how much memory you should set aside for them. I also don’t want to get in the habit of giving the wrong advice, either. You should install these things on their own server where they can use their own local resources and be much less of a bother to your production workload.

The middle point is becoming more and more of an issue though. Column store indexes and Batch Mode query execution are… different about memory.

You’ll either need to over-provision memory for those workloads, or set max server memory to a lower number than you usually would to accommodate them.

Pacer


That’s recap backwards, in case you were wondering.

Hopefully this post gives you a useful query to look at memory settings, and how they compare to the memory your SQL Server has in it.

Tomorrow, we’ll talk about Lock Pages In Memory. It’s one of my favorite settings.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Common SQL Server Consulting Advice: Setting MAXDOP And Cost Threshold For Parallelism

Dirty Work


There are three classes of administrator when it comes to parallelism settings:

  • Never changed the defaults
  • Changed the defaults to something wacky
  • Changed the defaults to something safe

The beginning of this post is for the first couple of genus of administrator.

Even though Microsoft added a MAXDOP calculation to the installer starting with SQL Server 2016, I’ve seen cases where the recommended setting was wrong because hardware changed after the initial config.

As an example: Your VM admin created the smallest possible machine to get things set up for you, and added CPU and memory later on, and you ended up with MAXDOP set to 1.

The installer still doesn’t let you change Cost Threshold For Parallelism at setup, which is plum bizarre. I get why making a recommendation here is hard (and I’ll talk about that later on), but you should at least give folks the option.

I want to add this here, before we get too far along: The point of adjusting these values is not “because there’s too much parallelism” or “because there’s too much CX… waits”. The goal is to increase concurrency by only allowing queries that benefit from parallelism to use it, because parallel queries can reserve many CPU threads (DOP * parallel branches), and if you throw enough of them at a server you can end up running out of worker threads, and hitting THREADPOOL waits.
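
For a rough sense of scale: a single query running at DOP 8 across 4 parallel branches can reserve 32 worker threads, plus a coordinator thread, so a couple dozen of those running at once can eat most of the default worker thread pool on a smaller server.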

Some folks do Good Enough© with the stock advice of a max degree of parallelism of 4 to 8, and cost threshold for parallelism of 50. Others have a harder time finding a sweet spot that properly balances performance and concurrency.

But you need to start somewhere.

If you truly have no idea how to change these settings, review these example commands, but don’t just blindly run them.

EXEC sys.sp_configure
    @configname = 'show advanced options',
    @configvalue = 1;
RECONFIGURE;

EXEC sys.sp_configure
    @configname = 'cost threshold for parallelism',
    @configvalue = 50;
RECONFIGURE;

EXEC sys.sp_configure
    @configname = 'max degree of parallelism',
    @configvalue = 8;
RECONFIGURE;

Read the rest of the post first.

Evaluating


The easy way to see your settings and if they’re relatively sane is like so:

SELECT
    c.name,
    c.value,
    c.value_in_use,
    c.description,
    socket_count = 
        CASE c.name
             WHEN N'max degree of parallelism'
             THEN osi.socket_count
             ELSE NULL
        END, 
    cores_per_socket = 
        CASE c.name
             WHEN N'max degree of parallelism'
             THEN osi.cores_per_socket
             ELSE NULL
        END
FROM sys.configurations AS c
CROSS JOIN
(
    SELECT 
        osi.socket_count,
        osi.cores_per_socket
    FROM sys.dm_os_sys_info AS osi
) AS osi
WHERE c.name IN 
(
    N'cost threshold for parallelism',
    N'max degree of parallelism'
);

Whenever I query sys.configurations, I get the value and value in use, because sometimes folks forget to run RECONFIGURE; after they change something.

But back to the point! Whatcha got here?

  • Is Cost Threshold For Parallelism a number higher than 5 and lower than 200?
  • Is Max Degree Of Parallelism an even number between 4 and 8?

If so, you probably don’t need to make any changes here, unless something else is going on.

And hey, if you’ve got one of those really awful vendor products that tells you to set MAXDOP to one, you should be using database scoped configurations for that, these days.
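
That keeps the bad advice contained to the vendor’s database instead of punishing the whole server. Run from inside that database, it looks something like this:

ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 1;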

For most folks, this will get you to an okay place. You may have some other considerations based on the number of cores and/or sockets, and that can get pretty confusing.

Changing these settings will clear out the plan cache, so you should be a little careful when you do it, but you don’t need to restart or anything.

Thinking Harder About Cost Threshold For Parallelism


Many folks who end up here are already pretty sophisticated when it comes to SQL Server, and I’d hate to leave them hanging.

Let’s talk more about each setting, and how they interact with your server and queries.

First up, alphabetically, is cost threshold for parallelism. Right now, you can only set it at the server level. You can technically remove it from the equation by:

  • Setting MAXDOP to 1
  • Using undocumented trace flag 8649, or…
  • Using undocumented query hint ENABLE_PARALLEL_PLAN_PREFERENCE to force a parallel plan
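
For reference only, since the last two are undocumented and unsupported: the hint version looks something like this.

SELECT c = COUNT_BIG(*) FROM dbo.Posts AS p WHERE p.Score > 100 OPTION(USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'));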

It would be neat if you could set it at the database level, like MAXDOP, but whatever.

One thing that comes up consistently when talking about this setting is that folks often see parallel queries with costs lower than the cost threshold for parallelism. All that means is that the serial query plan broke the threshold, and the parallel plan was chosen because it was cheaper. Cheaper can be lower than cost threshold for parallelism.

When a query executes, assuming there isn’t some parallelism-inhibiting factor in place (scalar UDFs, table variable inserts, etc.), it has to “cost” more than this threshold for a parallel plan to get considered. If it does, and the optimizer can come up with a parallel plan that’s cheaper than the serial plan, then you can pretty well count on it being chosen.

This is where things get tricky! Right off the bat!

See, query costs are all estimates, and there’s no “actual cost” counterpart to them in your query plans. If you’re anything like me, you’ve probably seen some low “cost” queries run for a really long time.

Many things make pinning this setting down difficult:

  • Incorrect assumptions about CX% waits being “too high” (this goes for MAXDOP too)
  • Untuned workloads having consistently high cost queries
  • Wasting time looking at the plan cache for average costs
  • Changing the setting not having an appreciable effect
  • Misunderstanding what costs actually mean

What does it all mean for you? Well, usually increasing the setting from the default — which is 5 — to a meaningfully higher number, will reduce the total number of queries that are eligible for a parallel plan.

This doesn’t mean performance is doomed for those queries, heck, they may not have needed a parallel plan to begin with. The optimizer is pretty good at costing queries generally. But it’s not perfect, and that’s where generalized workload settings need to be overridden or changed.

Having fewer queries be eligible for gobbling up a whole bunch of CPU threads is one way of increasing overall concurrency. Individual query performance may change for better or worse, and you may need to raise or lower the number you picked here, but that’s part of the process.

Thinking Harder About Max Degree Of Parallelism


The second option you have to address concurrency and performance issues stemming from parallelism is setting an upper limit on the number of cores, and threads per parallel branch that a parallel query is allowed to engage and acquire.

In a way, this setting is a little easier to settle on than Cost Threshold for Parallelism, but there are some things you should be aware of:

  • The parallel version of a plan isn’t just the serial version with parallelism; you can end up with a very different set of operators
  • The optimizer may choose a serial plan if DOP is set too low for a parallel plan to make sense
  • Whatever you set DOP to, parallel plans will use at least one extra coordinator thread to manage DOP threads
  • Leaving this set to zero is only really dangerous on servers with > 8 cores
  • Setting this to an odd number is generally not a good global setting, though it can be useful in edge cases for a single query
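
For those single-query edge cases, a query-level hint keeps the unusual DOP contained instead of changing the server-wide setting. A minimal sketch:

SELECT c = COUNT_BIG(*) FROM dbo.Posts AS p WHERE p.Score > 10000 OPTION(MAXDOP 3);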

There have been times when I’ve suggested MAXDOP of 4 on servers with a single 8 core processor, but that was mostly a temporary fix until some more meaningful tuning work could be done.

It’s okay to use these settings as band aids, just don’t let them be habit-forming.

How can you tell if a parallel query plan is effective, and how can you decide if you need higher or lower DOP? Great question(s)! Look at you. I’m so proud.

  • If your parallel and serial plans have roughly equivalent CPU and duration, the parallel plan isn’t better
  • If your parallel plan isn’t anywhere near DOP faster than your serial query, the parallel plan probably isn’t better
  • If your parallel plan hits exchange spills or deadlocks, buckle up, you’re in for a fun day (of MAXDOP 1 hints)
  • If your parallel plan gets slower as you touch more data, you might want to try hinting a higher DOP

I know, it sounds kooky, but this is all stuff I have to think about and mess with when I’m tuning queries for clients. Many times, it seems like an attractive concept to force a parallel plan on a slow query, but the end result isn’t necessarily better.

It’s sort of like when I expect a query to use a nested loops join, but it doesn’t. If you hint the query to do the loop join, you might get a missing index request that explains why the optimizer thought a different join type was cheaper. Or you might just find out the optimizer was wrong. Again.

Getting back to the main point, though: parallelism is a very effective equalizer for the right kind of queries. It just doesn’t necessarily help every query.

Slug Away


To summarize and recap a little bit, here. Most SQL Server users out there will do fine following basic advice about cost threshold for parallelism and max degree of parallelism.

There will be outliers, edge cases, and times when you need to work outside of those settings for particular queries to run optimally. Heck, if a client tells me they’re running a data warehouse, the last thing I’m going to get hung up on is cost threshold for parallelism.

If you change those settings away from the defaults, and you’re still for some reason dissatisfied with the amount of CX_ waits on your server, try focusing on the queries that are going parallel and tuning them to reduce their estimated costs to lower numbers.

Sometimes those waits really are your fault.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Working With A SQL Server Consultant Checklist: Patches, Patches, Patches

Whacked


It’s obvious to most that you should keep some track of SQL Server patches. You don’t need to install them all the hot minute that they drop, but staying within range of current is pretty important.

If you have an extra 5-10 minutes a month, read the Cumulative Update notes that Microsoft adds along with releases and see if any of the issues sound like ones you’re hitting.

I’m not gonna pretend that the release notes are getting better — in fact they’ve really gone downhill over the last few years — but you can scan through them and get a reasonable idea of what’s in the update as far as fixes and improvements go.

What no reasonable consultant wants to see is a SQL Server that has been up and unpatched for a year. That messes you up in two ways:

  • Your problems could be solved by patching SQL Server
  • Your server metrics are all gonna be cumulative since your last patch

That makes any recent problems really tough to make heads or tails of, without a monitoring tool that breaks things down by time.

Remember when I said monitoring tools are important?

They still are.

More Than One


Monitoring tools are also useful for patching, not because they help you monitor for updates or tell you when things are out of date, but because a good monitoring tool will help you baseline to see if there’s any difference after patching.

That can help you just as much when things go well as when things go poorly. Even if it wasn’t the patch — maybe a bad plan got in the cache after you rebooted or something — you have a much earlier warning system.

But what’s really important is having that reservoir of information about your SQL Server.

This is where I get selfish.

If you patch SQL Server, you have to restart it. If you restart it, all the good stuff goes bye-bye, then we might have to delay working together for metrics to build back up.

So pretty please, patch that thing regularly.

And get a monitoring tool.

For both of us.

Not Just SQL Server


I’ve seen lots of folks using really old versions of SSMS for no good reason. You should keep that up to date, too. Just don’t forget to skip the Azure Data Studio install, if you don’t need it.

There have been a lot of important improvements and fixes to SSMS, especially since version 18. If you’re not running at least that, go grab it.

Especially if we’re going to be working together, because I’ll stop and make you get it if you don’t already have it.

And get a monitoring tool.

For both of us.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.