CHAR vs VARCHAR Memory Grants In SQL Server Queries

Strings Were A Mistake


While working with a client recently, we found that someone, at some point in time, probably during the original migration from Access, had chosen CHAR columns rather than VARCHAR columns.

Okay, fine. How bad could it be?

Bad enough that a whole bunch of columns which only ever held a single character were stored in CHAR(1000) columns.

Which was wildly inflating memory grants, and causing all sorts of nasty issues.

Table create script is at the end of the post!

Del Granto


To show you how much the grants differ for char vs varchar, I need to give you a little background on sort memory grant estimates.

When the optimizer estimates how much memory it’ll need, the calculation is based on the number and width of the rows that will have to get sorted.

For variable length string columns, it estimates that half the number of bytes will be filled. So if you have a varchar(100) column it’ll estimate 50 bytes will be filled for every row, and for an nvarchar(100) column it’ll estimate that 100 bytes will be filled for every row, because unicode characters are stored as double-byte to account for accent marks, etc.

So, yes, identically sized varchar and nvarchar columns will have different memory grant requirements.

And yes, identically sized (n)char and (n)varchar columns will also have different memory grant requirements.
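
To put rough numbers on it, here’s the back-of-the-napkin math for the 5 million row demo table at the end of this post (an illustration of the estimating behavior, not exact engine output):

/* Estimated bytes to sort, per column definition, for 5,000,000 rows */
-- char(1000):     fixed width, counted in full:  1,000 bytes * 5,000,000 rows = ~4.7 GB
-- varchar(1000):  estimated half full:             500 bytes * 5,000,000 rows = ~2.3 GB
-- nvarchar(1000): 2,000 bytes max, half full:    1,000 bytes * 5,000,000 rows = ~4.7 GB

Memory grant feedback can later shrink the varchar grant toward the single characters actually stored, but it can’t do much for the char grant, because those padding spaces are real bytes that really have to be sorted.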

Granto Loco


Let’s take these two queries, and let memory grant feedback right-size their grants:

DECLARE 
    @c char(1000);
SELECT
    @c = m.char_col
FROM dbo.murmury AS m
ORDER BY m.some_date DESC;

DECLARE 
    @v varchar(1000);
SELECT
    @v = m.varchar_col
FROM dbo.murmury AS m
ORDER BY m.some_date DESC;

Here are the fully adjusted and stabilized grants:

SQL Server Query Plan
hangin’ tough

Around 9GB vs 441MB. All because those CHAR columns are padded out with 999 trailing spaces.

SQL Server Query Results
camp nightmare

So hey, if you need a reason to review schema for char column sanity, this just might be it.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Scripto


DROP TABLE IF EXISTS dbo.murmury;

SELECT
    ISNULL
    (
        x.n, 
        0
    ) AS id,
    DATEADD
    (
        MINUTE,
        x.n,
        '20210101'
    ) AS some_date,
    CONVERT
    (
        char(1000),
        LEFT(x.text, 1)
    ) AS char_col,
    CONVERT
    (
        varchar(1000),
        LEFT(x.text, 1)
    ) AS varchar_col
INTO dbo.murmury
FROM 
(
SELECT TOP (1000 * 5000)
    ROW_NUMBER() OVER
    (
        ORDER BY 
            1/0
    ) AS n,
    m.text
FROM sys.messages AS m
CROSS JOIN sys.messages AS m2
) AS x;

ALTER TABLE dbo.murmury
ADD CONSTRAINT pk_mr PRIMARY KEY CLUSTERED(id);


Why Expressions Are Better Than Local Variables For Performance In SQL Server Queries

Often Enough


I see people writing stored procedures that use local variables either because they read that all functions in a where clause are dangerous, or because they need to perform a calculation on one of the parameters that get passed in.

In nearly all cases, you’re better off… not doing that.

Here are a couple examples of why.

Hard Yes


Passing in a function like GETDATE, even nested inside another function like DATEADD, lets the optimizer evaluate it to its current value and make a pretty decent guess based on that.
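
The query behind that plan is shaped something like this (dbo.express_yourself and some_date are the demo table and column from later in this post; the exact query is my reconstruction):

SELECT
    records = 
        COUNT_BIG(*)
FROM dbo.express_yourself AS ey
WHERE ey.some_date >= DATEADD(YEAR, 1, GETDATE());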

SQL Server Query Plan
goo

This is a Good Enough™ guess.

Hard Pass


Once you assign the result of that function to a variable, everything gets awkward.
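
By that I mean this sort of thing (the same sketch, with the date math moved into a variable):

DECLARE
    @d datetime = DATEADD(YEAR, 1, GETDATE());

SELECT
    records = 
        COUNT_BIG(*)
FROM dbo.express_yourself AS ey
WHERE ey.some_date >= @d;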

SQL Server Query Plan
bang bang

That’s a Pretty Bad™ guess, but not the end of the world. What’s bad is that the guess never changes even if you change the span of time.

Look what happens if we just add one day instead of one year.
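
In other words, swapping YEAR for DAY in the variable assignment, and changing nothing else:

DECLARE
    @d datetime = DATEADD(DAY, 1, GETDATE());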

SQL Server Query Plan
one day at a time

We get the exact same guess as before — 821,584 rows. Bummer.

Storied Tradition


It’s nearly the same thing in stored procedures. What I see more often is people passing in one parameter for a start date, and then using a local variable to figure out an end date.

CREATE OR ALTER PROCEDURE dbo.dangit_bobby
(
    @start_date datetime
)
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;

DECLARE
    @later datetime = DATEADD(DAY, 1, GETDATE());

    SELECT
        records = 
            COUNT_BIG(*)
    FROM dbo.express_yourself AS ey
    WHERE ey.some_date >= @start_date
    AND   ey.some_date <  @later;

END;
GO

EXEC dbo.dangit_bobby 
    @start_date = '2021-05-19';
GO
SQL Server Query Plan
FAR OFF DUDE

Advanced Calculus


Let’s change how we use the parameter, and put it into some date math in the where clause instead.

CREATE OR ALTER PROCEDURE dbo.dangit_bobby
(
    @start_date datetime
)
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;

    SELECT
        records = 
            COUNT_BIG(*)
    FROM dbo.express_yourself AS ey
    WHERE ey.some_date >= @start_date
    AND   ey.some_date <  DATEADD(DAY, 1, @start_date);

END;
GO 
EXEC dbo.dangit_bobby 
    @start_date = '2021-05-19';
GO

We get a Much Closer™ estimate. What a lovely day.

SQL Server Query Plan
go for both

Hardest Part


It’s often tempting to take shortcuts like this, especially if you need to reuse the same calculation multiple times.

Now, look, if you stare closely enough at these plans, you’ll notice that they all finish quickly enough. That’s because I have a small table with a wonderful index on the column I care about.

The point here is to show you how bad estimates can turn up even in ideal circumstances when you do things The Wrong Way™.

As long as you’re not wrapping columns in functions like this, you’re probably okay.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Inlining Financial Functions In SQL Server For Better Query Performance

Big Tuning


I work with a lot of clients who do things with money. Loads of money. I’m a very trustworthy person.

At just about every client site, I see a common set of financial functions being used to calculate various things. The code is all the same, too.

Some of it comes from published government guidelines, and some of it comes straight out of accounting 101 books.

The big problem is that all of these functions were written as scalar UDFs, which kills performance.

Recently, one of my clients was nice enough to agree to let me publish my rewrites of their functions as inline table valued functions.

You can download them here.
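
To give you a flavor of the rewrite pattern, here’s a made-up simple interest function, first as a scalar UDF and then as an inline table valued function (this one is not from the download; it’s only here to show the shape):

/* Scalar UDF: runs row by row, and keeps the calling query serial */
CREATE OR ALTER FUNCTION dbo.simple_interest
(
    @principal money,
    @rate decimal(9, 4),
    @years integer
)
RETURNS money
AS
BEGIN
    RETURN @principal * @rate * @years;
END;
GO

/* Inline table valued function: expanded into the calling query's plan */
CREATE OR ALTER FUNCTION dbo.simple_interest_inline
(
    @principal money,
    @rate decimal(9, 4),
    @years integer
)
RETURNS TABLE
AS
RETURN
SELECT
    interest = 
        @principal * @rate * @years;
GO

Callers move from calling dbo.simple_interest in the select list to a CROSS APPLY of dbo.simple_interest_inline, and the optimizer gets to treat the math like any other inline expression.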

License and Fair Warning


These functions are provided with no license whatsoever. You can take them and do whatever you want with them.

The caveat here is that I take no responsibility for anything you do with them. I did not write any of the mathematical formulas in these functions, and I take no credit for any of the code for anything in this folder, other than rewriting the scalar UDFs as inline table valued functions. I was given permission to publish these by the people who wrote and/or incorporated the code originally.

They returned the same results as the scalar UDFs in local tests, so to that end the results are accurate. You will have to do your own local tests to make sure they work the same as whatever they’re replacing.

Aside from logical requirements, you also need to make sure they satisfy any legal or regulatory requirements for your industry. That sounds important, too.

The bottom line here is that you can’t sue me, so sayeth the law offices of Na Na Na Boo Boo.

You can download them here.

Contributing


If you find any issues with these functions, you have another version, or you want to add a function to the library, you can contribute over at the GitHub repo.

If you’d prefer to do so anonymously, you can email me.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Allow Memory Grant Percent In SQL Server Standard Edition

Voice Of Users


I recently opened a User Voice issue for this, and I’d appreciate it if you’d vote for it (note: this link is dead now, long live the first party solution).

From the issue:

In 2016 SP1 we got many features handed down to Standard Edition, but Resource Governor was not one of them.

In Enterprise Edition, Resource Governor is frequently used to reduce the maximum memory grant percent down from 25. That prevents queries that you don’t have control over from totally trashing your server. Since memory grant feedback is not available in Standard Edition, users aren’t left with many options. Query hints are nice, but only if you can change the queries.

Max grant percent should be either settable in Standard Edition, or decoupled from Resource Governor and settable at the database or server level.
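
For reference, this is what that knob looks like where Resource Governor is available (a minimal sketch; the 10 percent cap is an arbitrary example):

/* Enterprise Edition: cap any single query's grant at 10%
   of the memory available to the default workload group */
ALTER WORKLOAD GROUP [default]
WITH (REQUEST_MAX_MEMORY_GRANT_PERCENT = 10);

ALTER RESOURCE GOVERNOR RECONFIGURE;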

Furthers Day


Editorializing a bit more than I thought was appropriate in the User Voice issue: this setting is irresponsibly high for most workloads. If you have Max Server Memory set to 100GB, do you really want queries being able to ask for 25GB of it?

Because, like, they can.

If you have control over the queries and indexes on a server, there are all sorts of potential things you can do to “fix” memory grants. Sometimes they come from bad estimates, other times they come from inflated string column sizes — think about that column you have called “State” that’s an nvarchar max — because the optimizer guesses that every value will be about half the size of the definition.

There are lots of reasons! So many reasons, in fact, that Microsoft has a performance feature called Memory Grant Feedback. The problem is that this feature is only available in Enterprise Edition, and wasn’t readily available to Row Mode queries until SQL Server 2019. In SQL Server 2017, you needed some columnstore/batch mode operations in your query for it to apply.

While there are hints that can be used to control memory grants at the query level, query hints can be hard to apply in some circumstances. Vendor code, ORMs, ad hoc heavy workloads, and more make the broad application of query hints impossible.
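
For completeness, this is the kind of hint I mean (a sketch against a hypothetical table; the 10 percent cap is arbitrary):

SELECT
    st.some_column
FROM dbo.some_table AS st
ORDER BY st.some_column
OPTION (MAX_GRANT_PERCENT = 10);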

Having this as a setting outside of Resource Governor (or, heck, making Resource Governor available in Standard Edition), would be a huge benefit to what seems like a forgotten group of SQL Server customers.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Things SQL Server vNext Should Address: Optional Parameters

Lamp


This issue is one that could be linked to other times when the optimizer defers certain portions of optimization to later stages. It’s also something that could lead to complications, because the end result is multiple execution plans for the same query.

But it goes back to a couple of basic ideas about query writing that I think people need to keep in mind: write single purpose queries, and remember that things that make your job easier make the optimizer’s job harder.

A good example of a multi-purpose query is a merge statement. It’s like throwing SQL Server a knuckleball.

Fiji


Another example of a knuckleball is this knucklehead pattern:

SELECT 
    p.*
FROM dbo.Posts AS p
WHERE (p.OwnerUserId   = @OwnerUserId OR @OwnerUserId IS NULL)
AND   (p.CreationDate >= @CreationDate OR @CreationDate IS NULL);

SELECT 
    p.*
FROM dbo.Posts AS p
WHERE (p.OwnerUserId   = ISNULL(@OwnerUserId, p.OwnerUserId))
AND   (p.CreationDate >= ISNULL(@CreationDate, p.CreationDate));
GO 

SELECT 
    p.*
FROM dbo.Posts AS p
WHERE (p.OwnerUserId   = COALESCE(@OwnerUserId, p.OwnerUserId))
AND   (p.CreationDate >= COALESCE(@CreationDate, p.CreationDate))
ORDER BY p.Score DESC;
GO

I hate seeing this, because I know how many bad things can happen as a result.

One example I love is creating these two indexes and running the first query up there.

CREATE INDEX onesie ON dbo.Posts(OwnerUserId, Score, CreationDate);
CREATE INDEX threesie ON dbo.Posts(ParentId, OwnerUserId);

The optimizer chooses the wrong index — the one that starts with ParentId — even though the query is clearly looking for a potential equality predicate on OwnerUserId.

SQL Server Query Plan
180

Deferential


It would be nice if the optimizer did more to sniff out NULL values here to come up with more stable plans for the non-NULL values, essentially doing the job that dynamic SQL does by only adding predicates to the where clause when they’re not NULL.

It wouldn’t have to go as far as looking at actual values on every compilation, because that would essentially be a RECOMPILE hint on every query.
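
For reference, the dynamic SQL approach I’m describing looks something like this (a minimal sketch; the sample parameter values are made up):

DECLARE
    @OwnerUserId integer = 22656,
    @CreationDate datetime = '20130101',
    @sql nvarchar(MAX) = N'
SELECT
    p.*
FROM dbo.Posts AS p
WHERE 1 = 1';

/* Only add predicates for values that aren't NULL */
IF @OwnerUserId IS NOT NULL
    SET @sql += N'
AND   p.OwnerUserId = @OwnerUserId';

IF @CreationDate IS NOT NULL
    SET @sql += N'
AND   p.CreationDate >= @CreationDate';

EXEC sys.sp_executesql
    @sql,
    N'@OwnerUserId integer, @CreationDate datetime',
    @OwnerUserId,
    @CreationDate;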

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Things SQL Server vNext Should Address: Lookup Placement

All Looked Up


Lookups are interesting. On the one hand, I think the optimizer should be less biased against them, and on the other hand they can cause a lot of issues.

They’re probably the most common issue I see in queries that suffer from parameter sniffing, though far from the only unfortunate condition.

Under the read committed isolation level, lookups can cause readers to block writers, and even cause deadlocks between readers and writers.

This isn’t something that happens under optimistic isolation levels, which may or may not have something to do with my earlier suggestion to make new databases use RCSI by default and work off the local version store associated with accelerated database recovery.

Ahem.

Leafy Greens


One thing that would make lookups less aggravating would be giving the optimizer the ability to move them around.

But that really only works depending on what the lookup is doing. For example, some Lookups just grab output columns, and some evaluate predicates:

SQL Server Query Plan
all one word
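
For example, with a hypothetical narrow index on just OwnerUserId, a query like this can get a seek plus a Key Lookup, with the Score predicate evaluated inside the lookup, because Score isn’t in the index:

CREATE INDEX shorty 
ON dbo.Posts
    (OwnerUserId);
GO

SELECT
    p.Id,
    p.Score
FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656
AND   p.Score > 100;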

Further complicating things is if one of the columns being output is used in a join.

SQL Server Query Plan
bad movie

Outside Chance


There are likely other circumstances where decoupling the lookup and moving the join to another part of the plan would be impossible or maybe even make things worse. It might even get really weird when dealing with a bunch of left joins, but that’s the sort of thing the optimizer should be allowed to explore during, you know, optimization.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Things SQL Server vNext Should Address: Table Variable Modification Performance

Canard


People still tell me things like “I only put 100 rows in table variables”, and think that’s the only consideration for their use.

There are definitely times when table variables can be better, but 100 rows is meaningless.

Even if you put one row in a table variable it can fudge up performance because SQL Server doesn’t know what’s in your table variable. That’s still true in SQL Server 2019, even if the optimizer knows how many rows are in your table variable.

The problem that you can run into, even with just getting 100 rows into a table variable, is that it might take a lot of work to get those 100 rows.

Bernard


I’ve blogged before about workarounds for this problem, but the issue remains that inserts, updates, and deletes against table variables aren’t naturally allowed to go parallel.

The reason why is a bit of a mystery to me, since table variables are all backed by temp tables anyway. If you run this code locally, you’ll see what I mean:

SET NOCOUNT ON;
SET STATISTICS IO ON;
DECLARE @t table(id int);
SELECT * FROM @t AS t;
SET STATISTICS IO OFF;

Over in the messages tab you’ll see something like this:

Table '#B7A53B3E'. Scan count 1, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

Now, look, I’m not asking for the update or delete portions of the query plan to go parallel, but it might be nice if the other child operators could. That’s how things go with regular tables and #temp tables. It would be nice if the inserts themselves could go parallel too, but hey.
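
To see the difference side by side, compare these two inserts (dbo.Posts is a stand-in for any table big enough to warrant parallelism):

/* The SELECT feeding a #temp table can go parallel,
   and with TABLOCK the insert itself can, too */
CREATE TABLE #t (id integer NOT NULL);

INSERT #t WITH (TABLOCK)
    (id)
SELECT
    p.Id
FROM dbo.Posts AS p;

/* The same insert aimed at a table variable gets a fully serial plan */
DECLARE @t table (id integer NOT NULL);

INSERT @t
    (id)
SELECT
    p.Id
FROM dbo.Posts AS p;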

Ardbeg


The problem this solves is one that I see often, usually from vendor code where the choice of which temporary object to use was dependent on individual developer preference, or they fell for the meme that table variables are “in memory” or something. Maybe the choice was immaterial at first with low data volume, and over time performance slowly degraded.

If I’m allowed to change things, it’s easy enough to replace @table variables with #temp tables, or to use a workaround like the one in the post linked above to improve performance. But when I’m not, clients are often left begging vendors to make changes, and vendors are rarely receptive.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Things SQL Server vNext Should Address: Common Table Expression Materialization

Repetition Is Everything


I know what you’re thinking: this is another post that asks for a hint to materialize CTEs.

You’re wrong. I don’t want another hint that I can’t add to queries to solve a problem because the code is coming from a vendor or ORM.

No, I want the optimizer to smarten up about this sort of thing, detect CTE re-use, and use one of the New And Improved Spools™ to cache results.

Let’s take a look at where this would come in handy.

Standalone


If we take this query by itself and look at the execution plan, it conveniently shows one access of Posts and Users, and a single join between the two.

SELECT
    u.Id AS UserId,
    u.DisplayName,
    p.Id AS PostId,
    p.AcceptedAnswerId,
    TotalScore = 
        SUM(p.Score)
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id
WHERE u.Reputation > 100000
GROUP BY 
    u.Id, 
    u.DisplayName,
    p.Id, 
    p.AcceptedAnswerId
HAVING SUM(p.Score) > 1;

SQL Server Query Plan
invitational

Now, let’s go MAKE THINGS MORE READABLE!!!

Ality


WITH spool_me AS
(
    SELECT
        u.Id AS UserId,
        u.DisplayName,
        p.Id AS PostId,
        p.AcceptedAnswerId,
        TotalScore = 
            SUM(p.Score)
    FROM dbo.Users AS u
    JOIN dbo.Posts AS p
        ON p.OwnerUserId = u.Id
    WHERE u.Reputation > 100000
    GROUP BY 
        u.Id, 
        u.DisplayName,
        p.Id, 
        p.AcceptedAnswerId
    HAVING SUM(p.Score) > 1
)
SELECT
    a.UserId,
    a.DisplayName,
    a.PostId,
    a.AcceptedAnswerId,
    a.TotalScore,
    q.UserId,
    q.DisplayName,
    q.PostId,
    q.AcceptedAnswerId,
    q.TotalScore
FROM spool_me AS a
JOIN spool_me AS q
    ON a.PostId = q.AcceptedAnswerId
ORDER BY a.TotalScore DESC;

Wowee. We really done did it. But now what does the query plan look like?

SQL Server Query Plan
oh, you

There are now two accesses of Posts and two accesses of Users, and three joins (one Hash Join isn’t in the screen cap).

Detection


Obviously, the optimizer knows it has to build a query plan that reflects the CTE being joined.

Since it’s smart enough to do that, it should be smart enough to use a Spool to cache things and prevent the additional accesses.

Comparatively, using a #temp table to simulate a Spool is about twice as fast.
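
By “simulate”, I mean dumping the CTE’s results into a #temp table once, and self-joining that (a sketch; #spool_me is a name I made up):

SELECT
    u.Id AS UserId,
    u.DisplayName,
    p.Id AS PostId,
    p.AcceptedAnswerId,
    TotalScore = 
        SUM(p.Score)
INTO #spool_me
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id
WHERE u.Reputation > 100000
GROUP BY 
    u.Id, 
    u.DisplayName,
    p.Id, 
    p.AcceptedAnswerId
HAVING SUM(p.Score) > 1;

SELECT
    a.UserId,
    a.DisplayName,
    a.PostId,
    a.AcceptedAnswerId,
    a.TotalScore,
    q.UserId,
    q.DisplayName,
    q.PostId,
    q.AcceptedAnswerId,
    q.TotalScore
FROM #spool_me AS a
JOIN #spool_me AS q
    ON a.PostId = q.AcceptedAnswerId
ORDER BY a.TotalScore DESC;

Here’s the CTE plan: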

SQL Server Query Plan
double

Here’s the Spool Simulator Plan™

SQL Server Query Plan
professionals

Given the optimizer’s penchant for spools, this would be another chance for it to shine on like the crazy diamond it is.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Things SQL Server vNext Should Address: Handling Disjunctive Predicates

Bite Sized Gripes


I sat down to write this blog post, and I got distracted. I got distracted for two hours.

SQL Server Query Plan
take two

So, pretty obviously, we have a query performance issue.

What’s the cause of this malady? OR. Just one little OR.

Aware


It’s not like I don’t have indexes. They’re fabulous.

CREATE INDEX c 
ON dbo.Comments
    (PostId, UserId);

CREATE INDEX v 
ON dbo.Votes
    (PostId, UserId);

CREATE INDEX cc
ON dbo.Comments
    (UserId, PostId);

CREATE INDEX vv
ON dbo.Votes
    (UserId, PostId);

Look at those things. Practically glowing.

But this query just wrecks them:

SELECT
    records = 
        COUNT_BIG(*)
FROM dbo.Comments AS c
JOIN dbo.Votes AS v
    ON c.UserId = v.UserId
    OR c.PostId = v.PostId;

That’s the plan up there that ran for a couple hours.

Unrolling


A general transformation that the optimizer can apply in this case is to union two result sets together.

SELECT
    records = 
        COUNT_BIG(*)
FROM 
(
    SELECT
        n = 1
    FROM dbo.Comments AS c 
    JOIN dbo.Votes AS v 
        ON  c.UserId = v.UserId
        AND c.PostId <> v.PostId
    
    UNION ALL
    
    SELECT
        n = 1
    FROM dbo.Comments AS c
    JOIN dbo.Votes AS v 
        ON  c.PostId = v.PostId
        AND c.UserId <> v.UserId
) AS x;

One note on the rewrite: row pairs that match on both columns fall out of both branches, so an exactly equivalent version would need to account for those too, but that doesn’t change the point of the demo.

The following are two execution plans for this transformation. One is in compatibility level 150, where Batch Mode On Rowstore has kicked in. The second is in compatibility level 140, in regular old Row Mode. Though the Row Mode only plan is much slower, it’s still a hell of a lot faster than however much longer than two hours the original query would have run for.

SQL Server Query Plan
light of day

The reason the Row Mode plan is so slow is because of a god awful Repartition Streams.

SQL Server Query Plan
dunked

Abstract


This is one of those “easy for you, hard for the optimizer” scenarios that really should be easy for the optimizer by now.

I don’t even care if it can be applied to every instance of this — after all there may be other complicating factors — it should at least be available for simple queries.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.