Software Vendor Mistakes With SQL Server: Selecting Everything From Everywhere

Proble-Matic


It seems that most developers are either too naive or too lazy to choose just the columns they need to satisfy a particular requirement, and so their ORM queries end up doing the equivalent of SELECT * from every table involved in the query.

I have seen some worthwhile uses for this, where people would select all the data that they needed to only make one round trip to the server rather than 5 or more. In those cases, it’s okay to break the rule, especially if the query is a Heavy Hitter™.

But this can end up causing all sorts of problems with queries. Back in the Oldenn Dayes of the internet, you’d find people talking about the overhead of metadata lookups and blah blah.

I’m sure there’s some tiny amount, but that’s hardly worth talking about in light of the other, more serious issues that you’ll encounter.

Memory Grants


Memory grants come into play most commonly when you need to Sort or Hash data in a query plan. There are some other cases, like Optimized Nested Loops, and columnstore inserts, but those are somewhat less common.

Memory grants can get really thrown off by a couple of things: cardinality misestimations, and string columns. Right now, the optimizer guesses that every value in a string column is “half full”, meaning half of the column’s maximum byte length.

For instance:

  • varchar(100): 50 bytes
  • nvarchar(100): 100 bytes

The wider your columns get, the bigger your memory grants get. That becomes a bigger deal on servers that are starving for memory, because those grants can really eat up a lot of memory that you need for other stuff, like caching data or query plans.

Some Columns Are Bigger Than Others


These queries will all request additional memory from SQL Server, because we’re asking it to return data sorted by Reputation, and we don’t have any useful indexes that put the Reputation column in order. That means we need some scratch space to do the work in.

SELECT TOP (1000) 
    u.DisplayName 
FROM dbo.Users AS u 
ORDER BY 
    u.Reputation;

SELECT TOP (1000)
    u.DisplayName,
    u.Location
FROM dbo.Users AS u
ORDER BY
    u.Reputation;

SELECT TOP (1000)
    u.DisplayName,
    u.Location,
    u.WebsiteUrl
FROM dbo.Users AS u
ORDER BY
    u.Reputation;

SELECT TOP (1000)
    u.DisplayName,
    u.Location,
    u.WebsiteUrl,
    u.AboutMe
FROM dbo.Users AS u
ORDER BY
    u.Reputation;

The query plans for these are unremarkable. They all have a Sort operator in them used to order our data, but the memory grants for all of these go up, up, up.

[Screenshot: query plan tooltips showing the memory grant for each query]

We go from 284MB, to 584MB, to 1178MB, to 13GB. And worse, because of the 50% guess for column fullness, we only end up using 369MB of that 13GB.

Good luck with that.
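
If you want to see granted versus used memory without hovering over query plan operators, here’s a minimal sketch, assuming SQL Server 2016 or later (the grant columns in sys.dm_exec_query_stats don’t exist in older versions):

SELECT TOP (10)
    granted_mb = deqs.last_grant_kb / 1024.,
    used_mb = deqs.last_used_grant_kb / 1024.,
    query_text = dest.text
FROM sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
WHERE deqs.last_grant_kb > 0
ORDER BY deqs.last_grant_kb DESC;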

Newer versions of SQL Server attempt to help you with this by adjusting memory grants between executions (memory grant feedback), but there are a bunch of caveats.

Lifetime


There are other problems that can occur with queries like these, but I’m going to cover them in other parts of the series because they tie into other concepts like indexing and parameter sniffing that I need to lay some foundation for first.

The memory grant thing is bad enough on its own, but it may not become a real issue until your databases reach a size where they outpace RAM by a significant margin.

That’s usually when you start seeing significant signs of memory pressure, which you can detect with my sp_PressureDetector stored procedure.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Passing Long IN Clauses To Queries From An ORM

Mirror Backwards


When it comes to weird performance problems, I’ve seen long IN clauses cause all sorts of them. The worst part is that there are much smarter ways to handle this.

While Table Valued Parameters aren’t perfect, I’d argue that they’re far more perfect than the alternative.

The first issue I see is that these clauses aren’t ever parameterized, which brings us to a couple familiar issues: plan cache bloat and purge, and constant query compilation.

Even when they are parameterized, some bad stuff can happen.

Since We’re Here


Let’s say you take steps like the ones in the post linked above to parameterize those IN clauses, though. Does that fix all your performance problems?

You can probably guess. Hey look, index!

CREATE INDEX u ON dbo.Users(DisplayName, Reputation) 
WITH(SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

Now, we’re gonna use a stored procedure I wrote to generate a long, parameterized IN clause. It’s not pretty, and there’s really no reason to do something like this outside of a demo to show what a bad idea it is.

Here’s how we’re gonna execute it:

EXEC dbo.Longingly
    @loops = 50,
    @debug = 0;

EXEC dbo.Longingly
    @loops = 100,
    @debug = 0;

EXEC dbo.Longingly
    @loops = 1000,
    @debug = 0;

If anyone thinks people sending 1000-value lists to an IN clause isn’t realistic, you should really get into consulting. There’s some stuff out there.

And Wave


Here’s what the query plans for those executions look like:

[Screenshot: the query plans for the three executions]

The important detail here is that even with our super duper powered index, the optimizer decides that life would be too difficult to apply our “complicated” predicate. We end up with query plans that scan the index and then apply the predicate in a Filter operator.

You’re not misreading, either. The 1000-value filter runs for roughly 7 seconds. Recompile hints do not work here, and, of course, that’d undo all your hard work to parameterize these queries.

Moral of the story: Don’t do this. Pass in a TVP, like the link above shows you how to do. If your ORM doesn’t support that, you might need to switch over to stored procedures.
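
To make that concrete, here’s a minimal sketch of the TVP approach. The table type, procedure, and column sizes are made-up illustrations, not objects from the demo above:

CREATE TYPE dbo.DisplayNames AS TABLE
(
    DisplayName nvarchar(40) NOT NULL PRIMARY KEY
);
GO

CREATE OR ALTER PROCEDURE dbo.GetUsersByDisplayName
    @DisplayNames dbo.DisplayNames READONLY
AS
BEGIN
    SET NOCOUNT ON;

    SELECT
        u.Id,
        u.DisplayName,
        u.Reputation
    FROM dbo.Users AS u
    JOIN @DisplayNames AS dn
        ON dn.DisplayName = u.DisplayName;
END;
GO

Client libraries like ADO.NET can pass a structured parameter for the table type, so your list shows up as rows the optimizer can join to, instead of a mile-long string of literals.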

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Sticking With ORMs And Not Using Stored Procedures

Repeat It We Must


If you have a small to medium sized application, your developers are highly skilled not only in their preferred method of development but also in SQL, and you have someone who can manage SQL Server performance well (think settings, indexes, etc.), then you might be able to get away with having zero stored procedures.

This post is for everyone else, and by “everyone else” I mean “99% of software shops” out there.

You are seen. You are seen by me.

One Eye Up


The problems you’ll run into eventually will come from a few places, in general:

  • SQL Server has some wonky setup issues (this is still surprisingly common)
  • No one is managing the indexing (and I don’t mean maintenance, here)
  • Developer naïveté about what the generated queries look like (text and plan complexity)

Certain PaaS offerings, and software shops that offload certain tasks to customer DBA teams, will partially be in the clear here. But more often than not, DBAs are accidental, and not much more confident than the average sysadmin when it comes to digging into performance issues.

DBAs skilled in the art of performance tuning will be happy to tweak indexes where possible, but a lot of their feedback to you as a vendor might be a wishlist of things that they could do if the code were in a stored procedure where more traditional tuning techniques are available.

Matter Of Years


I’ve helped more than a few vendors dig their way out of ORM hell now, and the process isn’t so bad once you understand the logic involved to get the right results back.

Once you’re writing stored procedures, you have a whole world of possibilities when it comes to adjusting the code to better manage performance. For critical applications, this often becomes a necessity once a database gets big or busy enough.

I totally understand the appeal of ORMs for many development tasks and priorities, but once you run out of tricks to get it working faster, you need to learn how to let go.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Not Parameterizing Queries

You Won’t Stop


Continuing a bit on yesterday’s theme of parameterization, another type of mistake I see software vendors make quite a bit is not parameterizing queries at all, or only partially parameterizing them. When you do this, you harm SQL Server’s ability to cache and reuse execution plans. There are very few situations where this is advisable outside of data warehouses.

This isn’t confined to the places where you’d traditionally consider parameterization, either. Literal values sneak into all sorts of spots, like TOP, OFFSET/FETCH, and even static values in a SELECT list.
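
For instance, here’s a minimal sketch showing that TOP and OFFSET/FETCH will happily accept parameters, using the same Stack Overflow Users table as the other demos in this series:

DECLARE
    @top bigint = 100,
    @skip bigint = 0;

SELECT TOP (@top)
    u.DisplayName
FROM dbo.Users AS u
ORDER BY
    u.Reputation DESC;

SELECT
    u.DisplayName
FROM dbo.Users AS u
ORDER BY
    u.Reputation DESC
OFFSET @skip ROWS FETCH NEXT @top ROWS ONLY;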

If you’re reading this with some knowledge of SQL Server, the reason I say it harms the ability to cache plans is because there are limits to plan cache size, and the more pollution you cause the more churn there is.

Partially parameterizing queries has the additional misfortune of not being a problem that the forced parameterization setting can solve.

Hold Up


To simulate what happens when you don’t parameterize queries, we can use unsafe dynamic SQL. In this case, it’s probably not the worst sin against dynamic SQL, since we’re only concatenating an integer converted to a two-character string, but you know, I’d be a bad blogger if I didn’t point that out.

DECLARE
    @i int = 1,
    @sql nvarchar(MAX) = N'';


WHILE @i <= 10
BEGIN
    
    SELECT 
        @sql = N'    
        SELECT
            c = COUNT_BIG(*),
            s = SUM(p.Score * 1.),
            m = MAX(u.DisplayName),
            r = MIN(u.Reputation)
        FROM dbo.Users AS u
        JOIN dbo.Posts AS p
            ON p.OwnerUserId = u.Id
        WHERE u.Reputation = ' + CONVERT(nvarchar(2), @i) + ';
    ';
    
    RAISERROR
    ( 
        @sql,
        0,
        1
    )
    WITH
        NOWAIT;

    SET STATISTICS XML ON;
        EXEC sys.sp_executesql
            @sql;
    SET STATISTICS XML OFF;
    
    SELECT 
        @i += 1;
    
END;

After that, we can look at the plan cache and see what happened.
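
Here’s a minimal sketch of one way to do that. The LIKE filters are just assumptions that match the demo query’s text and exclude this query itself:

SELECT
    deqs.query_hash,
    deqs.query_plan_hash,
    deqs.sql_handle,
    deqs.execution_count,
    query_text = dest.text
FROM sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
WHERE dest.text LIKE N'%dbo.Posts%OwnerUserId%'
AND   dest.text NOT LIKE N'%dm_exec_query_stats%'
ORDER BY deqs.creation_time DESC;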

Large Amounts


These 10 executions will generate 3-4 different query plans, but even when the same plan is chosen, it has different identifiers.


For high frequency execution queries, or more complicated queries (think lots of left joins, etc.) that may spend a lot of time in the optimization process trying out different join orders and plan shapes, you probably want to avoid not parameterizing queries, or only partially parameterizing them.
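
For comparison, here’s a minimal sketch of the fully parameterized version of the demo above, which should compile once and reuse that plan across all ten executions:

DECLARE
    @i int = 1,
    @sql nvarchar(MAX) = N'
    SELECT
        c = COUNT_BIG(*),
        s = SUM(p.Score * 1.),
        m = MAX(u.DisplayName),
        r = MIN(u.Reputation)
    FROM dbo.Users AS u
    JOIN dbo.Posts AS p
        ON p.OwnerUserId = u.Id
    WHERE u.Reputation = @reputation;';

WHILE @i <= 10
BEGIN

    EXEC sys.sp_executesql
        @sql,
        N'@reputation int',
        @reputation = @i;

    SELECT
        @i += 1;

END;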

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Software Vendor Mistakes With SQL Server: Not Strongly Typing Parameters

Month Out


Far and away, my most common clients are people who create or host applications that run on top of SQL Server. Over the years, I’ve noticed some common trends, but I’ve never written a cohesive set of posts that bring them all together in an easy to find reference.

All the posts in this series will be under the “Software Vendor Mistakes” tag, and I’ll have a roundup post at the end that links to them all. The topics I’m going to cover are purely from the perspective of software development, not infrastructure.

Many infrastructure decisions are made before problems with code, indexes, etc. are fixed, or to cover them up. I understand that line of thinking, but I want to reach the people who can make fundamental changes so a clear picture of what hardware is necessary comes through.

Enough preamble. Business time.

Parameter Inference


When working with ORMs, care has to be taken to strongly type your parameters to match the data type, length, precision, and scale of the columns those parameters will be compared to. Time and time again, I see the same patterns with string parameters:

  • They’re unnecessarily typed as Unicode/nvarchar
  • They’re not defined with an appropriate length
  • They’re used as catch-all parameters for temporal types (dates, etc.)

These coding malfeasances cause issues with:

  • Implicit conversions (index scans 🙀)
  • Unnecessary later filters (predicates that can’t be used as seek operators)
  • Plan cache pollution (compiling a new plan for every query)

I see this in particular with applications that attempt to be “database agnostic”, which almost seems to really mean “database atheistic”. It’s like developers don’t believe that databases really exist, and that they can flout the rules that govern reliable performance.

Sure, the implicit conversion thing can happen outside of ORMs, but the length thing generally isn’t an issue there.

Examples


I don’t have a handy example of the length issue causing plan cache pollution outside of this Stack Exchange Q&A: Parameterized query creating many plans

But that’s only because I recently moved computers and I’m not quite set up for that demo. An implicit conversion is easy enough to demonstrate, though: take a varchar column and compare it to an nvarchar predicate. I walk through one in this video, normally part of my paid training.
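
If you want to try it yourself, here’s a minimal sketch that uses a throwaway table with a varchar column (the table, column, and values are made up for illustration):

CREATE TABLE #Users
(
    Id int IDENTITY PRIMARY KEY,
    DisplayName varchar(40) NOT NULL
);

CREATE INDEX ix_DisplayName
    ON #Users (DisplayName);

INSERT #Users (DisplayName)
SELECT TOP (100000)
    CONVERT(varchar(40), NEWID())
FROM sys.all_columns AS ac1
CROSS JOIN sys.all_columns AS ac2;

/* nvarchar parameter vs. varchar column: the column side may get
   converted, which can turn a seek into a scan depending on the
   column's collation */
DECLARE @n nvarchar(40) = N'nobody';
SELECT u.Id
FROM #Users AS u
WHERE u.DisplayName = @n;

/* parameter typed to match the column: plain old index seek */
DECLARE @v varchar(40) = 'nobody';
SELECT u.Id
FROM #Users AS u
WHERE u.DisplayName = @v;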

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

I’ve Got A Full Training Day At SQLBits: The Professional Performance Tuning Blueprint

The Professional Performance Tuning Blueprint


Searching the internet for every problem isn’t cutting it. You need to be more proactive and efficient when it comes to finding and solving database performance fires.

I work with consulting customers around the world to put out SQL Server performance fires. In this day of learning, I will teach you how to find and fix your worst SQL Server problems using the same modern tools and techniques which I use every week.

You’ll learn tons of new and effective approaches to common performance problems, how to figure out what’s going on in your query plans, and how indexes really work to make your queries faster. Together, we’ll tackle query rewrites, batch mode, how to design indexes, and how to gather all the information you need to analyze performance.

This day of learning will teach you cutting edge techniques which you can’t find in training by folks who don’t spend time in the real world tuning performance. Performance tuning mysteries can easily leave you stumbling through your work week, unsure if you’re focusing on the right things. You’ll walk out of this class confident in your abilities to fix performance issues once and for all.

If you want to put out SQL Server performance fires, this is the precon you need to attend. Anyone can have a plan, it takes a professional to have a blueprint.

See you there!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Black Christmas Training Sale

Ho Ho Ho


For the month of December I’m putting my training on sale at 75% off.

That means you get over 24 hours of SQL Server performance tuning training content for around $100 USD when you click on this link and add everything to your cart. The discount applies at checkout, so you have to click through for it to apply.

The way my training is structured is to take you from where you are to where you want to be. There’s beginner, intermediate, and advanced stuff.

You pick where you want to start — but I guarantee no matter where you are you’ll learn something new — I don’t want to waste anyone’s time. There’s enough of that in the world.

What I cover is all of the things that confused, confounded, and cut me up when I first started learning about SQL Server querying, indexing, execution plans, locking and deadlocks, and more.

Hi Hi Hi


If you’re planning on attending SQLBits in London this March, you can catch me doing a full day session live and in person.

It’ll be nice to see those smiling faces again!

Enjoy, and have yourself a happy holiday.

Thanks for reading!

Is Read Uncommitted (NOLOCK) An Optimistic Or Pessimistic Isolation Level?

Poll Worker


I ran a Twitter poll recently, because as I was writing another blog post, I wasn’t quite sure how to group uncommitted/nolock between pessimistic or optimistic isolation levels.

On the one hand, locking and blocking still come into play with them. Anyone who has seen blocking on schema stability (Sch-S) locks can attest to that. On the other hand, they will read just about anything you want. That’s sort of optimistic in spirit, but maybe not in letter.

Most of you kind folks out there called it optimistic, though some others had more colorful language to describe it:

At the end of the day, it’s probably not the right isolation level to use, no matter what you choose to call it.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

How To Tell If You Need To Partition Your Tables In SQL Server

You Probably Don’t


When I work with clients, nearly every single one has this burning question about partitioning.

“We’ve got this huge table, should we partition it?”

“Do you need to insert or delete data in big chunks?”

“No, it’s all transactional.”

“Do you have last page contention problems?”

“No, but won’t it help performance?”

“No, not unless you’re using clustered column store.”

“…”

Burial


Buried deep in that conversation are the two reasons you might want to partition a table:

  • Data management features
  • Clustered columnstore indexes

Included in “data management” is the ability to put different partitions in different files/filegroups, which you can make read only, and/or use for piecemeal backups and restores.

Outside of those, you’re only introducing a whole mess of complexity to a whole bunch of different places, not to mention some pretty severe pains rebuilding or shadow-copying and switching your current “huge table” over to a partitioned one.

Complexity


If you partition a table and plan on using it for data management, all of your other indexes need to be aligned to the partitioning scheme as well.

That adds complexity to index tuning, because if you mess up and don’t do that, your partition swapping will error out. A lot of people add DDL triggers to reject index definitions without the partition alignment.
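
For illustration, here’s a minimal sketch of what an aligned index looks like. The table, partition scheme, and column names are completely made up:

/* The index is created on the same partition scheme, partitioned on the
   same column as the table, so partition switching keeps working. */
CREATE INDEX ix_Orders_CustomerId
    ON dbo.Orders (CustomerId)
    ON ps_OrderDate (OrderDate);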

On top of that, query plans and query writing both get more complicated. Not only that, but the query optimizer has to work harder to figure out whether partitions can be eliminated or not.

Problems that might have been less problematic against non-partitioned tables and indexes (like some implicit conversions), can be a death spiral against partitioned tables.

There have been a lot of bugs and performance regressions over the years with partitioning, too. Some have been fixed, and others are only “fixed” in higher compatibility levels.

Lonely Road


I’ve seen a lot of people partition tables, expecting performance fireworks and afternoon delights, only to find no joy. In some cases, I’ve had to help people undo partitioning because critical queries slowed down.

It seems like a “free” thing you can do, especially if you’re clueless about how to find and fix real problems in your workload. Like most marquee features in SQL Server, there’s an air of mystery and misunderstanding about what partitioning excels at and what it’s best used for.

In most cases, SQL Server users seem to skip basic index tuning and go right to WE NEED TO PARTITION THIS THING.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

How To Tell You Need An Optimistic Isolation Level In SQL Server

Into The Yonder


When you create a database in SQL Server (everything except Azure SQL DB), you get this garbage isolation level called Read Committed.

That isolation level, along with others like Repeatable Read and Serializable, is considered pessimistic. Though Repeatable Read and Serializable are less garbage, they come with a strictness of locking that most applications don’t need across the board. They may need it for certain niche activities, but you know…

Then there are isolation levels that are quite useful for most workloads, and they’re called optimistic isolation levels. In SQL Server, they’re Snapshot Isolation (SI), and Read Committed Snapshot Isolation (RCSI).

I think they are very much not-garbage, and so do other major database platforms that use specific implementations of MVCC (Multi Version Concurrency Control) by default. There may be some historical reason for SQL Server not doing it by default, which is also garbage.

Differences


There are some differences between the two optimistic isolation levels, which makes them useful in different situations. Most people don’t need both turned on, which is something I see quite a bit, but there just might be someone out there who turns both on and actually uses them.

To generalize a little bit:

  • SI is good when you only want certain queries to read versioned data
  • RCSI is good when you want every query to read versioned data

What’s versioned data? You can think of it like the “last known good” version of a row before a modification started.

When an update or a delete starts to change data, SQL Server will send those last known good versions up to tempdb for read queries to grab what they need rather than getting blocked. Inserts are a little different, because they are the only known good version of a row.

There are some other differences, too.

SI:

  • Can be turned on without exclusive access to the database
  • Queries all read data as it looked at the beginning of a transaction

RCSI:

  • Does need exclusive access to the database, but it’s not as bad as it sounds
  • Reads data as it looked when each query in a transaction starts

Getting exclusive access to the database can be done without the single-user/multi-user dance:

ALTER DATABASE YourDatabase
    SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE;
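
Snapshot Isolation, for comparison, can be switched on without kicking anyone out, but every session that wants versioned reads has to opt in. A minimal sketch:

ALTER DATABASE YourDatabase
    SET ALLOW_SNAPSHOT_ISOLATION ON;

/* In any session that wants versioned reads: */
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

BEGIN TRANSACTION;

    SELECT
        u.Id,
        u.Reputation
    FROM dbo.Users AS u
    WHERE u.Reputation > 10000;

COMMIT TRANSACTION;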

If You Know You Know


The reasons why you might want to turn these on are when your application performance suffers because of locking or deadlocking.

If you want some quick and dirty queries to figure out if you’ve got those happening, you can run these queries.

/*Lock waits?*/
SELECT
    dows.wait_type,
    dows.waiting_tasks_count,
    dows.wait_time_ms,
    dows.max_wait_time_ms,
    dows.signal_wait_time_ms
FROM sys.dm_os_wait_stats AS dows
WHERE dows.wait_type LIKE 'LCK%'
AND   dows.waiting_tasks_count > 0
ORDER BY dows.wait_time_ms DESC;

/*Deadlocks?*/
SELECT 
    p.object_name,
    p.counter_name,
    p.cntr_value
FROM sys.dm_os_performance_counters p
WHERE TRIM(p.counter_name) = 'Number of Deadlocks/sec'
AND   TRIM(p.instance_name) = '_Total';

If you need deeper analysis of waits or deadlocks, I’d suggest you use sp_BlitzFirst or sp_BlitzLock.

What you want to look for in general are when readers and writers are interfering with each other. If your blocking or deadlocking problems are between any kind of exclusive locks, optimistic isolation levels won’t help you.

Wait stats from readers will generally have an “S” in them, like LCK_M_S. The same goes for deadlocks, where the lock mode will have an S in either the owner or the waiter.

Reader Writer Fighter


It’s important to keep in mind that it’s not just writers that block readers, or writers that deadlock with each other; readers can block and deadlock with writers, too.

This is where the “Shared” lock and lock mode stuff comes into play. Again, if all your locks and deadlocks are between modification queries — locks and lock modes with X (exclusive) or U (update) — they’ll still block each other.

There’s a lot more details at the linked posts above, but that’s the general pattern. Another pattern to look for is if your developers keep adding more and more NOLOCK hints to “fix performance issues”.

A lot of times they’re just covering up other issues with indexing or the way queries are written, or they’re totally misunderstood. I’ve said it before, but it doesn’t mean your query doesn’t take any locks, it means that your query doesn’t respect locks taken by other queries.

That often comes as a surprise to people when I tell them, so I say it whenever I write about it. But that’s where the bad reputation comes from — it can read all sorts of in-flight data that may not reflect reality.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.