SQL Server Community Tools: Capturing Blocking With sp_HumanEvents

Month Of Sundays


A while back, Brent blogged about using September to raise awareness about free SQL Server community tools.

I thought this was a great idea because I maintain a number of them: my own scripts, the First Responder Kit scripts, and sp_WhoIsActive.

Unlike most people who left links in the comments, I read the entire post and decided to use the whole darn month to write about scripts I maintain and how I use them in my work.

Lest I be accused of not being able to read a calendar, I know that these are dropping a little earlier than the 1st of September. I do apologize for September not starting on a Monday.

There are other great tools and utilities out there, like Andy Mallon’s DBA Utility Database, but I don’t use them enough personally to be able to write about them fluently.

My goal here is to help you use each script with more confidence and understanding. Or even any confidence and understanding, if none existed beforehand.

Oral Board


First up is (well, I think) my most ambitious script: sp_HumanEvents. If you’re wondering why I think it’s so ambitious, it’s because the goal is to make Extended Events usable by Humans.

At present, that’s around 4000 lines of T-SQL. Now, sp_HumanEvents can do a lot of stuff, including logging event data to tables for a bunch of different potential performance issues.

When I was first writing this thing, I wanted it to be able to capture data in a few different ways to fit different monitoring scenarios. In this post, I’m going to show you how I most often use it on systems that have are currently experiencing blocking.

First, you need to have the Blocked Process Report enabled, which is under advanced options:

EXEC sys.sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure 'blocked process threshold', 5;
RECONFIGURE;

If you want to flip the advanced options setting back, you can. I usually leave it set to 1.

The second command turns on the blocked process report, and tells SQL Server to log any instances of blocking going on for 5 or more seconds. You can adjust that to meet your concerns with blocking duration, but I wouldn’t set it too low because there will be overhead, like with any type of monitoring.

Blockeroos


The way I usually set up to look at blocking that’s currently happening on a system — which is what I most often have to do — is to set up a semi-permanent session and watch what data comes in.

When I want to parse that data, I use sp_HumanEventsBlockViewer to do that. At first, I just want to see what kind of stuff starts coming in.

To set that session up, here’s what I do:

EXEC sp_HumanEvents
    @event_type = 'blocking',                   
    @keep_alive = 1;

What this will do is set up an Extended Event session to capture blocking from the Blocked Process Report. That’s it.

From there, you can either use my script (linked above), or watch data coming in via the GUI. Usually I watch the GUI until there’s some stuff in there to gawk at:

EXEC dbo.sp_HumanEventsBlockViewer
    @session_name = 'keeper_HumanEvents_blocking'

Blockerinos


Once you have the blocking collected, troubleshooting it is a whole… Fun… thing.

  • Misguided use of transactions
  • Misguided use of isolation levels
  • Misguided use of locking hints

Are common reasons behind blocking, but not always. Common solutions are things like:

  • Making sure foreign keys have supporting indexes, especially ones with cascading actions
  • Getting angry at Triggers and throwing your computer out the window
  • Making sure modification queries have adequate supporting indexes
  • Batching modifications on large chunks of data
  • Enabling an optimistic isolation level because Read Committed Is A Garbage Isolation Level

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

Correlating Data From sp_WhoIsActive to Query Store Or The Plan Cache

sp_QuickiePost


If you’re the type of person who logs sp_WhoIsActive to a table to capture executing queries, you may want to find some additional details about the statements that end up there.

Out of the box, it’s arduous, tedious, and cumbersome to click around on a bunch of columns and grab handles and hashes and blah blah.

Now, these two queries depend on you grabbing a couple specific columns in your output. If you’re not getting these, you’re kinda screwed:

From query plans, you can get the plan handle and plan hash:

WITH XMLNAMESPACES(DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT
   session_id,
   query_plan,
   additional_info,
   query_hash = 
       q.n.value('@QueryHash', 'varchar(18)'),
   query_plan_hash = 
       q.n.value('@QueryPlanHash', 'varchar(18)')
FROM dbo.WhoIsActive AS w
CROSS APPLY w.query_plan.nodes('//StmtSimple') AS q(n);

From additional info, you can get the SQL Handle and Plan Handle:

SELECT
  session_id,
  query_plan,
  additional_info,
  sql_handle =
      w.additional_info.value('(//additional_info/sql_handle)[1]', 'varchar(131)'),
  plan_handle = 
      w.additional_info.value('(//additional_info/plan_handle)[1]', 'varchar(131)')
FROM dbo.WhoIsActive AS w;

Causation


For the plan cache, you can use your favorite script. Mine is, of course, sp_BlitzCache.

You you can use the @OnlyQueryHashes or @OnlySqlHandles parameters to filter down to queries you’re interested in.

For Query Store, you can use my script sp_QuickieStore to do the same thing.

It has parameters for @include_query_hashes, @include_plan_hashes or @include_sql_handles.

You might want to add some other filtering or sorting to the queries up there to find what you’re interested in, but this should get you started.

I couldn’t find a quick or easy way to combine the two queries, since we’re dealing with two different columns of XML data, and the query plan XML needs a little special treatment to be queried.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

Some Interesting Questions And Answers Of Mine On Database Administrator’s Stack Exchange About SQL Server

Assurance, Assurance


Here’s a quick roundup on some of the more interesting Q&A I’ve taken part in on dba.stackexchange.com over the past few months.

Whew. That’s not even all of it. I should get out more.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

How Microsoft Could Make Problems In Execution Plans Easier To Understand

Round Up


Execution plans have come a long way over the years, gradually adding more and more details as computing power becomes less of a hurdle to collecting metrics.

The thing is, it’s not always obvious where to look or dig deeper into a query plan to figure out where problems are.

Right now, there are some warnings:

  • At the root operator for a few different things
  • For memory consuming operators when they spill

But there are some other things in query plans that should be loud and clear, because they’re not going to be obvious to folks just getting started out reading query plans.

Non-SARGable Predicates:

These can cause a lot of issues:

  • Unnecessary scans
  • Poor cardinality estimates

It’s primarily caused by:

  • function(column) = something
  • column + column = something
  • column + value = something
  • value + column = something
  • column = @something or @something IS NULL
  • column like ‘%something’
  • column = case when …
  • value = case when column…
  • Mismatching data types (implicit conversion)

The thing is, it’s hard to see where this stuff happens in a plan, unless the plan is very small, or you’re looking directly at the query text, which is often truncated when pulled from a query plan. It would be nice if we got a warning of some sort on operators where this happened.

Predicates That Result In Scans

If you write a where clause, but don’t have an index with a key that matches that where clause, sometimes you’ll get a missing index request and sometimes you won’t. It’s a bit of a gamble of course.

For large tables, this can be painful, burn a lot of CPU, and result in a parallel plan where you could get by without one if you had a better index in place.

SQL Server Query Plan
bigscan4u

Of course, not every scan has a predicate: think joins without a where clause, or where only one table has a predicate against it. You don’t have much choice but to scan an index.

Eager Index Spools

Sometimes SQL Server wants an index so badly that it creates one on its own for you. When this happens on a large enough table, you can spend an awful lot of time waiting for it.

You know like when you put something in the microwave and you’re standing there staring at the timer and even though you set it for two minutes it seems to hang out at 1:30 forever? That’s what an Eager Index Spool is like. A Hungry Man Dinner that you microwave for an hour but still comes out with ice around the edges of your Salisbury Steak.

SQL Server Query Plan
community

Okay, I stretched that one a bit. But here’s the thing: If SQL Server is gonna spend all that time creating a temporary index for you, it should tell you. Maybe a missing index request, maybe a warning on the spool itself. Just… anything that would help alert more casual execution plan observers to the fact that an index might not be the worst idea, here.

Why Indexes Weren’t Used

I know you. You create indexes all the time, then for some strange reason your queries don’t use them, or stop using them.

When SQL Server optimizes a query, part of the flow chart is a pit stop called index matching. At this point, SQL Server looks at available indexes and then chooses to use or not use them based on various pieces of feedback.

Sometimes it’s obvious why an index wasn’t used, like if it only covers a portion of the query, or if the key columns weren’t in the best order. Other times, it’s really unclear.

It would be nice if we had reasons for that available, even if it’s only in actual plans.

Louder Warnings For Deeper Problems

Right now, SQL Server buries some information that can be really important to why a query didn’t perform well:

  • When estimated and actual rows or executions are way off
  • When something forces a query to run serially
  • When operators execute more than once (including rebinds and rewinds)
  • When rows are badly skewed across parallel threads

The thing is, like a lot of these other items on this list, it takes real digging to figure out if any of them apply to you, and if they’re why your query slowed down. They just need some basic visual indicators to draw attention to them at the right times.

Different Per-Operator Details

When you look at each individual operator in an actual execution plan, you get sort of a confusing story:

  • Estimated cost
  • Wall clock time
  • Actual rows
  • Estimated rows
  • Percent of actual to estimated rows

I’d throw out some of that, and show:

  • CPU time
  • Wall clock time
  • Actual Rows
  • Actual Executions
  • Percent of actual to estimated

It would also be nice to have per-operator wait stats at this juncture, since we’d need to know why there’s a discrepancy between CPU and wall clock time, e.g. because of blocking or waiting on some other resource.

While we’re talking about all this, it might be helpful to consider the direction plans show their work. Right to left for data and left to right for logic are… fine. I guess. But up and down might make more sense. A lot of folks I know have a tough time understanding when things happen in horizontal execution plans, where vertical plans would be far more clear.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

Different Ways To Parameterize Queries In SQL Server

The Importance Of Being Parameterized


Parameterization has many benefits for queries.

But first, let’s cover two things that aren’t exactly parameters!

  1. Local Variables
  2. Unsafe Dynamic SQL

Much more details at the post, but local variables are any variable that you declare inside a code block, e.g.

DECLARE
    @a_local_variable some_data_type;

And unsafe dynamic SQL is when parameters or local variables are concatenated into a string like so:

@sql += N'AND H.user_name = ''' + @injectable + ''';';

Note the series of single quotes and + operators (though the same would happen if you used the CONCAT function), and that square brackets alone won’t save you.

Now let’s talk about actual parameterization.

The same concept applies to ORM queries, but I can’t write that kind of code so go to this post to learn more about that.

Stored Procedures


The most obvious way is to use a stored procedure.

CREATE OR ALTER PROCEDURE
    dbo.Obvious
(
    @ParameterOne int
)
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;

    SELECT
        records = COUNT_BIG(*)
    FROM dbo.Users AS u
    WHERE u.Id = @ParameterOne;

END;

There are millions of upsides to stored procedures, but they can get out of hand quickly.

Also, the longer they get, the harder it can become to troubleshoot individual portions for performance or logical issues.

Developers without a lot of SQL experience can make a ton of mistakes with them, but don’t worry: young, good-looking consultants are standing by to take your call.

Inline Table Valued Functions


There are other kinds of functions in SQL Server, but these are far and away the least-full of performance surprises.

CREATE OR ALTER FUNCTION
    dbo.TheOnlyGoodKindOfFunction
(
    @ParameterOne int
)
RETURNS table
AS
RETURN

    SELECT
        records = COUNT_BIG(*)
    FROM dbo.Users AS u
    WHERE u.Id = @ParameterOne;
GO

Both scalar and multi-statement types of functions can cause lots of issues, and should generally be avoided when possible.

Inline table valued functions are only as bad as the query you put in them, but don’t worry: young, good-looking consultants are standing by to take your call.

Dynamic SQL


Dynamic SQL gets a bad rap from people who have:

  1. No idea what they’re talking about
  2. All the wrong kinds of experience with it
DECLARE
    @sql nvarchar(MAX) = N'',
    @ParameterOne int;

SELECT
    @sql += N'
    SELECT
        records = COUNT_BIG(*)
    FROM dbo.Users AS u
    WHERE u.Id = @ParameterOne;	
    ';

EXEC sys.sp_executesql
    @sql,
  N'@ParameterOne int',
    @ParameterOne;

This kind of dynamic SQL is just as safe and reusable as stored procedures, but far less flexible. It’s not that you can’t cram a bunch of statements and routines into it, it’s just not advisable to get overly complicated in here.

Note that even though we declared @ParameterOne as a local variable, we pass it to the dynamic SQL block as a parameter, which makes it behave correctly. This is also true if we were to pass it to another stored procedure.

Dynamic SQL is only as bad as the query you put in it, but don’t worry: young, good-looking consultants are standing by to take your call.

Forced Parameterization


Forced parameterization is a great setting. It’s unfortunate that everything thinks they want to turn on optimize for adhoc workloads, which is a pretty useless setting.

You can turn it on like so:

ALTER DATABASE [YourDatabase] SET PARAMETERIZATION FORCED;

Forced parameterization will take queries with literal values and replace them with parameters to promote plan reuse. It does have some limitations, but it’s usually a quick fix to constant-compiling and plan cache flushing from unparameterized queries.

Deciding whether or not to turn on this feature can be tough if you’re not sure what problem you’re trying to solve, but don’t worry: young, good-looking consultants are standing by to take your call.

Other


SQL Server may attempt simple parameterization in some cases, but this is not a guaranteed or reliable way to get the majority of the queries in your workload parameterized.

In general, the brunt of the work falls on you to properly parameterize things. Parameters are lovely things, which can even be output and shared between code blocks. Right now, views don’t accept parameters as part of their definitions, so they won’t help you here.

Figuring out the best thing to use and when to use it can be tough, but don’t worry: young, good-looking consultants are standing by to take your call.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

Some Notes on sp_prepare And SQL Server 2022

OldBad


I admit that sp_prepare is an odd bird, and thankfully one that isn’t used a ton. I still run into applications that are unfortunate enough to have been written by people who hate bloggers and continue to use it, though, so here goes.

When you use sp_prepare, parameterized queries behave differently from normal: the parameters don’t get histogram cardinality estimates, they get density vector cardinality estimates.

Here’s a quick demo to show you that in action:

CREATE INDEX 
    p 
ON dbo.Posts
    (ParentId) 
WITH
(
    SORT_IN_TEMPDB = ON, 
    DATA_COMPRESSION = PAGE
);

DECLARE 
    @handle int = 
        NULL,
    @parameters nvarchar(MAX) = 
        N'@ParentId int',
    @sql nvarchar(MAX) = 
        N'
        SELECT
            c = COUNT_BIG(*)
        FROM dbo.Posts AS p
        WHERE p.ParentId = @ParentId;
        ';

EXEC sys.sp_prepare 
    @handle OUTPUT,
    @parameters,
    @sql;

EXEC sys.sp_execute 
    @handle, 
    184618;
 
EXEC sys.sp_execute 
    @handle, 
    0;
 
EXEC sys.sp_unprepare 
    @handle;

OldPlan


The plans for the two executions have the same poor cardinality estimate. In this case, since we have an ideal index and there’s no real complexity, there’s no performance issue.

But you can probably guess (at least for the second query) how being off by 201,694,000% might cause issues in queries that ask a bit more of the optimizer.

The point here is that both queries get the same incorrect estimate of 3 rows. If you add a recompile hint, or execute the same code using sp_executesql, the first query will get a histogram cardinality estimate, and the second query will reuse it.

SQL Server Query Plan
one up

Given the historical behavior of sp_prepare, I was a little surprised that the Parameter Sensitive Plan (PSP) optimization available in SQL Server 2022 kicked in.

NewDifferent


If we change the database compatibility level to 160, the plans change a bit.

ALTER DATABASE StackOverflow2013 SET COMPATIBILITY_LEVEL = 160;

Now we see two different plans without a recompilation, as well as the plan per value option text at the end of the queries, indicating the PSP optimization kicked in.

SQL Server Query Plan
two up

The differences here are fairly obvious, but…

  • Each plan gets accurate cardinality
  • The second plan goes parallel to make processing ~6 million rows faster
  • Different aggregates more suited to the amount of data in play are chosen (the hash match aggregate is eligible for batch mode)

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

How To Convert Binary And Varbinary Strings In SQL Server

Noticeable


This is a short post that I wanted to write on the heels of doing a bunch of work in sp_QuickieStore.

Many times, pulling data out of odd structures like XML or JSON can lead to difficulty in correctly typing each output element. I run into this commonly with query plan XML, of course. You may run into it elsewhere.

The main issue is that I often need to compare what comes out of those odd data structures to data stored more properly in other system views. For example:

  • Query Hash: Binary 8
  • Query Plan Hash: Binary 8
  • SQL Handle: Varbinary 64
  • Plan Handle: Varbinary 64

There’s some shenanigans you can use around big ints, but I’ve run into a lot of bugs with that. I don’t want to talk about it.

Nutty


As an example, this won’t match:

SELECT
    c = 
        CASE
            WHEN '0x1AB614B461F4D769' = 0x1AB614B461F4D769
            THEN 1
            ELSE 0
        END;

The string does not implicitly convert to the binary 8 value. The same is true when you use varbinary values.

You might think that just converting the string to binary 8 would be enough, but no! This will still return a zero.

SELECT
    c = 
        CASE
            WHEN CONVERT(binary(8), '0x1AB614B461F4D769') = 0x1AB614B461F4D769
            THEN 1
            ELSE 0
        END;

In this case (ha ha ha), you need to use the additional culture parameter to make this work.

Objective


Here’s why:

SELECT
    no =
        CONVERT(binary(8), '0x1AB614B461F4D769'),
    yes = CONVERT(binary(8), '0x1AB614B461F4D769', 1);
no	                yes
0x3078314142363134	0x1AB614B461F4D769

The same is true with varbinary, too:

SELECT
    no =
        CONVERT(varbinary(64), '0x0900F46AC89E66DF744C8A0AD4FD3D3306B90000000000000000000000000000000000000000000000000000'),
    yes = CONVERT(varbinary(64), '0x0900F46AC89E66DF744C8A0AD4FD3D3306B90000000000000000000000000000000000000000000000000000', 1);
no
0x30783039303046343641433839453636444637343443384130414434464433443333303642393030303030303030303030303030303030303030303030303030	

yes
0x0900F46AC89E66DF744C8A0AD4FD3D3306B90000000000000000000000000000000000000000000000000000

The real answer here is to not rely on conversions, implicit or otherwise, when comparing data.

But, if you ever find yourself having to deal with some wonky binary data, this is one way to get yourself out of a scrape.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Views vs Indexed Views In SQL Server

Award Winning


Imagine you have a rather complicated query that you want to abstract into a simple query for your less-than-with-it end users.

A view is probably a pretty good way of doing that, since you can shrink your preposterously-constructed tour through every table in the schema down to a simple select-from-one-object.

The problem is that now everyone expects it to perform well throughout all time, under any circumstances, come what may. It’s sort of like how your parents expect dinner to be $20 and tips to be 20% regardless of where they go or what they order.

  • Lobster? $5.
  • Steak? $5.
  • Bottle of wine? $5.
  • Any dessert you can imagine? $5.
  • Tip? Gosh, mister, another $5?

I sincerely apologize to anyone who continues to live in, or who moved to Europe to avoid tipping.

If you’d like some roommates, I have some parents you’d get along with.

Viewfinder


Creating a view in SQL Server doesn’t do anything special for you, outside of not making people remember your [reference to joke above] query.

You can put all manner of garbage in your view, make it reference another half dozen views full of garbage, and expect sparkling clean query performance every time.

Guess what happens?

Reality.

When you use views, the only value is abstraction. You still need to be concerned with how the query is written, and if the query has decent indexes to support it. In other words, you can’t just write a view and expect the optimizer to do anything special with it.

SQL Server doesn’t cache results, it only caches raw data. If you want the results of a view to be saved, you need to index it.

Take these two dummy queries, one against a created view, and the other an ad hoc query identical to what’s in the view:

CREATE OR ALTER VIEW
    dbo.just_a_query
WITH SCHEMABINDING
AS
SELECT
    p.OwnerUserId,
    TotalScore = 
        ISNULL
        (
            SUM(p.Score), 
            0
        ),
    TotalPosts = 
        COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE EXISTS
      (
          SELECT
              1/0
          FROM dbo.Votes AS v
          WHERE 
              v.PostId = p.Id      
      )
GROUP BY
    p.OwnerUserId;
GO 

SELECT
    p.OwnerUserId,
    TotalScore = 
        ISNULL
        (
            SUM(p.Score), 
            0
        ),
    TotalPosts = 
        COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE EXISTS
      (
          SELECT
              1/0
          FROM dbo.Votes AS v
          WHERE 
              v.PostId = p.Id      
      )
AND 
    p.OwnerUserId = 22656
GROUP BY
    p.OwnerUserId;
GO 

SELECT
    jaq.*
FROM dbo.just_a_query AS jaq
WHERE 
    jaq.OwnerUserId = 22656;
GO 

The plans are identical, and identically bad. Why? Because I didn’t try very hard, and there’s no good indexes for them.

Remember when I said that’s important?

SQL Server Query Plan
avenues lined with trees

Keep in mind this is a query with some batch mode involved, so it could be a lot worse. But both instances complete within a second or so of each other.

So much for view performance.

Maintainer


The rules around indexed views are pretty strict, and the use cases are fairly narrow. I do find them quite useful on SQL Server Standard Edition where batch mode is terribly hobbled.

The horrible thing is that indexed views are so strict in SQL Server that we can’t even create one on the view in question. That really sucks. We get this error.

CREATE UNIQUE CLUSTERED INDEX 
    cuqadoodledoo
    ON dbo.not_just_a_query
(
    OwnerUserId
)
WITH
(
    SORT_IN_TEMPDB = ON,
    DATA_COMPRESSION = PAGE
);

Msg 10127, Level 16, State 1, Line 95

Cannot create index on view “StackOverflow2013.dbo.not_just_a_query” because it contains one or more subqueries. 

Consider changing the view to use only joins instead of subqueries. Alternatively, consider not indexing this view.

Alternatively, go screw yourself. Allowing joins but not exists is somewhat baffling, since they’re quite different in that joins allow for multiple matches but exists does not. We’d have to do a lot of fancy grouping footwork to get equivalent results with a join, since distinct isn’t allowed in an indexed view in SQL Server either.

We could also pull the exists out of the view, add the Id column to the select list, group by that and OwnerUserId, index both of them, and… yeah nah.

I have no idea who’s in charge of indexed views in the product at this point, but a sufficiently lubricated republic would likely come calling with tar and feathers in the face of this injustice.

This is basic query syntax. It’s not like uh… min, max, sum, avg, except, intersect, union, union all, cross apply, outer apply, outer joins, or um, hey, is it too late for me to change careers?

The Pain In Pain Falls Painly On The Pain


You may have ended up here looking to learn all the minute differences between views and indexed views in SQL Server.

You may be disappointed in reading this post, but I can assure you that you’re not nearly as disappointed in this post as I am with indexed views in SQL Server.

They’re like one of those articles about flying cars where you read the headline and you’re like “woah, I’m living in the future”, but then three paragraphs in you find out the cars don’t really fly or drive and they might actually just be igloos that are only big enough for an Italian Greyhound or a paper plane that the author’s kid glued wheels to.

If you actually have a use case for indexed views, you’ll have to be really careful about making sure their maintenance doesn’t kill performance.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

Why MAX Data Types Are Usually A Bad Choice For SQL Server Columns

Easy Rider


When you’re trying to figure out how to store string data, it often seems easiest to just choose an extra long — even MAX — data type to avoid future truncation errors.

Even if you’re storing strings with a known, absolute length, developers may choose to not enforce that in the application, either via a drop down menu or other form of validation.

And so to avoid errors when users try to put their oh-so-important data in their oh-so-expensive database, we get columns added to tables that can fit a galaxy of data in them, when we only need to store an ashtray worth of data.

While getting data into those columns is relatively easy — most application inserts are single rows — getting data out of those columns can be quite painful, whether it’s searching or just presenting in the select portion of a query.

Let’s look at a couple simple examples of how that happens.

Search Engine


Let’s take a query like this one:

SELECT TOP (20)
    p.Id,
    p.Title,
    p.Body
FROM dbo.Posts AS p
WHERE p.Body LIKE N'SQL Server%';

The Body column in the Posts table is nvarchar and MAX, but the same thing would happen with a varchar column.

If you need a simple way to remember how to pronounce those data types, just remember to Pahk yah (n)vahcah in Hahvahd Yahd.

Moving on – while much has been written about leading wildcard searches (that start with a % sign), we don’t do that here. Also, in general, using charindex or patindex instead of leading wildcard like searching won’t buy you all that much (if anything at all).

Anyway, since you can’t put a MAX datatype in the key of an index, part of the problem with them is that there’s no way to efficiently organize the data for searching. Included columns don’t do that, and so we end up with a query plan that looks some-such like this:

SQL Server Query Plan
ouch in effect

We spend ~13.5 seconds scanning the clustered index on the Posts table, then about two minutes and twenty seven seconds (minus the original 13.5) applying the predicate looking for posts that start with SQL Server.

That’s a pretty long time to track down and return 19 rows.

Let’s change the query a little bit and look at how else big string columns can cause problems.

Memory Bank


Rather than search on the Body column, let’s select some values from it ordered by the Score column.

Since Score isn’t indexed, it’s not sorted in the database. That means SQL Server needs to ask for memory to put the data we’re selecting in the order we’re asking for.

SELECT TOP (200)
    p.Body
FROM dbo.Posts AS p
ORDER BY p.Score DESC;

The plan for this query asks for a 5GB memory grant:

SQL Server Query Pla
quietly

I know what you’re thinking: the Body column probably has some pretty big data in it, and you’re right. In this case, it’s the right data type to use.

The bad news is that SQL Server will makes the same memory grant estimation based on the size of the data we need to sort whether or not it’s a good choice.

I talk more about that in this Q&A on Stack Exchange.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

Stop Making SQL Server Weird, Microsoft.

Ledgers And Tridents And Toads, Fido-hmy


With every release of SQL Server, Microsoft adds in a bunch of tadpole features. It’s up to those tadpoles to survive long enough to turn into beautiful uh… Frogs Toads.

The way they do that is by having enough paying customers use them — usually prodded by #MVPBuzz chasers — to get continued support and development.

All of this is subject to internal whims (like Microsoft rebranding, renaming, or rebasing a feature, take Mirroring >> Availability Groups), and external customer fancies (Tableau vs PowerBI, etc).

I feel bad for folks who latch onto these things with any sort of seriousness, like Big Data Clusters, or any of the dozens of other features that died before they could bump their butts on a log.

One marvels at the sheer number of “you better get on board” style session titles that have been molded around features where absolutely no one got on board.

Even Kubernetes (which I know isn’t a SQL Server feature) is sort of like K-Pop: Apparently millions of people like it, but I’ve never met a single person who actually uses it.

Maybe it’s more like a bot net. K-Bot? Not sure, here. I’m bad at creative writing.

They Don’t Like You Anyway


Microsoft has this weird perceived popularity issue.  Perhaps it’s the diminishing self-esteem that comes with age, with SQL Server turning 30-something and all. Microsoft keeps trying to make SQL Server appeal to swaths of people who will never like it, because it’s not the shiny new thing on the block.

They won’t love you like I do.

Adding in every passing fad to the product to try to stay young and hip does a great disservice to the product as a whole, which has many addressable flaws. Microsoft dedicates countless development cycles to goofy memes that will wither on the vine shortly after the first version.

Imagine if cars still came with cassette players, but not ones that can do anything cool like play both sides without flipping the tape over, or being able to fast forward or rewind to the start or end of a track.

You’d basically have Spatial Data And The Graph Nodes in your car trying to play Garage Days Re-Re-Revisited.

Duets


I do wish the Microsoft bean counters would dance with who they brung a bit more. You know, people who need a relational database with as few bugs and preventable performance issues as possible, with useful development features, and a coherent toolset?

Clearing the minefield of pitfalls that developers fall int0 (hello, MERGE!) would allow less effort to be spent on chasing down oddball problems and more time spent developing quality features.

It’s sort of like if I had a keyboard that fixed all my typos immediately, I’d have more time to write quality blog posts that don’t need several editorial passes.

Maybe someday.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.