How To Convert Binary And Varbinary Strings In SQL Server

Noticeable


This is a short post that I wanted to write on the heels of doing a bunch of work in sp_QuickieStore.

Many times, pulling data out of odd structures like XML or JSON can lead to difficulty in correctly typing each output element. I run into this commonly with query plan XML, of course. You may run into it elsewhere.

The main issue is that I often need to compare what comes out of those odd data structures to data stored more properly in other system views. For example:

  • Query Hash: Binary 8
  • Query Plan Hash: Binary 8
  • SQL Handle: Varbinary 64
  • Plan Handle: Varbinary 64

There’s some shenanigans you can use around big ints, but I’ve run into a lot of bugs with that. I don’t want to talk about it.

Nutty


As an example, this won’t match:

SELECT
    c = 
        CASE
            WHEN '0x1AB614B461F4D769' = 0x1AB614B461F4D769
            THEN 1
            ELSE 0
        END;

The string does not implicitly convert to the binary 8 value. The same is true when you use varbinary values.

You might think that just converting the string to binary 8 would be enough, but no! This will still return a zero.

SELECT
    c = 
        CASE
            WHEN CONVERT(binary(8), '0x1AB614B461F4D769') = 0x1AB614B461F4D769
            THEN 1
            ELSE 0
        END;

In this case (ha ha ha), you need to use the additional culture parameter to make this work.

Objective


Here’s why:

SELECT
    no =
        CONVERT(binary(8), '0x1AB614B461F4D769'),
    yes = CONVERT(binary(8), '0x1AB614B461F4D769', 1);
no	                yes
0x3078314142363134	0x1AB614B461F4D769

The same is true with varbinary, too:

SELECT
    no =
        CONVERT(varbinary(64), '0x0900F46AC89E66DF744C8A0AD4FD3D3306B90000000000000000000000000000000000000000000000000000'),
    yes = CONVERT(varbinary(64), '0x0900F46AC89E66DF744C8A0AD4FD3D3306B90000000000000000000000000000000000000000000000000000', 1);
no
0x30783039303046343641433839453636444637343443384130414434464433443333303642393030303030303030303030303030303030303030303030303030	

yes
0x0900F46AC89E66DF744C8A0AD4FD3D3306B90000000000000000000000000000000000000000000000000000

The real answer here is to not rely on conversions, implicit or otherwise, when comparing data.

But, if you ever find yourself having to deal with some wonky binary data, this is one way to get yourself out of a scrape.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Views vs Indexed Views In SQL Server

Award Winning


Imagine you have a rather complicated query that you want to abstract into a simple query for your less-than-with-it end users.

A view is probably a pretty good way of doing that, since you can shrink your preposterously-constructed tour through every table in the schema down to a simple select-from-one-object.

The problem is that now everyone expects it to perform well throughout all time, under any circumstances, come what may. It’s sort of like how your parents expect dinner to be $20 and tips to be 20% regardless of where they go or what they order.

  • Lobster? $5.
  • Steak? $5.
  • Bottle of wine? $5.
  • Any dessert you can imagine? $5.
  • Tip? Gosh, mister, another $5?

I sincerely apologize to anyone who continues to live in, or who moved to Europe to avoid tipping.

If you’d like some roommates, I have some parents you’d get along with.

Viewfinder


Creating a view in SQL Server doesn’t do anything special for you, outside of not making people remember your [reference to joke above] query.

You can put all manner of garbage in your view, make it reference another half dozen views full of garbage, and expect sparkling clean query performance every time.

Guess what happens?

Reality.

When you use views, the only value is abstraction. You still need to be concerned with how the query is written, and if the query has decent indexes to support it. In other words, you can’t just write a view and expect the optimizer to do anything special with it.

SQL Server doesn’t cache results, it only caches raw data. If you want the results of a view to be saved, you need to index it.

Take these two dummy queries, one against a created view, and the other an ad hoc query identical to what’s in the view:

CREATE OR ALTER VIEW
    dbo.just_a_query
WITH SCHEMABINDING
AS
SELECT
    p.OwnerUserId,
    TotalScore = 
        ISNULL
        (
            SUM(p.Score), 
            0
        ),
    TotalPosts = 
        COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE EXISTS
      (
          SELECT
              1/0
          FROM dbo.Votes AS v
          WHERE 
              v.PostId = p.Id      
      )
GROUP BY
    p.OwnerUserId;
GO 

SELECT
    p.OwnerUserId,
    TotalScore = 
        ISNULL
        (
            SUM(p.Score), 
            0
        ),
    TotalPosts = 
        COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE EXISTS
      (
          SELECT
              1/0
          FROM dbo.Votes AS v
          WHERE 
              v.PostId = p.Id      
      )
AND 
    p.OwnerUserId = 22656
GROUP BY
    p.OwnerUserId;
GO 

SELECT
    jaq.*
FROM dbo.just_a_query AS jaq
WHERE 
    jaq.OwnerUserId = 22656;
GO 

The plans are identical, and identically bad. Why? Because I didn’t try very hard, and there’s no good indexes for them.

Remember when I said that’s important?

SQL Server Query Plan
avenues lined with trees

Keep in mind this is a query with some batch mode involved, so it could be a lot worse. But both instances complete within a second or so of each other.

So much for view performance.

Maintainer


The rules around indexed views are pretty strict, and the use cases are fairly narrow. I do find them quite useful on SQL Server Standard Edition where batch mode is terribly hobbled.

The horrible thing is that indexed views are so strict in SQL Server that we can’t even create one on the view in question. That really sucks. We get this error.

CREATE UNIQUE CLUSTERED INDEX 
    cuqadoodledoo
    ON dbo.not_just_a_query
(
    OwnerUserId
)
WITH
(
    SORT_IN_TEMPDB = ON,
    DATA_COMPRESSION = PAGE
);

Msg 10127, Level 16, State 1, Line 95

Cannot create index on view “StackOverflow2013.dbo.not_just_a_query” because it contains one or more subqueries. 

Consider changing the view to use only joins instead of subqueries. Alternatively, consider not indexing this view.

Alternatively, go screw yourself. Allowing joins but not exists is somewhat baffling, since they’re quite different in that joins allow for multiple matches but exists does not. We’d have to do a lot of fancy grouping footwork to get equivalent results with a join, since distinct isn’t allowed in an indexed view in SQL Server either.

We could also pull the exists out of the view, add the Id column to the select list, group by that and OwnerUserId, index both of them, and… yeah nah.

I have no idea who’s in charge of indexed views in the product at this point, but a sufficiently lubricated republic would likely come calling with tar and feathers in the face of this injustice.

This is basic query syntax. It’s not like uh… min, max, sum, avg, except, intersect, union, union all, cross apply, outer apply, outer joins, or um, hey, is it too late for me to change careers?

The Pain In Pain Falls Painly On The Pain


You may have ended up here looking to learn all the minute differences between views and indexed views in SQL Server.

You may be disappointed in reading this post, but I can assure you that you’re not nearly as disappointed in this post as I am with indexed views in SQL Server.

They’re like one of those articles about flying cars where you read the headline and you’re like “woah, I’m living in the future”, but then three paragraphs in you find out the cars don’t really fly or drive and they might actually just be igloos that are only big enough for an Italian Greyhound or a paper plane that the author’s kid glued wheels to.

If you actually have a use case for indexed views, you’ll have to be really careful about making sure their maintenance doesn’t kill performance.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

Why MAX Data Types Are Usually A Bad Choice For SQL Server Columns

Easy Rider


When you’re trying to figure out how to store string data, it often seems easiest to just choose an extra long — even MAX — data type to avoid future truncation errors.

Even if you’re storing strings with a known, absolute length, developers may choose to not enforce that in the application, either via a drop down menu or other form of validation.

And so to avoid errors when users try to put their oh-so-important data in their oh-so-expensive database, we get columns added to tables that can fit a galaxy of data in them, when we only need to store an ashtray worth of data.

While getting data into those columns is relatively easy — most application inserts are single rows — getting data out of those columns can be quite painful, whether it’s searching or just presenting in the select portion of a query.

Let’s look at a couple simple examples of how that happens.

Search Engine


Let’s take a query like this one:

SELECT TOP (20)
    p.Id,
    p.Title,
    p.Body
FROM dbo.Posts AS p
WHERE p.Body LIKE N'SQL Server%';

The Body column in the Posts table is nvarchar and MAX, but the same thing would happen with a varchar column.

If you need a simple way to remember how to pronounce those data types, just remember to Pahk yah (n)vahcah in Hahvahd Yahd.

Moving on – while much has been written about leading wildcard searches (that start with a % sign), we don’t do that here. Also, in general, using charindex or patindex instead of leading wildcard like searching won’t buy you all that much (if anything at all).

Anyway, since you can’t put a MAX datatype in the key of an index, part of the problem with them is that there’s no way to efficiently organize the data for searching. Included columns don’t do that, and so we end up with a query plan that looks some-such like this:

SQL Server Query Plan
ouch in effect

We spend ~13.5 seconds scanning the clustered index on the Posts table, then about two minutes and twenty seven seconds (minus the original 13.5) applying the predicate looking for posts that start with SQL Server.

That’s a pretty long time to track down and return 19 rows.

Let’s change the query a little bit and look at how else big string columns can cause problems.

Memory Bank


Rather than search on the Body column, let’s select some values from it ordered by the Score column.

Since Score isn’t indexed, it’s not sorted in the database. That means SQL Server needs to ask for memory to put the data we’re selecting in the order we’re asking for.

SELECT TOP (200)
    p.Body
FROM dbo.Posts AS p
ORDER BY p.Score DESC;

The plan for this query asks for a 5GB memory grant:

SQL Server Query Pla
quietly

I know what you’re thinking: the Body column probably has some pretty big data in it, and you’re right. In this case, it’s the right data type to use.

The bad news is that SQL Server will makes the same memory grant estimation based on the size of the data we need to sort whether or not it’s a good choice.

I talk more about that in this Q&A on Stack Exchange.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

Stop Making SQL Server Weird, Microsoft.

Ledgers And Tridents And Toads, Fido-hmy


With every release of SQL Server, Microsoft adds in a bunch of tadpole features. It’s up to those tadpoles to survive long enough to turn into beautiful uh… Frogs Toads.

The way they do that is by having enough paying customers use them — usually prodded by #MVPBuzz chasers — to get continued support and development.

All of this is subject to internal whims (like Microsoft rebranding, renaming, or rebasing a feature, take Mirroring >> Availability Groups), and external customer fancies (Tableau vs PowerBI, etc).

I feel bad for folks who latch onto these things with any sort of seriousness, like Big Data Clusters, or any of the dozens of other features that died before they could bump their butts on a log.

One marvels at the sheer number of “you better get on board” style session titles that have been molded around features where absolutely no one got on board.

Even Kubernetes (which I know isn’t a SQL Server feature) is sort of like K-Pop: Apparently millions of people like it, but I’ve never met a single person who actually uses it.

Maybe it’s more like a bot net. K-Bot? Not sure, here. I’m bad at creative writing.

They Don’t Like You Anyway


Microsoft has this weird perceived popularity issue.  Perhaps it’s the diminishing self-esteem that comes with age, with SQL Server turning 30-something and all. Microsoft keeps trying to make SQL Server appeal to swaths of people who will never like it, because it’s not the shiny new thing on the block.

They won’t love you like I do.

Adding in every passing fad to the product to try to stay young and hip does a great disservice to the product as a whole, which has many addressable flaws. Microsoft dedicates countless development cycles to goofy memes that will wither on the vine shortly after the first version.

Imagine if cars still came with cassette players, but not ones that can do anything cool like play both sides without flipping the tape over, or being able to fast forward or rewind to the start or end of a track.

You’d basically have Spatial Data And The Graph Nodes in your car trying to play Garage Days Re-Re-Revisited.

Duets


I do wish the Microsoft bean counters would dance with who they brung a bit more. You know, people who need a relational database with as few bugs and preventable performance issues as possible, with useful development features, and a coherent toolset?

Clearing the minefield of pitfalls that developers fall int0 (hello, MERGE!) would allow less effort to be spent on chasing down oddball problems and more time spent developing quality features.

It’s sort of like if I had a keyboard that fixed all my typos immediately, I’d have more time to write quality blog posts that don’t need several editorial passes.

Maybe someday.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

SQL Server 2022 CTP 2.1 Improvements To Parameter Sensitive Plan Optimization

Hop On Your Good Foot


Several weeks back, I blogged about a missed opportunity with the new parameter sensitive plan feature. At the time, I thought that there was indeed sufficient skewness available to trigger the additional plan variants, and apparently some nice folks at Microsoft agreed.

If we step back through the old demo, we’ll get different results.

EXEC dbo.VoteSniffing 
    @VoteTypeId = 4;

EXEC dbo.VoteSniffing 
    @VoteTypeId = 2;

I’m not gonna go into a lot of depth here, because I’m writing this while traveling, but we get the (mostly) desired outcome of two different plans being generated. Each plan is more suitable to the amount of data that the query has to process.

SQL Server Query Plan
home improvement

I’d consider this a success, even if it adds more queries to the unidentifiable corpse-pile.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

SQL Server 2022 Introduces IS [NOT] DISTINCT FROM; World Underwhelmed

Back In Time


Feeling quite old while remembering the first time I ever read this, and this note at the end:

This is the SQL language IS DISTINCT FROM feature —implemented in the query processor, but not yet available in the T-SQL language.

Groovy. Since at least 2011, this has been in the Query Processor, and here in 2022 we finally get the linguistic support.

The thing is, it’s pretty underwhelming, and I’m going to show you why I think so.

First, in the docs for SQL Server, all the examples use a single literal value, like so:

SELECT * FROM #SampleTempTable WHERE id IS DISTINCT FROM 17;

I went looking for other docs examples from vendors who have had the syntax around for 10+ years, and there wasn’t anything all that much more interesting.

Mostly case expressions and whatnot.

Big deal.

Alignment


First, if I try to run either of these queries, I’ll get an error after about 6 seconds.

SELECT
    c = COUNT_BIG(*)
FROM dbo.Comments AS c
WHERE c.UserId IS DISTINCT FROM 
               (
                   SELECT
                       v.UserId
                   FROM dbo.Votes AS v
               );

SELECT
    c = COUNT_BIG(*)
FROM dbo.Comments AS c
WHERE c.UserId IS NOT DISTINCT FROM 
               (
                   SELECT
                       v.UserId
                   FROM dbo.Votes AS v
               );

Why does it take 6 seconds to get an error? Because a few parts of the query plan have to do some work, and then finally:

Msg 512, Level 16, State 1, Line 1
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.

Sort of like how sometimes you try to SUM a column and after a while you get an error about arithmetic overflow.

This is a bit annoying, because that means we need a way to return a single value to evaluate.

So Yeah…


We can’t even rewrite the queries like this to get around the error, but I do want to show you the plans.

This is why we have to wait several seconds to get an error (unless you change it to IS DISTINCT FROM ALL/ANY):

SELECT
    c = COUNT_BIG(*)
FROM dbo.Comments AS c
WHERE c.UserId IS DISTINCT FROM 
               (
                   SELECT
                       v.UserId
                   FROM dbo.Votes AS v
                   WHERE v.UserId = c.UserId
               );

SELECT
    c = COUNT_BIG(*)
FROM dbo.Comments AS c
WHERE c.UserId IS NOT DISTINCT FROM 
               (
                   SELECT
                       v.UserId
                   FROM dbo.Votes AS v
                   WHERE v.UserId = c.UserId
               );

Adding a where clause inside the subquery doesn’t help.

But these query plans are total motherchucking disasters, anyway. We’ll get into indexing later, but right now they both have the same shape and operators, though slightly different semantics to deal with is/is not distinct.

SQL Server Query Plan
is not cool

Both plans run single threaded, and using Nested Loops as the physical join type, which stinks because we’re putting together two pretty big tables.

Not to mention that Eager Index Spool. What a filth.

Adding Indexes


We need these indexes to make things go any faster. Before we do anything else, let’s create these so we’re not just sitting around thumb-twiddling.

CREATE INDEX
    c
ON dbo.Comments
    (UserId)
WITH
(
    SORT_IN_TEMPDB = ON,
    DATA_COMPRESSION = PAGE
);

CREATE INDEX
    v
ON dbo.Votes
    (UserId)
WITH
(
    SORT_IN_TEMPDB = ON,
    DATA_COMPRESSION = PAGE
);

Thinking About It


Okay, so writing the query like we did up there isn’t going to get us anything. Perhaps my expectations are a bit too exotic.

Let’s try something a bit more austere:

SELECT
    c = COUNT_BIG(*)
FROM dbo.Comments AS c
JOIN dbo.Votes AS v
    ON c.UserId IS DISTINCT FROM v.UserId;

SELECT
    c = COUNT_BIG(*)
FROM dbo.Comments AS c
JOIN dbo.Votes AS v
    ON c.UserId IS NOT DISTINCT FROM v.UserId;

The first thing to be aware of here is that the IS DISTINCT FROM is an inequality predicate, so you’re stuck with Nested Loops as the physical join type:

SQL Server Query Plan
nightmare

I ran out of care-juice waiting for this to finish, so all you’re getting is an estimated plan. The lack of an equality predicate here means you don’t have Hash or Merge join as an option.

Following up on bad ideas, the IS NOT DISTINCT FROM is an equality predicate, but the plan chosen is a serial Merge Join variety, which drags on 14 seconds too long:

SQL Server Query Plan
change up

Drop An Index


If we simulate not having any useful indexes on one table or the other by hinting the clustered index, the performance outlook does not improve.

SELECT
    c = COUNT_BIG(*)
FROM dbo.Comments AS c WITH(INDEX = 1)
JOIN dbo.Votes AS v
    ON c.UserId IS NOT DISTINCT FROM v.UserId;

SELECT
    c = COUNT_BIG(*)
FROM dbo.Comments AS c
JOIN dbo.Votes AS v WITH (INDEX = 1)
    ON c.UserId IS NOT DISTINCT FROM v.UserId;

No useful parts of the first query happen in Batch Mode, but the second query is rescued by two hash aggregates happening in batch mode.

SQL Server Query Plan
ouching

An odd point to make on a blog focused on SQL Server performance tuning is that sometimes not having a useful index gets you a better plan.

Anyway, I’m going back to my vacation.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

SQL Server 2022: Introduces the DATETRUNC Function, So You Can Truncate Your Dates And Stuff

Underscoring The Importance


When I first sat down to write about this, I made a funny mistake: I kept writing DATE_TRUNC over and over again.

In SQL Server it’s DATETRUNC.

Why? Because that’s the way it’s implemented in Postgres and DB2. Oracle, of course, just calls it TRUNC.

So, while it’s nice to have (what appears to be) the same behavior, it doesn’t exactly help to not have a 1:1 calling equivalent to other platforms.

I assume most of these additions to T-SQL are for cross-platform development and migrations.

Of course, Microsoft being so gosh darn late to this game means folks have probably been rolling-their-own versions of these functions for years.

If they went and called their system function DATE_TRUNC or even TRUNC, they might have some object naming issues to contend with.

Well, okay. But how does it work?

Childish Games


Here are some quick examples of how you call it.

SELECT TOP (10)
    u.DisplayName,
    year = 
        DATETRUNC(YEAR, u.LastAccessDate),
    quarter = 
        DATETRUNC(QUARTER, u.LastAccessDate),
    month = 
        DATETRUNC(MONTH, u.LastAccessDate),
    dayofyear = 
        DATETRUNC(DAYOFYEAR, u.LastAccessDate),
    day = 
        DATETRUNC(DAY, u.LastAccessDate),
    week = 
        DATETRUNC(WEEK, u.LastAccessDate),
    iso_week = 
        DATETRUNC(ISO_WEEK, u.LastAccessDate),
    hour = 
        DATETRUNC(HOUR, u.LastAccessDate),
    minute = 
        DATETRUNC(MINUTE, u.LastAccessDate),
    second = 
        DATETRUNC(SECOND, u.LastAccessDate),
    millisecond = 
        DATETRUNC(MILLISECOND, u.LastAccessDate),
    microsecond = 
        DATETRUNC(MICROSECOND, u.LastAccessDate) /*Doesn't work with datetime because there are no microseconds*/
FROM dbo.Users AS u;

And here are the results:

SQL Server Query Results
workin’

The thing to note here is that there’s no rounding logic involved. You just go to the start of whatever unit of time you choose. Of course, this doesn’t seem to do anything to the millisecond portion of DATETIME, because it’s not precise enough.

But for anyone out there who was hoping for a SOMONTH function to complement the EOMONTH function, you get this instead.

Works well enough!

But does it perform, Darling?

UnSARGable?


To make any test like this worthwhile, we need an index to make data searchable.

CREATE INDEX
    v
ON dbo.Votes
    (CreationDate)
WITH
(
    SORT_IN_TEMPDB = ON,
    DATA_COMPRESSION = PAGE
);

You know, because that’s what they do. To make searching faster. Hello.

So look, under these perfect circumstances, everything performs well. But we have to do a lot of typing.

SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE v.CreationDate >= DATETRUNC(YEAR, CONVERT(datetime, '20130101 00:00:00.000'));

Note here that we’re working on a literal value, not a column value, and we have to tell the datetrunc function which type we want via the convert function so that we get a simple seek plan:

SQL Server Query Plan
grin

Without all that, we get a dynamic seek plan:

SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE v.CreationDate >= DATETRUNC(YEAR, '20130101 00:00:00.000');

This has some… obvious performance issues compared to the above plan with correct data types.

SQL Server Query Plan
query pan

Query Pranks


Frequent blog readers will not be surprised that wrapping a table column in the new DATETRUNC function yields old performance problems:

SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE DATETRUNC(YEAR, v.CreationDate) >= CONVERT(datetime, '20130101 00:00:00.000');

This is particularly annoying because we’re truncating the column to the beginning of the year, which should be searchable in the index since that’s the sort order of the data in the index.

SQL Server Query Plan
shined out

Like most functions, these are fine in the presentation layer, but terrible in the relational layer. There’s no warning about this performance degradation in the documentation, either at the example of using the function in a where clause, or in the final closing remarks.

But that’s par for the course with any of these built-in functions.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

The Cost Of Complexity In SQL Server

Value Added


SQL Server can do a whole lot of things, aside from return the results of your queries. The thing is, the more of those you add into the mix the tougher it becomes to scale, and maintain reliable performance.

Of course, there are other design considerations that introduce the same kind of complexity:

  • Lots of tables in your schema leads to more joins
  • Lots of nested views lead to more choice for the optimizer
  • Lots of databases lead to cross-database joins, which can lead to tricky issues if they’re in different compat levels, etc.
  • Lots of functions and triggers that have entire books of business logic in them make everything go to hell

But beyond that, consider some more internal design choices:

  • Change Tracking
  • Change Data Capture
  • Auditing
  • Temporal Tables

Now you’re asking SQL Server to log a bunch of stuff every time you do something in your database. You have the action, and then writing that action down.

On top of that, you’re also opening yourself up to more queries because people are going to want to access and report on that data. This isn’t even touching all the additional internal monitoring that goes into SQL Server to track everything going on in there.

Observer overhead is a real thing.

Now let’s add in data protection and synchronization

  • Availability Groups
  • Replication (God forgive me)

I’m leaving Log Shipping out of the equation because it doesn’t really add much complexity on its own. You may add complexity do it by needing to copy files to strange places and choosing to report off shipped databases. I’m also leaving Failover Clusters out, because there’s one set of data that sits in one place. Only the SQL Server instance moves between nodes, unless you decide to span Failover Clusters and synchronize the data using your SAN or something. That stuff usually has a big enough price tag on it to dissuade most folks from tramping down that Path Of Misery©.

Both Availability Groups and Replication can make life miserable. I work with some very nice people who can’t create indexes unless they switch their Availability Groups to asynchronous mode first.

I also work with some very nice people who use Replication and don’t get much sleep at night. Me? I won’t touch the stuff.

The point, though, is that if you decide to have SQL Server start doing a whole lot of extra stuff every time you run a query, or make your queries really complicated to come up with an execution plan for, performance might suffer.

It’s up to you to make sure that you plan accordingly for adding in new features.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Even If SQL Server Table Variables Were Always In Memory, It Wouldn’t Make Them Better Than Temp Tables

Past Perfect


You know all those pesky developer myths that never go away?

  • CTEs and Views can’t use indexes: WRONG
  • Subqueries are slower than joins: WRONG
  • Temp tables are bad for performance: WRONG
  • Table Variables are always in memory: WRONG

Okay, that’s not a complete list, but a complete list would fill the internet to its very brim and cause all sorts of overflow errors.

My job as a performance tuning consultant is to teach folks when they’ve been lead astray.

To confuse things a bit further, Microsoft allows for in-memory @table variables, but you have do a lot of work to get set up to use them.

They’re not the ones you get out of the box.

Future Mediocre


Let’s say for the sake of argument that @table variables were always more “in-memory” than #temp tables (they’re not). They’d still have a lot of the same fundamental problems:

With all that stuff in mind: who cares if they’re only in memory? There are so many downsides for most query patterns, they could all happen in quantum space and performance would still stink.

I know, they have their uses:

But most of y’all aren’t using them that wisely.

And here’s the other thing: Most of y’all don’t have enough memory to begin with, and the memory you do have you misuse in a million ways. Why do you wanna make that worse?

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

Some Requests For Authors Of SQL Server Books

Heal Thyself


I dislike criticism without introspection, so anything that I say here I’ll also be doing my best to apply to anything I produce.

But anyway, I’m writing this because I read a lot of SQL Server books, and there are some things about them are pretty frustrating.

This goes for both print and digital copies, though some advice is specific to print copies because I think a lot of folks still buy them.

Anyway, here are some thoughts I have. You can show this post to your editors if you want. Maybe it’s their fault, not yours.

I have no idea.

Code Formatting


While I don’t expect anyone’s coding format to be as highly-evolved to near-perfection as mine, I do prefer consistency.

Many times while reading a book, the code formatting style changes from query to query, page to page, etc.

I understand that with multiple authors, things may change between chapters (but even then, agree to something, would ya?) but it’s jarring to see queries written in several different styles throughout a book.

Most code formatting tools out there will get you at least 70% of the way to code being formatted the way you want, even if you’re particularly persnickety about things.

Code Placement


For the absolute love of Crispin Glover, stop putting code in places where it will span multiple pages.

It sucks for everyone, no matter how they’re reading the book, but all those nice little SQL Server Scrubs who bought (hopefully bought) the digital copy of your book have an awkward time copying and pasting things over to run locally.

For the Print-cesses of the world, we just don’t need that kind of page-turning, cliff-hanging suspense.

I’ve seen this more than anything, and it drives me bonkers every time. Just keep writing and say the code example is on the next page.

If your code example takes up a whole page and then some… Well, look, I don’t have an easy answer for you. You want it there for digital folks to copy, but no one with a print edition is going to write that all out from scratch.

Maybe you should just get yourself a nice GitHub gist and link to longer code examples? Use a URL shortener or GO codes to make memorable URLs?

Either way, this particular issue is in every book I’ve ever read.

Image Quality


We really need to talk about this one.

Now, I get it, you’re probably not gonna talk anyone into printing a full color SQL Server book. At least no one smart.

If you can do that, great. You’re one step closer to your book not looking like a 90’s zine about anarchy or graffiti or whatever.

The next step: use high quality images. Please, please, please, use high quality images.

No one’s going to be able to read your book and follow along without wondering if they have cataracts if they’re staring at washed-out, fuzzy images the whole way through.

Far too often, it looks like someone compressed-to-death and watered down the printer ink for every image in a book.

Tech Review


Get someone you disagree with to tech review your book. Not someone who’s wrong, just someone who has a different point of view.

This isn’t a pitch to get me to do it. Unless I love you to death, I’m probably not gonna wanna read and review your book.

But far too many books seem to suffer from “I got my friend to do this”, which leads to groupthink.

There’s no alternate viewpoint considered or explored, and you end up with one-sided highways of thought.

Doing a tech review shouldn’t be about toeing a line, it should be about making a book more technically accurate.

Concept Introduction


I’m not sure if this is a book or thought  organization issue, but… If you’re in an early chapter of a book, say the first three, and you keep seeing things like “we cover this more in chapter 12” or 13 or 14 or some other far off place, you probably shouldn’t be doing it right then and there.

When you’re going to introduce a concept, you should either be covering it in that chapter, or covering it in the next one.

I see this quite a bit, and wonder if I need to skip way ahead in the book and read about this other thing, and then come back.

That’s a crappy, because it makes you feel lost in the material that’s supposed to be bringing you clarity on a subject.

Try to contain yourself, here. Maybe you’re saying too much too soon.

Auto-Errata


It happens to me every time I write a blog post. I re-read it as I’m going through, I re-read it when I’m done, and I tend to check in on things a day or so before it goes live.

When it goes live, I read through the post, and notice half-a-hundred things that I wanna fix, tweak, or expand on.

I can only imagine what that’s like at Bookscale© where you have 300+ pages of blog posts publishing at once.

As you make changes to the book, you should notify folks who bought it. Just randomly posting corrections is hard to keep track of.

GitHub is, again, a decent way to get and track feedback. It also gives you a way to announce large releases to the content so everyone knows when a new version is available.

Hopefully you’re making digital copies available to folks who purchase a physical copy. Otherwise, you’re gonna have a hard time getting a better product in their hands.

Reinventing The Wheel


Rather than take up a whole page with some reheated DMV query, give some love to community scripts out there that do the same thing, but better.

Anyone who knows better is going to pass right over that stuff anyway.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.