create temp table sql Archives | Page 75 of 117

Views vs Indexed Views In SQL Server

Posted on August 18, 2022August 5, 2022 by Erik Darling

Award Winning

Imagine you have a rather complicated query that you want to abstract into a simple query for your less-than-with-it end users.

A view is probably a pretty good way of doing that, since you can shrink your preposterously-constructed tour through every table in the schema down to a simple select-from-one-object.

The problem is that now everyone expects it to perform well throughout all time, under any circumstances, come what may. It’s sort of like how your parents expect dinner to be $20 and tips to be 20% regardless of where they go or what they order.

Lobster? $5.
Steak? $5.
Bottle of wine? $5.
Any dessert you can imagine? $5.
Tip? Gosh, mister, another $5?

I sincerely apologize to anyone who continues to live in, or who moved to Europe to avoid tipping.

If you’d like some roommates, I have some parents you’d get along with.

Viewfinder

Creating a view in SQL Server doesn’t do anything special for you, outside of not making people remember your [reference to joke above] query.

You can put all manner of garbage in your view, make it reference another half dozen views full of garbage, and expect sparkling clean query performance every time.

Guess what happens?

Reality.

When you use views, the only value is abstraction. You still need to be concerned with how the query is written, and if the query has decent indexes to support it. In other words, you can’t just write a view and expect the optimizer to do anything special with it.

SQL Server doesn’t cache results, it only caches raw data. If you want the results of a view to be saved, you need to index it.

Take these two dummy queries, one against a created view, and the other an ad hoc query identical to what’s in the view:

CREATE OR ALTER VIEW
    dbo.just_a_query
WITH SCHEMABINDING
AS
SELECT
    p.OwnerUserId,
    TotalScore = 
        ISNULL
        (
            SUM(p.Score), 
            0
        ),
    TotalPosts = 
        COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE EXISTS
      (
          SELECT
              1/0
          FROM dbo.Votes AS v
          WHERE 
              v.PostId = p.Id      
      )
GROUP BY
    p.OwnerUserId;
GO 

SELECT
    p.OwnerUserId,
    TotalScore = 
        ISNULL
        (
            SUM(p.Score), 
            0
        ),
    TotalPosts = 
        COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE EXISTS
      (
          SELECT
              1/0
          FROM dbo.Votes AS v
          WHERE 
              v.PostId = p.Id      
      )
AND 
    p.OwnerUserId = 22656
GROUP BY
    p.OwnerUserId;
GO 

SELECT
    jaq.*
FROM dbo.just_a_query AS jaq
WHERE 
    jaq.OwnerUserId = 22656;
GO

The plans are identical, and identically bad. Why? Because I didn’t try very hard, and there’s no good indexes for them.

Remember when I said that’s important?

SQL Server Query Plan — avenues lined with trees

Keep in mind this is a query with some batch mode involved, so it could be a lot worse. But both instances complete within a second or so of each other.

So much for view performance.

Maintainer

The rules around indexed views are pretty strict, and the use cases are fairly narrow. I do find them quite useful on SQL Server Standard Edition where batch mode is terribly hobbled.

The horrible thing is that indexed views are so strict in SQL Server that we can’t even create one on the view in question. That really sucks. We get this error.

CREATE UNIQUE CLUSTERED INDEX 
    cuqadoodledoo
    ON dbo.not_just_a_query
(
    OwnerUserId
)
WITH
(
    SORT_IN_TEMPDB = ON,
    DATA_COMPRESSION = PAGE
);

Msg 10127, Level 16, State 1, Line 95

Cannot create index on view “StackOverflow2013.dbo.not_just_a_query” because it contains one or more subqueries.

Consider changing the view to use only joins instead of subqueries. Alternatively, consider not indexing this view.

Alternatively, go screw yourself. Allowing joins but not exists is somewhat baffling, since they’re quite different in that joins allow for multiple matches but exists does not. We’d have to do a lot of fancy grouping footwork to get equivalent results with a join, since distinct isn’t allowed in an indexed view in SQL Server either.

We could also pull the exists out of the view, add the Id column to the select list, group by that and OwnerUserId, index both of them, and… yeah nah.

I have no idea who’s in charge of indexed views in the product at this point, but a sufficiently lubricated republic would likely come calling with tar and feathers in the face of this injustice.

This is basic query syntax. It’s not like uh… min, max, sum, avg, except, intersect, union, union all, cross apply, outer apply, outer joins, or um, hey, is it too late for me to change careers?

The Pain In Pain Falls Painly On The Pain

You may have ended up here looking to learn all the minute differences between views and indexed views in SQL Server.

You may be disappointed in reading this post, but I can assure you that you’re not nearly as disappointed in this post as I am with indexed views in SQL Server.

They’re like one of those articles about flying cars where you read the headline and you’re like “woah, I’m living in the future”, but then three paragraphs in you find out the cars don’t really fly or drive and they might actually just be igloos that are only big enough for an Italian Greyhound or a paper plane that the author’s kid glued wheels to.

If you actually have a use case for indexed views, you’ll have to be really careful about making sure their maintenance doesn’t kill performance.

Thanks for reading!

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

Why MAX Data Types Are Usually A Bad Choice For SQL Server Columns

Posted on August 17, 2022August 17, 2022 by Erik Darling

Easy Rider

When you’re trying to figure out how to store string data, it often seems easiest to just choose an extra long — even MAX — data type to avoid future truncation errors.

Even if you’re storing strings with a known, absolute length, developers may choose to not enforce that in the application, either via a drop down menu or other form of validation.

And so to avoid errors when users try to put their oh-so-important data in their oh-so-expensive database, we get columns added to tables that can fit a galaxy of data in them, when we only need to store an ashtray worth of data.

While getting data into those columns is relatively easy — most application inserts are single rows — getting data out of those columns can be quite painful, whether it’s searching or just presenting in the select portion of a query.

Let’s look at a couple simple examples of how that happens.

Search Engine

Let’s take a query like this one:

SELECT TOP (20)
    p.Id,
    p.Title,
    p.Body
FROM dbo.Posts AS p
WHERE p.Body LIKE N'SQL Server%';

The Body column in the Posts table is nvarchar and MAX, but the same thing would happen with a varchar column.

If you need a simple way to remember how to pronounce those data types, just remember to Pahk yah (n)vahcah in Hahvahd Yahd.

Moving on – while much has been written about leading wildcard searches (that start with a % sign), we don’t do that here. Also, in general, using charindex or patindex instead of leading wildcard like searching won’t buy you all that much (if anything at all).

Anyway, since you can’t put a MAX datatype in the key of an index, part of the problem with them is that there’s no way to efficiently organize the data for searching. Included columns don’t do that, and so we end up with a query plan that looks some-such like this:

We spend ~13.5 seconds scanning the clustered index on the Posts table, then about two minutes and twenty seven seconds (minus the original 13.5) applying the predicate looking for posts that start with SQL Server.

That’s a pretty long time to track down and return 19 rows.

Let’s change the query a little bit and look at how else big string columns can cause problems.

Memory Bank

Rather than search on the Body column, let’s select some values from it ordered by the Score column.

Since Score isn’t indexed, it’s not sorted in the database. That means SQL Server needs to ask for memory to put the data we’re selecting in the order we’re asking for.

SELECT TOP (200)
    p.Body
FROM dbo.Posts AS p
ORDER BY p.Score DESC;

The plan for this query asks for a 5GB memory grant:

I know what you’re thinking: the Body column probably has some pretty big data in it, and you’re right. In this case, it’s the right data type to use.

The bad news is that SQL Server will makes the same memory grant estimation based on the size of the data we need to sort whether or not it’s a good choice.

I talk more about that in this Q&A on Stack Exchange.

Thanks for reading!

Going Further

Stop Making SQL Server Weird, Microsoft.

Posted on August 16, 2022August 16, 2022 by Erik Darling

Ledgers And Tridents And Toads, Fido-hmy

With every release of SQL Server, Microsoft adds in a bunch of tadpole features. It’s up to those tadpoles to survive long enough to turn into beautiful uh… ~~Frogs~~ Toads.

The way they do that is by having enough paying customers use them — usually prodded by #MVPBuzz chasers — to get continued support and development.

All of this is subject to internal whims (like Microsoft rebranding, renaming, or rebasing a feature, take Mirroring >> Availability Groups), and external customer fancies (Tableau vs PowerBI, etc).

I feel bad for folks who latch onto these things with any sort of seriousness, like Big Data Clusters, or any of the dozens of other features that died before they could bump their butts on a log.

One marvels at the sheer number of “you better get on board” style session titles that have been molded around features where absolutely no one got on board.

Even Kubernetes (which I know isn’t a SQL Server feature) is sort of like K-Pop: Apparently millions of people like it, but I’ve never met a single person who actually uses it.

Maybe it’s more like a bot net. K-Bot? Not sure, here. I’m bad at creative writing.

They Don’t Like You Anyway

Microsoft has this weird perceived popularity issue. Perhaps it’s the diminishing self-esteem that comes with age, with SQL Server turning 30-something and all. Microsoft keeps trying to make SQL Server appeal to swaths of people who will never like it, because it’s not the shiny new thing on the block.

They won’t love you like I do.

Adding in every passing fad to the product to try to stay young and hip does a great disservice to the product as a whole, which has many addressable flaws. Microsoft dedicates countless development cycles to goofy memes that will wither on the vine shortly after the first version.

Imagine if cars still came with cassette players, but not ones that can do anything cool like play both sides without flipping the tape over, or being able to fast forward or rewind to the start or end of a track.

You’d basically have Spatial Data And The Graph Nodes in your car trying to play Garage Days Re-Re-Revisited.

Duets

I do wish the Microsoft bean counters would dance with who they brung a bit more. You know, people who need a relational database with as few bugs and preventable performance issues as possible, with useful development features, and a coherent toolset?

Clearing the minefield of pitfalls that developers fall int0 (hello, MERGE!) would allow less effort to be spent on chasing down oddball problems and more time spent developing quality features.

It’s sort of like if I had a keyboard that fixed all my typos immediately, I’d have more time to write quality blog posts that don’t need several editorial passes.

Maybe someday.

Thanks for reading!

Going Further

SQL Server 2022 CTP 2.1 Improvements To Parameter Sensitive Plan Optimization

Posted on August 15, 2022August 1, 2022 by Erik Darling

Hop On Your Good Foot

Several weeks back, I blogged about a missed opportunity with the new parameter sensitive plan feature. At the time, I thought that there was indeed sufficient skewness available to trigger the additional plan variants, and apparently some nice folks at Microsoft agreed.

If we step back through the old demo, we’ll get different results.

EXEC dbo.VoteSniffing 
    @VoteTypeId = 4;

EXEC dbo.VoteSniffing 
    @VoteTypeId = 2;

I’m not gonna go into a lot of depth here, because I’m writing this while traveling, but we get the (mostly) desired outcome of two different plans being generated. Each plan is more suitable to the amount of data that the query has to process.

I’d consider this a success, even if it adds more queries to the unidentifiable corpse-pile.

Thanks for reading!

Going Further

SQL Server 2022 Introduces IS [NOT] DISTINCT FROM; World Underwhelmed

Posted on August 12, 2022August 12, 2022 by Erik Darling

Back In Time

Feeling quite old while remembering the first time I ever read this, and this note at the end:

This is the SQL language IS DISTINCT FROM feature —implemented in the query processor, but not yet available in the T-SQL language.

Groovy. Since at least 2011, this has been in the Query Processor, and here in 2022 we finally get the linguistic support.

The thing is, it’s pretty underwhelming, and I’m going to show you why I think so.

First, in the docs for SQL Server, all the examples use a single literal value, like so:

SELECT * FROM #SampleTempTable WHERE id IS DISTINCT FROM 17;

I went looking for other docs examples from vendors who have had the syntax around for 10+ years, and there wasn’t anything all that much more interesting.

Mostly case expressions and whatnot.

Big deal.

Alignment

First, if I try to run either of these queries, I’ll get an error after about 6 seconds.

SELECT
    c = COUNT_BIG(*)
FROM dbo.Comments AS c
WHERE c.UserId IS DISTINCT FROM 
               (
                   SELECT
                       v.UserId
                   FROM dbo.Votes AS v
               );

SELECT
    c = COUNT_BIG(*)
FROM dbo.Comments AS c
WHERE c.UserId IS NOT DISTINCT FROM 
               (
                   SELECT
                       v.UserId
                   FROM dbo.Votes AS v
               );

Why does it take 6 seconds to get an error? Because a few parts of the query plan have to do some work, and then finally:

Msg 512, Level 16, State 1, Line 1
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.

Sort of like how sometimes you try to SUM a column and after a while you get an error about arithmetic overflow.

This is a bit annoying, because that means we need a way to return a single value to evaluate.

So Yeah…

We can’t even rewrite the queries like this to get around the error, but I do want to show you the plans.

This is why we have to wait several seconds to get an error (unless you change it to IS DISTINCT FROM ALL/ANY):

SELECT
    c = COUNT_BIG(*)
FROM dbo.Comments AS c
WHERE c.UserId IS DISTINCT FROM 
               (
                   SELECT
                       v.UserId
                   FROM dbo.Votes AS v
                   WHERE v.UserId = c.UserId
               );

SELECT
    c = COUNT_BIG(*)
FROM dbo.Comments AS c
WHERE c.UserId IS NOT DISTINCT FROM 
               (
                   SELECT
                       v.UserId
                   FROM dbo.Votes AS v
                   WHERE v.UserId = c.UserId
               );

Adding a where clause inside the subquery doesn’t help.

But these query plans are total motherchucking disasters, anyway. We’ll get into indexing later, but right now they both have the same shape and operators, though slightly different semantics to deal with is/is not distinct.

Both plans run single threaded, and using Nested Loops as the physical join type, which stinks because we’re putting together two pretty big tables.

Not to mention that Eager Index Spool. What a filth.

Adding Indexes

We need these indexes to make things go any faster. Before we do anything else, let’s create these so we’re not just sitting around thumb-twiddling.

CREATE INDEX
    c
ON dbo.Comments
    (UserId)
WITH
(
    SORT_IN_TEMPDB = ON,
    DATA_COMPRESSION = PAGE
);

CREATE INDEX
    v
ON dbo.Votes
    (UserId)
WITH
(
    SORT_IN_TEMPDB = ON,
    DATA_COMPRESSION = PAGE
);

Thinking About It

Okay, so writing the query like we did up there isn’t going to get us anything. Perhaps my expectations are a bit too exotic.

Let’s try something a bit more austere:

SELECT
    c = COUNT_BIG(*)
FROM dbo.Comments AS c
JOIN dbo.Votes AS v
    ON c.UserId IS DISTINCT FROM v.UserId;

SELECT
    c = COUNT_BIG(*)
FROM dbo.Comments AS c
JOIN dbo.Votes AS v
    ON c.UserId IS NOT DISTINCT FROM v.UserId;

The first thing to be aware of here is that the IS DISTINCT FROM is an inequality predicate, so you’re stuck with Nested Loops as the physical join type:

I ran out of care-juice waiting for this to finish, so all you’re getting is an estimated plan. The lack of an equality predicate here means you don’t have Hash or Merge join as an option.

Following up on bad ideas, the IS NOT DISTINCT FROM is an equality predicate, but the plan chosen is a serial Merge Join variety, which drags on 14 seconds too long:

Drop An Index

If we simulate not having any useful indexes on one table or the other by hinting the clustered index, the performance outlook does not improve.

SELECT
    c = COUNT_BIG(*)
FROM dbo.Comments AS c WITH(INDEX = 1)
JOIN dbo.Votes AS v
    ON c.UserId IS NOT DISTINCT FROM v.UserId;

SELECT
    c = COUNT_BIG(*)
FROM dbo.Comments AS c
JOIN dbo.Votes AS v WITH (INDEX = 1)
    ON c.UserId IS NOT DISTINCT FROM v.UserId;

No useful parts of the first query happen in Batch Mode, but the second query is rescued by two hash aggregates happening in batch mode.

An odd point to make on a blog focused on SQL Server performance tuning is that sometimes not having a useful index gets you a better plan.

Anyway, I’m going back to my vacation.

Thanks for reading!

Going Further

SQL Server 2022: Introduces the DATETRUNC Function, So You Can Truncate Your Dates And Stuff

Posted on August 11, 2022July 30, 2022 by Erik Darling

Underscoring The Importance

When I first sat down to write about this, I made a funny mistake: I kept writing DATE_TRUNC over and over again.

In SQL Server it’s DATETRUNC.

Why? Because that’s the way it’s implemented in Postgres and DB2. Oracle, of course, just calls it TRUNC.

So, while it’s nice to have (what appears to be) the same behavior, it doesn’t exactly help to not have a 1:1 calling equivalent to other platforms.

I assume most of these additions to T-SQL are for cross-platform development and migrations.

Of course, Microsoft being so gosh darn late to this game means folks have probably been rolling-their-own versions of these functions for years.

If they went and called their system function DATE_TRUNC or even TRUNC, they might have some object naming issues to contend with.

Well, okay. But how does it work?

Childish Games

Here are some quick examples of how you call it.

SELECT TOP (10)
    u.DisplayName,
    year = 
        DATETRUNC(YEAR, u.LastAccessDate),
    quarter = 
        DATETRUNC(QUARTER, u.LastAccessDate),
    month = 
        DATETRUNC(MONTH, u.LastAccessDate),
    dayofyear = 
        DATETRUNC(DAYOFYEAR, u.LastAccessDate),
    day = 
        DATETRUNC(DAY, u.LastAccessDate),
    week = 
        DATETRUNC(WEEK, u.LastAccessDate),
    iso_week = 
        DATETRUNC(ISO_WEEK, u.LastAccessDate),
    hour = 
        DATETRUNC(HOUR, u.LastAccessDate),
    minute = 
        DATETRUNC(MINUTE, u.LastAccessDate),
    second = 
        DATETRUNC(SECOND, u.LastAccessDate),
    millisecond = 
        DATETRUNC(MILLISECOND, u.LastAccessDate),
    microsecond = 
        DATETRUNC(MICROSECOND, u.LastAccessDate) /*Doesn't work with datetime because there are no microseconds*/
FROM dbo.Users AS u;

And here are the results:

The thing to note here is that there’s no rounding logic involved. You just go to the start of whatever unit of time you choose. Of course, this doesn’t seem to do anything to the millisecond portion of DATETIME, because it’s not precise enough.

But for anyone out there who was hoping for a SOMONTH function to complement the EOMONTH function, you get this instead.

Works well enough!

But does it perform, Darling?

UnSARGable?

To make any test like this worthwhile, we need an index to make data searchable.

CREATE INDEX
    v
ON dbo.Votes
    (CreationDate)
WITH
(
    SORT_IN_TEMPDB = ON,
    DATA_COMPRESSION = PAGE
);

You know, because that’s what they do. To make searching faster. Hello.

So look, under these perfect circumstances, everything performs well. But we have to do a lot of typing.

SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE v.CreationDate >= DATETRUNC(YEAR, CONVERT(datetime, '20130101 00:00:00.000'));

Note here that we’re working on a literal value, not a column value, and we have to tell the datetrunc function which type we want via the convert function so that we get a simple seek plan:

Without all that, we get a dynamic seek plan:

SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE v.CreationDate >= DATETRUNC(YEAR, '20130101 00:00:00.000');

This has some… obvious performance issues compared to the above plan with correct data types.

Query Pranks

Frequent blog readers will not be surprised that wrapping a table column in the new DATETRUNC function yields old performance problems:

SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE DATETRUNC(YEAR, v.CreationDate) >= CONVERT(datetime, '20130101 00:00:00.000');

This is particularly annoying because we’re truncating the column to the beginning of the year, which should be searchable in the index since that’s the sort order of the data in the index.

Like most functions, these are fine in the presentation layer, but terrible in the relational layer. There’s no warning about this performance degradation in the documentation, either at the example of using the function in a where clause, or in the final closing remarks.

But that’s par for the course with any of these built-in functions.

Thanks for reading!

Going Further

The Cost Of Complexity In SQL Server

Posted on August 10, 2022August 5, 2022 by Erik Darling

Value Added

SQL Server can do a whole lot of things, aside from return the results of your queries. The thing is, the more of those you add into the mix the tougher it becomes to scale, and maintain reliable performance.

Of course, there are other design considerations that introduce the same kind of complexity:

Lots of tables in your schema leads to more joins
Lots of nested views lead to more choice for the optimizer
Lots of databases lead to cross-database joins, which can lead to tricky issues if they’re in different compat levels, etc.
Lots of functions and triggers that have entire books of business logic in them make everything go to hell

But beyond that, consider some more internal design choices:

Change Tracking
Change Data Capture
Auditing
Temporal Tables

Now you’re asking SQL Server to log a bunch of stuff every time you do something in your database. You have the action, and then writing that action down.

On top of that, you’re also opening yourself up to more queries because people are going to want to access and report on that data. This isn’t even touching all the additional internal monitoring that goes into SQL Server to track everything going on in there.

Observer overhead is a real thing.

Now let’s add in data protection and synchronization

Availability Groups
Replication (God forgive me)

I’m leaving Log Shipping out of the equation because it doesn’t really add much complexity on its own. You may add complexity do it by needing to copy files to strange places and choosing to report off shipped databases. I’m also leaving Failover Clusters out, because there’s one set of data that sits in one place. Only the SQL Server instance moves between nodes, unless you decide to span Failover Clusters and synchronize the data using your SAN or something. That stuff usually has a big enough price tag on it to dissuade most folks from tramping down that Path Of Misery©.

Both Availability Groups and Replication can make life miserable. I work with some very nice people who can’t create indexes unless they switch their Availability Groups to asynchronous mode first.

I also work with some very nice people who use Replication and don’t get much sleep at night. Me? I won’t touch the stuff.

The point, though, is that if you decide to have SQL Server start doing a whole lot of extra stuff every time you run a query, or make your queries really complicated to come up with an execution plan for, performance might suffer.

It’s up to you to make sure that you plan accordingly for adding in new features.

Thanks for reading!

Going Further

Even If SQL Server Table Variables Were Always In Memory, It Wouldn’t Make Them Better Than Temp Tables

Posted on August 9, 2022July 22, 2022 by Erik Darling

Past Perfect

You know all those pesky developer myths that never go away?

CTEs and Views can’t use indexes: WRONG
Subqueries are slower than joins: WRONG
Temp tables are bad for performance: WRONG
Table Variables are always in memory: WRONG

Okay, that’s not a complete list, but a complete list would fill the internet to its very brim and cause all sorts of overflow errors.

My job as a performance tuning consultant is to teach folks when they’ve been lead astray.

To confuse things a bit further, Microsoft allows for in-memory @table variables, but you have do a lot of work to get set up to use them.

They’re not the ones you get out of the box.

Future Mediocre

Let’s say for the sake of argument that @table variables were always more “in-memory” than #temp tables (they’re not). They’d still have a lot of the same fundamental problems:

With all that stuff in mind: who cares if they’re only in memory? There are so many downsides for most query patterns, they could all happen in quantum space and performance would still stink.

I know, they have their uses:

But most of y’all aren’t using them that wisely.

And here’s the other thing: Most of y’all don’t have enough memory to begin with, and the memory you do have you misuse in a million ways. Why do you wanna make that worse?

Thanks for reading!

Going Further

Some Requests For Authors Of SQL Server Books

Posted on August 8, 2022August 8, 2022 by Erik Darling

Heal Thyself

I dislike criticism without introspection, so anything that I say here I’ll also be doing my best to apply to anything I produce.

But anyway, I’m writing this because I read a lot of SQL Server books, and there are some things about them are pretty frustrating.

This goes for both print and digital copies, though some advice is specific to print copies because I think a lot of folks still buy them.

Anyway, here are some thoughts I have. You can show this post to your editors if you want. Maybe it’s their fault, not yours.

I have no idea.

Code Formatting

While I don’t expect anyone’s coding format to be as highly-evolved to near-perfection as mine, I do prefer consistency.

Many times while reading a book, the code formatting style changes from query to query, page to page, etc.

I understand that with multiple authors, things may change between chapters (but even then, agree to something, would ya?) but it’s jarring to see queries written in several different styles throughout a book.

Most code formatting tools out there will get you at least 70% of the way to code being formatted the way you want, even if you’re particularly persnickety about things.

Code Placement

For the absolute love of Crispin Glover, stop putting code in places where it will span multiple pages.

It sucks for everyone, no matter how they’re reading the book, but all those nice little SQL Server Scrubs who bought (hopefully bought) the digital copy of your book have an awkward time copying and pasting things over to run locally.

For the Print-cesses of the world, we just don’t need that kind of page-turning, cliff-hanging suspense.

I’ve seen this more than anything, and it drives me bonkers every time. Just keep writing and say the code example is on the next page.

If your code example takes up a whole page and then some… Well, look, I don’t have an easy answer for you. You want it there for digital folks to copy, but no one with a print edition is going to write that all out from scratch.

Maybe you should just get yourself a nice GitHub gist and link to longer code examples? Use a URL shortener or GO codes to make memorable URLs?

Either way, this particular issue is in every book I’ve ever read.

Image Quality

We really need to talk about this one.

Now, I get it, you’re probably not gonna talk anyone into printing a full color SQL Server book. At least no one smart.

If you can do that, great. You’re one step closer to your book not looking like a 90’s zine about anarchy or graffiti or whatever.

The next step: use high quality images. Please, please, please, use high quality images.

No one’s going to be able to read your book and follow along without wondering if they have cataracts if they’re staring at washed-out, fuzzy images the whole way through.

Far too often, it looks like someone compressed-to-death and watered down the printer ink for every image in a book.

Tech Review

Get someone you disagree with to tech review your book. Not someone who’s wrong, just someone who has a different point of view.

This isn’t a pitch to get me to do it. Unless I love you to death, I’m probably not gonna wanna read and review your book.

But far too many books seem to suffer from “I got my friend to do this”, which leads to groupthink.

There’s no alternate viewpoint considered or explored, and you end up with one-sided highways of thought.

Doing a tech review shouldn’t be about toeing a line, it should be about making a book more technically accurate.

Concept Introduction

I’m not sure if this is a book or thought organization issue, but… If you’re in an early chapter of a book, say the first three, and you keep seeing things like “we cover this more in chapter 12” or 13 or 14 or some other far off place, you probably shouldn’t be doing it right then and there.

When you’re going to introduce a concept, you should either be covering it in that chapter, or covering it in the next one.

I see this quite a bit, and wonder if I need to skip way ahead in the book and read about this other thing, and then come back.

That’s a crappy, because it makes you feel lost in the material that’s supposed to be bringing you clarity on a subject.

Try to contain yourself, here. Maybe you’re saying too much too soon.

Auto-Errata

It happens to me every time I write a blog post. I re-read it as I’m going through, I re-read it when I’m done, and I tend to check in on things a day or so before it goes live.

When it goes live, I read through the post, and notice half-a-hundred things that I wanna fix, tweak, or expand on.

As you make changes to the book, you should notify folks who bought it. Just randomly posting corrections is hard to keep track of.

GitHub is, again, a decent way to get and track feedback. It also gives you a way to announce large releases to the content so everyone knows when a new version is available.

Hopefully you’re making digital copies available to folks who purchase a physical copy. Otherwise, you’re gonna have a hard time getting a better product in their hands.

Reinventing The Wheel

Rather than take up a whole page with some reheated DMV query, give some love to community scripts out there that do the same thing, but better.

Anyone who knows better is going to pass right over that stuff anyway.

Thanks for reading!

Going Further

SQL Server 2022’s GENERATE_SERIES Doesn’t Suck Anymore

Posted on August 5, 2022August 5, 2022 by Erik Darling

Life Comes At You Fast

The summer intern at Microsoft was hard at work between CTPs. Last time around, there were some serious performance issues with our dear new friend GENERATE_SERIES.

With the release of CTP 2.1, the problems that we saw the first time around are all gone. But there are still a couple small caveats that you should be aware of.

There’s also been a change in the way you call the function, too. You not longer need the START and STOP identifiers in the function.

You just put in whatever numbers you’re into.

That’s probably a good thing.

Caveat #1: Parallelism Restrictions

Loading data into tables that have row store indexes on them can’t go parallel.

DROP TABLE IF EXISTS
    dbo.art_aux;

CREATE TABLE 
    dbo.art_aux
(
    id int NOT NULL PRIMARY KEY CLUSTERED
);

INSERT INTO
    dbo.art_aux WITH(TABLOCK)
(
    id
)
SELECT
    gs.value
FROM GENERATE_SERIES
     (
         1, 
         10000000
     ) AS gs
OPTION
(
    MAXDOP 8, 
    QUERYTRACEON 8649
);

But performance is still pretty good, here. Much better than prior.

I think most folks out there would be okay waiting a few seconds for 10 million rows to show up in a table for them.

So sequential. Much unique.

But, what if you want to load those rows in faster?

Caveat #2: Use A Heap Or SELECT INTO Instead

If you create a heap, or just SELECT INTO on instead, you’ll get a parallel load into the table.

For 10 million rows, there’s an improvement of about 1.7 seconds on my sort of crappy travel VM with 4 available CPUs.

DROP TABLE IF EXISTS
    dbo.art_aux;

CREATE TABLE 
    dbo.art_aux
(
    id int NOT NULL
);

INSERT INTO
    dbo.art_aux WITH(TABLOCK)
(
    id
)
SELECT
    gs.value
FROM GENERATE_SERIES
     (
         1, 
         10000000
     ) AS gs;

Here’s the plan:

Yes, dear reader, I even blog on vacation. That’s how much I love and care about you.

Thanks for reading!