Ways To Participate In The Data Community Instead Of Speaking

It’s Hard Out There For A Pup


When you’re first getting started speaking, you might find it difficult to get picked, or to find events to speak at where the schedule works for you.

Event organizers, or people who vote to select sessions, might not recognize your name yet. If you’re presenting on a topic in a crowded field, it can be difficult to get noticed unless you gin up awesome session titles and abstracts.

Some groups meet during the day, when you might have work obligations you can’t step away from, and some meet at night, when you might have family obligations you can’t step away from.

Or, like, you might still be working.

Who knows, but I hope not.

If you want to participate in the community in ways that can get you some name recognition, or fit in your schedule (or both!), here are some ways to do it.

Answer Me


Here are some good Q&A sites:

Answering questions on these is great experience, because there’s such a wide variety of topics. Heck, even just reading Q&A can be valuable.

There are some downsides, though: many questions are poorly asked. They’re not bad questions, but there’s either so little information or so many local factors at play that answering them can be frustrating.

You gotta learn early to walk away from those.

Painted Word


If you’re the strong, silent type, you can take your presentation and turn it into a series of blogs. Not everyone enjoys interacting publicly like that, and that’s a very valid way to feel.

I know it sounds crazy — why would someone come to your session if the material is available online, at their convenience?

Because watching someone do something brings life to it. Do you like reading your favorite movie script as much as you like watching your favorite movie?

I don’t.

The other nice thing is that you can write really detailed blog posts, which takes some of the strain off cramming every ounce of minutiae into your session. You can point people to your posts once you’ve covered the most important parts of your topic.

There’s also a lot of value in writing your material out in long form. You’ll notice all sorts of extra things you want to explore, and you have the space to do it.

Be Kind, Rewind


If you have the setup to do it, because for some weird reason you had to start working from home full time, it can be fun to record your session.

It forces you to deliver the whole thing, say it out loud, and figure out how you want to say things.

I’m gonna be honest with you: the way you write things down isn’t gonna be the same way you say them out loud.

If you’re happy with it, put it on YouTube.

Why These Things?


Remember up at the top of the post where I talked about name recognition? All of these things can help establish that.

And look, I’m not saying you have to do any of these things. You’re under no obligation. But if you want to get into speaking, these are all activities that can help you get started, and make you better at it.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

The Cheapskate’s Guide To Accessing Data In SQL Server

Locality


Every action has some latency attached to it. Depending on how far you have to go, it could be a lot or a little.

And of course, it also depends on what you run into along the way.

That’s one reason why batch mode can introduce such a performance improvement: CPU instructions are run on batches of rows at a time local to the CPU, rather than a single row at a time. Less fetching is generally a good thing. Remember all those things they told you about cursors and RBAR?

For years, I’ve been sending people over to this post by Jeff Atwood, which does a pretty good job of describing those things.

But you know, I need something a little more specific to SQL Server, and using a slightly different metric: We’re gonna assign $Query Bucks$ to those latencies.

After all, time is money.

Gold Standard


Using a similar formula to Jeff’s, let’s look at how expensive it gets once you cross from memory to disk.

mucho dinero

See the end there, when you jump from nanoseconds to microseconds? At those prices, you start to understand why people like me tell you to solve problems with more memory instead of faster disks. Those numbers are for local storage, of course, and main memory is still leaving Road Runner clouds around it.

If you’re on a SAN — and I don’t mean that SAN disks are slower; they’re not — you have something else to think about.

What I mean is the component in between that can be a real problem: The N in SAN. The Network.

just a little bit

If you add latency in just the milliseconds, costs pretty quickly jump up into numbers that’d make athletes blush. And if you’ve ever seen those 15 second I/O warnings in the error log…

This is where a lot of people under-provision themselves into a nightmare. 10Gb Ethernet connections can move data fairly quickly, at around 1.2 GB/s. Which is great for data that’s easily measured in hundreds of megabytes. It’s less great for much bigger data, and it’s worse when there’s a lot of other traffic on the same network.
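Before you blame the SAN or the network, it helps to see what your storage latency looks like from SQL Server’s side of things. Here’s a rough sketch against sys.dm_io_virtual_file_stats; the counters are cumulative since startup, so treat the averages as a ballpark, not a monitor.

/* Average read/write latency per database file, cumulative since startup */
SELECT
    DB_NAME(vfs.database_id) AS database_name,
    mf.physical_name,
    vfs.num_of_reads,
    vfs.io_stall_read_ms / NULLIF(vfs.num_of_reads, 0) AS avg_read_latency_ms,
    vfs.num_of_writes,
    vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
    ON  mf.database_id = vfs.database_id
    AND mf.file_id = vfs.file_id
ORDER BY avg_read_latency_ms DESC;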

Sensitivity


Competition for those resources is really common in database workloads, where multiple queries are all reading and writing data simultaneously. It can take an otherwise fine SAN and make it look like a tarpit.

You have to be really careful about these things, and how you choose to address them when you’re dealing with SQL Server.

Standard Edition is in particularly rough shape, with the buffer pool being a laughable 128GB. In order to keep things tidy, your indexes really need to be spot on, so you don’t have unnecessary things ending up there.
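If you’re wondering which indexes are dead weight in there, a sketch like this against sys.dm_db_index_usage_stats can point out nonclustered indexes that get written to but never read. The counters reset on restart, so don’t go dropping anything after one glance.

/* Nonclustered indexes with writes but no reads since the last restart */
SELECT
    OBJECT_NAME(i.object_id) AS table_name,
    i.name AS index_name,
    ISNULL(ius.user_updates, 0) AS user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS ius
    ON  ius.object_id = i.object_id
    AND ius.index_id = i.index_id
    AND ius.database_id = DB_ID()
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
AND   i.index_id > 1
AND   ISNULL(ius.user_seeks, 0) + ISNULL(ius.user_scans, 0) + ISNULL(ius.user_lookups, 0) = 0
ORDER BY ISNULL(ius.user_updates, 0) DESC;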

The more critical a workload is, the more you have riding on getting things right, which often means getting these numbers as low as possible.

Hardware that’s meant to help businesses consolidate isn’t always set up (or designed) to put performance first. Attaching prices to those decisions, so you can see how much time they cost your workloads, is a good way to start making better ones.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Please Use Block Quotes For Code Comments In Your T-SQL

Options!


When you’re writing code, you have some options when it comes to leaving comments:

  • Double dashes at the start of a line: -- Comment!
  • Block quotes around a section of text: /* Comment! */
  • No comment whatsoever:

The problem with the double-dash method is that there’s no terminator for it. If you end up pasting the query from a DMV where it’s not all nicely formatted, it’s pretty likely going to end up all on a single line.

With a long enough query, it can be a real pain to scroll across looking for comments and separating them to a new line so you can run the query.
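Here’s a hypothetical example of the difference, using the Stack Overflow database these demos usually run against. Pretend both queries came back from a DMV flattened onto one line.

/*
   With double dashes, the comment swallows everything after it,
   FROM clause and all, so this won't even parse:

   SELECT TOP (10) u.DisplayName -- best users only FROM dbo.Users AS u ORDER BY u.Reputation DESC;

   With block quotes, the comment terminates and the query still runs:
*/
SELECT TOP (10)
    u.DisplayName /* best users only */
FROM dbo.Users AS u
ORDER BY u.Reputation DESC;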

What really sucks is that tools that can automatically format T-SQL for you can’t handle this either. If I could just right click and solve the problem, I’d be talking about that instead.

So, please: If you’re going to put comments in your code, use the block quotes.

clearly superior

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Mixed Extents Episode 2: Indexes And Statistics Maintenance With Erin Stellato

Erin Stellato!


I got the chance to sit down and chat about index and statistics maintenance with the wonderful Erin Stellato (b|t), and the nice folks at eightkb.

Enjoy! And make sure to subscribe to their YouTube Channel for more great content.

Thanks for watching!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Starting SQL: Why Your SQL Server Query Can’t Go Parallel, Scalar Functions Edition

I Won’t Share You


Yesterday we looked at how table variables can have a surprising impact on performance. We’ll talk more about them later, because that’s not the only way they can stink. Not by a long shot. Even with 1 row in them.

Anyway, look, today’s post is sort of like yesterday’s post, except I’ve had two more drinks.

What people seem to miss about scalar valued functions is that there’s no distinction between ones that touch data and ones that don’t: SQL Server treats them the same way. That might be confusion carried over from CLR UDFs, which only cause parallelism issues when they access data.

Beans and Beans


What I want to show you in this post is that it doesn’t matter if your scalar functions touch data or not: they’ll still have similar performance implications for the queries that call them.

Now look, this might not always matter. You could just use a UDF to assign a value to a variable, or you could call it in the context of a query that doesn’t do much work anyway. That’s probably fine.

But if you’re reading this and you have a query that’s running slow and calling a UDF, it just might be why:

  • If the UDF queries table data and is inefficient
  • If the UDF forces the outer query to run serially

They can be especially hard on reporting-type queries. On top of forcing them to run serially, the functions also run once per row, unlike inline-able constructs.

Granted, this once-per-row thing is worse for UDFs that touch data, because they’re more likely to encounter the slings and arrows of relational data. The reads could be blocked, or the query in the function body could be inefficient for a dozen reasons. Or whatever.

I’m Not Touching You


Here’s a function that doesn’t touch anything at all.

CREATE OR ALTER FUNCTION dbo.little_function (@UserId INT)
RETURNS BIGINT
WITH SCHEMABINDING, 
     RETURNS NULL ON NULL INPUT
AS
BEGIN
DECLARE @d DATETIME = GETDATE();
RETURN
(
    ( 
      SELECT @UserId
    )
)
END
GO

I have the declared variable in there set to GETDATE() to disable UDF inlining in SQL Server 2019.

Yes, I know there’s an option in the function definition (INLINE = OFF) to do the same thing, but I want you to see just how fragile a feature it is right now. Again, I love where it’s going, but it can’t solve every single UDF problem.
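If you want to check how SQL Server feels about a particular function, sys.sql_modules in 2019 has a couple of columns for exactly that. A quick sketch:

/* is_inlineable says whether the function can be inlined at all;
   inline_type says whether inlining is currently turned on for it */
SELECT
    OBJECT_NAME(sm.object_id) AS function_name,
    sm.is_inlineable,
    sm.inline_type
FROM sys.sql_modules AS sm
WHERE sm.object_id = OBJECT_ID(N'dbo.little_function');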

Anyway, back to the story! Let’s call that function that doesn’t do anything in our query.

SELECT TOP (1000) 
    c.Id,
    dbo.little_function(c.UserId)
FROM dbo.Comments AS c
ORDER BY c.Score DESC;

The query plan looks like so, with the warning in properties about not being able to generate a valid parallel plan.

what’s so great about you?

In this plan, we see the same slowdown as with the insert to the table variable. There’s no significant overhead from the function itself; the query is just slower because it’s forced to run serially.

That’s because of the presence of a scalar UDF, which can’t be inlined here. The serial plan represents, again, a significant slowdown over the parallel plan.

Bu-bu-bu-but wait it gets worse


Let’s look at a worse function.

CREATE OR ALTER FUNCTION dbo.big_function (@UserId INT)
RETURNS BIGINT
WITH SCHEMABINDING, 
     RETURNS NULL ON NULL INPUT
AS
BEGIN
DECLARE @d DATETIME = GETDATE();
RETURN
(
    ( 
      SELECT SUM(p.Score)
      FROM dbo.Posts AS p
      WHERE p.OwnerUserId = @UserId
    ) - 
    (
      SELECT SUM(c.Score)
      FROM dbo.Comments AS c
      WHERE c.UserId = @UserId
    )
)
END
GO

Not worse because it’s a different kind of function, just worse because it goes out and touches tables that don’t have any helpful indexes.

Getting to the point: if there were helpful indexes on the tables referenced in the function, performance wouldn’t be as terrible. I’m intentionally leaving them out to show you a couple of funny things, though.
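For reference, here’s the kind of hypothetical indexes that would take most of the sting out of big_function, since both lookups could seek and carry Score along for the ride. I’m not creating them for this demo, but feel free to try them later and compare.

/* Hypothetical helper indexes, intentionally left out of this demo */
CREATE INDEX owner_score
    ON dbo.Posts (OwnerUserId)
    INCLUDE (Score);

CREATE INDEX user_score
    ON dbo.Comments (UserId)
    INCLUDE (Score);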

Because this will run a very long time with a top 1000, I’m gonna shorten it to a top 1.

SELECT TOP (1) 
    c.Id,
    dbo.big_function(c.UserId)
FROM dbo.Comments AS c
ORDER BY c.Score DESC;

Notice that in this plan, the compute scalar takes up a more significant portion of query execution time. We don’t see what the compute scalar does, or what the function itself does in the actual query plan.

got yourself a function

The compute scalar operator is what’s responsible for the scalar UDF being executed. In this case, it’s just once. If I had a top that asked for more than one row, it would be responsible for more executions.

We don’t see the function’s query plan in the actual query, because it could generate a different query plan on each execution. Would you really want to see 1000 different query plans?

Anyway, it’s quite easy to observe with operator times where time is spent here. Most people read query plans from right to left, and that’s not wrong.

In that same spirit, we can add operator times up going from right to left. Each operator not only accounts for its own time, but also for the time of all operators that come before it.

The clustered index scan takes 7.5 seconds, the Sort takes 3.3 seconds, and the compute scalar takes 24.9 seconds. Wee.

Step Inside


If you get an actual plan for this query, you won’t see what the function does. If you get an estimated plan, you can get a picture of what the function is up to.

monster things

This is what I meant about the function body being allowed to go parallel, even though the calling query can’t. That can lead to additional confusion, where the calling query accrues parallel query waits but shows no parallel operators, and carries a warning that a parallel plan couldn’t be generated.

hi my name is

It’s Not As Funny As It Sounds


If you look at a query plan’s properties and see a non-parallel plan reason, table variable modifications and scalar UDFs are the most typical causes. They may not always be the cause of your query’s performance issues, and there are certainly many other local factors to consider.
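If you’re curious which of your cached queries got stuck with a serial plan, and why, a rough sketch like this pulls the NonParallelPlanReason attribute out of the plan cache. Shredding plan XML isn’t free, so maybe don’t point it at a busy server during peak hours.

/* Cached queries that couldn't go parallel, and the reason the optimizer recorded */
;WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (10)
    qs.execution_count,
    qp.query_plan.value('(//QueryPlan/@NonParallelPlanReason)[1]', 'nvarchar(200)') AS non_parallel_plan_reason,
    st.text AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE qp.query_plan.exist('//QueryPlan[@NonParallelPlanReason]') = 1
ORDER BY qs.total_worker_time DESC;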

It’s all a bit like a game of Clue. You might find the same body in the same room with the same bashed in head, but different people and blunt instruments may have caused the final trauma.

Morbid a bit, sure, but if query tuning were always a paint by numbers, no one would stay interested.

Anyway.

In the next posts, we’ll look at when SQL Server tells you it needs an index, and when it doesn’t.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Starting SQL: Compensating For Disk Waits In SQL Server

Repairer Of Reputations


CPU waits are relatively easy to figure out once you wrap your head around the settings that contribute to them, and the queries that cause them. There’s a pretty direct correlation between parallelism, thread usage, and CPU usage.

Compensating for disk waits is a bit of a different game, because there’s more to consider, from a few different angles. But first, let’s draw a distinction.

Waits that commonly crop up when you’re waiting on disk:

  • PAGEIOLATCH_**
  • WRITELOG

When people hear they’re waiting a lot on disk, their first inclination might be that they need faster disks. For WRITELOG waits, that can definitely be a factor. For PAGEIOLATCH waits, it probably shouldn’t be your first move.
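If you want to see how much of your server’s life has been spent on these, a minimal sketch against sys.dm_os_wait_stats looks like this. The numbers are cumulative since the last restart (or since someone cleared them), so read them as proportions rather than a verdict.

/* Cumulative disk-related waits since the last restart */
SELECT
    ws.wait_type,
    ws.waiting_tasks_count,
    ws.wait_time_ms / 1000 AS wait_time_seconds,
    ws.wait_time_ms / NULLIF(ws.waiting_tasks_count, 0) AS avg_wait_ms
FROM sys.dm_os_wait_stats AS ws
WHERE ws.wait_type LIKE N'PAGEIOLATCH%'
OR    ws.wait_type = N'WRITELOG'
ORDER BY ws.wait_time_ms DESC;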

Relatively Speaking


When SQL Server racks up PAGEIOLATCH waits, it’s a signal that queries needed to read pages from disk into memory. If you just rebooted, this is inevitable. You start with a totally cold buffer cache.

But if you have enough memory, you’re not likely to see queries consistently waiting on it. Why? Because if data you need is already in memory, that’s where you go get it from. Why go to disk if you don’t have to? It’s icky out there.

If you really want to compensate for this wait, you’re going to need to think about a few things, like

  • How much memory you have, and how much memory you’re allowed (non-Enterprise versions have limits)
  • How much data you have, and how many indexes you have

Let’s say end user queries are consistently waiting on reading from disk. It doesn’t matter much if the wait is fast or slow, what matters is that the data isn’t in memory. Sure, it matters more if the waits are slow, but the first question is memory.

  • Do you have enough?
  • Can you add more?
  • Would what you have be enough if you had fewer indexes? (Unused/Duplicative)
  • Would what you have be enough if you had less data? (Purging/Archiving)
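One way to put those questions on firmer ground is to look at what’s actually living in the buffer pool right now. A rough sketch, with the caveat that sys.dm_os_buffer_descriptors can take a while to run on servers with a lot of memory:

/* Buffer pool usage by database, in MB (8KB pages) */
SELECT
    DB_NAME(bd.database_id) AS database_name,
    COUNT_BIG(*) * 8 / 1024 AS buffer_pool_mb
FROM sys.dm_os_buffer_descriptors AS bd
GROUP BY bd.database_id
ORDER BY buffer_pool_mb DESC;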

Judgement Night


The reason getting memory right is so crucial is because of how much it’s responsible for.

Aside from caching all those thoughtfully crafted data pages, queries need it to sort and hash data, and there are all sorts of other lower level caches that rely on it. The plan cache is probably the most obvious.

Once you realize that memory is a shared resource, you treat it a whole lot differently. Especially if you know just how much memory some things can take.

To learn more, check out my videos:

Costing


Yeah, memory is cheap. Unless you need so much that your next step is going to Enterprise Edition.

But there’s an intermediate step in the mix that not many people talk about. You can have 2-3 Standard Edition boxes with data split out, and have it potentially be more cost effective than jumping to Enterprise Edition.

This is a better fit for applications/servers that use multiple databases, of course, but I’ve seen people do it with archival data too.

Of course, there are some disk things that you should fix. Like if you’re on a SAN and using <8Gb networking, or if you’re using a VM and not using PVSCSI disks.

The point though, is that if you have room to add memory, you should do that before fiddling with disks. It just plain goes further, because you’re not just helping queries read data into memory faster. You’re caching more pages overall, and you have more memory available for query memory grants (and other caching activities).

Faster disks also won’t do anything to help the waits we’ll talk about tomorrow, which can also be a sign that SQL Server doesn’t have adequate memory.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Starting SQL: A Month Of SQL Server Performance Fundamentals Posts

Unlearning


I’m going to be taking a break in August, but I still want the Blogs To Flow™ while I’m letting my brain meat get its groove back.

So this month I’m going to run some fundamentals posts that I like teaching people about. Apologies in advance to people who expect a constant flow of more advanced stuff.

Why fundamentals? Because I still run into people struggling with the basics more than I run into people struggling with more advanced stuff.

Lots of people think they have advanced problems, but they really just screwed up something basic.

Anyway, my hope is that if you learn the basic stuff the right way, you won’t have to un-learn a bunch of bad habits later.

To Play Along


You’ll need:

As far as hardware and settings go, I have 64GB of RAM, and 8 cores. That means my settings are:

  • MAXDOP: 4
  • Cost Threshold for Parallelism: 50
  • Max Server Memory: 51200

If your hardware and settings don’t exactly match those, you may get different results. These things matter, apparently.
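If you want to line your test box up the same way, here’s a sketch with sp_configure. Swap the numbers for whatever makes sense on your hardware.

/* These are advanced options, so they have to be exposed first */
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;

EXEC sys.sp_configure N'max degree of parallelism', 4;
EXEC sys.sp_configure N'cost threshold for parallelism', 50;
EXEC sys.sp_configure N'max server memory (MB)', 51200;
RECONFIGURE;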

Stay Curious


You have everything you need to work along with these posts. If you have questions on anything, you can run your own experiments to try to answer your questions.

It’s not that I don’t want you to comment to ask them, but I’m not going to be checking in on stuff as regularly, and I don’t want you to think I’m ignoring you when you could be learning independently. You don’t need my permission to do that!

Over the course of the month, I’ll be talking about how queries get executed and query plans get made, along with table and index design, wait stats, and more. I hope you’ll stick with me, even if this material is stuff you’re already comfortable with.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Streaming Week: Query Performance Weirdness In SQL Server 2019 With Batch Mode

awks ox


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.