Live SQL Server Q&A!

ICYMI


Last week’s thrilling, stunning, flawless episode of whatever-you-wanna-call-it.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

How Many Indexes Is Too Many In SQL Server?

To Taste


Indexes remind me of salt. And no, not because they’re fun to put on slugs.

More because it’s easy to tell when there’s too little or too much indexing going on. Just like when you taste food it’s easy to tell when there’s too much or too little salt.

Salt is also one of the few ingredients that is accepted across the board in chili.

To continue feeding a dead horse, the amount of indexing that each workload and system needs and can handle can vary quite a bit.

Appetite For Nonclustered


I’m not going to get into the whole clustered index thing here. My stance is that I’d rather take a chance having one than not having one on a table (staging tables aside). Sort of like a pocket knife: I’d rather have it and not need it than need it and not have it.

At some point, you’ve gotta come to terms with the fact that you need nonclustered indexes to help your queries.

But which ones should you add? Where do you even start?

Let’s walk through your options.

If Everything Is Awful


It’s time to review those missing index requests. My favorite tool for that is sp_BlitzIndex, of course.

Now, I know, those missing index requests aren’t perfect.

There are oodles of limitations, the way they’re presented is weird, and there are lots of reasons they may not be there. But if everything is on fire and you have no idea what to do, this is often a good-enough bridge until you’ve got more experience, or more time to figure out better indexes.

I’m gonna share an industry secret with you: No one else looking at your server for the first time is going to have a better idea. Knowing what indexes you need often takes time and domain/workload knowledge.

If you’re using sp_BlitzIndex, take note of a few things:

  • How long the server has been up for: Less than a week is usually pretty weak evidence
  • The “Estimated Benefit” number: If it’s less than 5 million, you may wanna put it to the side in favor of more useful indexes in round one
  • Duplicate requests: There may be several requests for indexes on the same table with similar definitions that you can consolidate
  • Insane lists of Includes: If you see requests on (one or a few key columns) and include (every other column in the table), try just adding the key columns first

Of course, I know you’re gonna test all these in Dev first, so I won’t spend too much time on that aspect.
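If you’d rather query the missing index DMVs yourself, here’s a minimal sketch of what sp_BlitzIndex is automating for you. The “estimated benefit” math here is my own rough approximation of the usual cost-times-impact-times-seeks formula, not an official metric:

```sql
/* Rough sketch of reading the missing index DMVs directly.
   sp_BlitzIndex does this (and a lot more) for you. */
SELECT   d.statement AS table_name,
         d.equality_columns,
         d.inequality_columns,
         d.included_columns,
         gs.user_seeks,
         gs.avg_total_user_cost * gs.avg_user_impact * gs.user_seeks
             AS estimated_benefit
FROM     sys.dm_db_missing_index_details AS d
JOIN     sys.dm_db_missing_index_groups AS g
    ON   g.index_handle = d.index_handle
JOIN     sys.dm_db_missing_index_group_stats AS gs
    ON   gs.group_handle = g.index_group_handle
ORDER BY estimated_benefit DESC;
```

Remember that all of this resets when the server restarts, which is why uptime matters so much when judging the evidence.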

If One Query Is Awful


You’re gonna wanna look at the query plan — there may be an imperfect missing index request in there.

[Screenshot: SQL Server query plan with a missing index request]

And yeah, these are the same missing index requests that end up in the DMVs, just added to the query plan XML.

They’re not any better, and they’re subject to the same rules and problems. And they’re not even ordered by Impact.

Cute. Real cute.

sp_BlitzCache will show them to you by Impact, but that requires you being able to get the query from the plan cache, which isn’t always possible.
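If the plan cache is cooperating, you can also go digging for cached plans that contain missing index requests yourself. A hedged sketch of roughly what sp_BlitzCache automates, querying the plan cache DMVs directly:

```sql
/* Sketch: find cached plans that contain a missing index request.
   Plans age out of the cache, so this is never a complete picture. */
WITH XMLNAMESPACES
    (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (10)
       qs.execution_count,
       qs.total_worker_time,
       qp.query_plan
FROM   sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE  qp.query_plan.exist('//MissingIndex') = 1
ORDER BY qs.total_worker_time DESC;
```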

If You Don’t Trust Missing Index Requests


And trust me, I’m with you there. Think about the kinds of things indexes are good at helping queries do:

  • Find data
  • Join data
  • Order data
  • Group data

Keeping those basic things in mind can help you start designing much smarter indexes than SQL Server can give you.

You can start finding all sorts of things in your query plans that indexes might change.
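To make that concrete, here’s a hypothetical example using the Stack Overflow sample database (the table and column names are from that, not from your workload). The query finds rows by date and joins to Users, so one index can support the seek and the join, and carry the output columns along as includes:

```sql
/* Hypothetical: a query that finds, joins, and orders data. */
SELECT   p.Title, p.Score, u.DisplayName
FROM     dbo.Posts AS p
JOIN     dbo.Users AS u
    ON   u.Id = p.OwnerUserId
WHERE    p.CreationDate >= '20190101'  /* find */
ORDER BY p.Score DESC;                 /* order */

/* One index to support the find and the join, covering the query.
   Note the sort on Score still happens, since the leading key is
   a range predicate on CreationDate. */
CREATE INDEX ix_posts_creationdate
    ON dbo.Posts (CreationDate, OwnerUserId)
    INCLUDE (Title, Score);
```

SQL Server’s missing index requests would never suggest key column order based on the ORDER BY, which is exactly the kind of thing you can do better by hand.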

Check out my talk at SQLBits about indexes for some cool examples.

And of course, if you need help doing it, I’m here for just that sort of thing.

Thanks for reading!


Columnstore Precon Details

I thought it could be helpful to go into more detail about what I plan to present at the columnstore precon that’s part of SQL Saturday New York City. Note that everything here is subject to change. I tried to include all of the main topics planned at this time. If you’re attending and really want to see something covered, let me know, and I’ll be as transparent as possible about what I can and cannot do.

  • Part 1: Creating your table
    • Definitions
    • Delta rowgroups
    • How columnstore compression works
    • What can go wrong with compression
    • Picking the right data types
    • Deciding on partitioning
    • Indexing
    • When should you try columnstore?
  • Part 2: Loading data quickly
    • Serial inserts
    • Parallel inserts
    • Inserts into partitioned tables
    • Inserts to preserve order
    • Deleting rows
    • Updating rows
    • Better alternatives
    • Trickle insert – this is a maybe
    • Snapshot isolation and ADR
    • Loading data on large servers
  • Part 3: Querying your data
    • The value of maintenance
    • The value of patching
    • How I read execution plans
    • Columnstore/batch mode gotchas with DMVs and execution plans
    • Columnstore features
    • Batch mode topics
    • When is batch mode slower than row mode?
    • Batch mode on rowstore
    • Bad ideas with columnstore
  • Part 4: Maintaining your data
    • Misconceptions about columnstore
    • Deciding what’s important
    • Evaluating popular maintenance solutions
    • REORG
    • REBUILD
    • The tuple mover
    • Better alternatives

I hope to see a few of you there!

SQL Server Performance Problem Solving: Why I Love Consulting

30 Second Abs


I love helping people solve their problems. That’s probably why I stick around doing what I do instead of opening a gym.

That and I can’t deal with clanking for more than an hour at a time.

Recently I helped a client solve a problem with a broad set of causes, and it was a lot of fun uncovering the right data points to paint a picture of the problem and how to solve it.

Why So Slow?


It all started with an application. At times it would slow down.

No one was sure why, or what was happening when it did.

Over the course of looking at the server together, here’s what we found:

  • The server had 32 GB of RAM, and 200 GB of data
  • There were a significant number of long locking waits
  • There were a significant number of PAGEIOLATCH_** waits
  • Indexes had fill factor of 70%
  • There were many unused indexes on important tables
  • The error log was full of 15 second IO warnings
  • There was a network choke point where 10Gb Ethernet went to 1Gb iSCSI

Putting together the evidence


Alone, none of these data points means much. Even the 15 second I/O warnings could just be happening at night, when no one cares.

But when you put them all together, you can see exactly what the problem is.

Server memory wasn’t being used well, both because indexes had a very low fill factor (lots of empty space), and because indexes that don’t help queries had to get maintained by modification queries. That contributed to the PAGEIOLATCH_** waits.

Lots of indexes means more objects to lock, which generally means more locking waits.

Because we couldn’t make good use of memory, and we had to go to disk a lot, the poorly chosen 1Gb iSCSI connections were overwhelmed.

To give you an idea of how bad things were, the 1Gb iSCSI connections were only moving data at around USB 2 speeds (think 35–40 MB/s, when 1Gb Ethernet should manage closer to 125 MB/s).

Putting together the plan


Most of the problems could be solved with two easy changes: getting rid of unused indexes, and raising fill factor to 100. The size of data SQL Server would regularly have to deal with would drop drastically, so we’d be making better use of memory, and be less reliant on disk.
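If you want to sketch out the same investigation, the index usage DMV is the place to start. This is a rough sketch, not a drop-everything-it-finds script: usage stats reset on restart, so check uptime first, and the index and table names in the ALTER are hypothetical placeholders:

```sql
/* Sketch: nonclustered indexes that are written to but never read
   since the last restart. Verify server uptime before acting. */
SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.name AS index_name,
       us.user_updates,
       ISNULL(us.user_seeks, 0)
     + ISNULL(us.user_scans, 0)
     + ISNULL(us.user_lookups, 0) AS total_reads
FROM   sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
    ON  us.object_id = i.object_id
    AND us.index_id = i.index_id
    AND us.database_id = DB_ID()
WHERE  i.type_desc = 'NONCLUSTERED'
AND    ISNULL(us.user_seeks, 0)
     + ISNULL(us.user_scans, 0)
     + ISNULL(us.user_lookups, 0) = 0;

/* And raising fill factor back up on a (hypothetical) index: */
ALTER INDEX ix_whatever ON dbo.SomeTable
    REBUILD WITH (FILLFACTOR = 100);
```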

Making those two changes fixed 90% of their problems. Those are great short term wins. There was still some blocking, but user complaints disappeared.

We could make more changes, like adding memory to be even less reliant on disk, and replacing the 1Gb connections, sure. But the more immediate solution didn’t require any downtime, outages, or visits to the data center, and it bought them time to carefully plan those changes out.

If you’re hitting SQL Server problems that you just can’t get a handle on, drop me a line!

Thanks for reading!


Do You Need A Presenter?

Talk Talk


If you need a speaker for your User Group, SQL Saturday, or other conference, drop me a line!

I have lots of great material about performance tuning queries and indexes that I’m psyched about sharing.

I’m also available for in-person paid training on the same subjects, whether it’s a full day pre-con or corporate training for your staff.

Like my blog content, video content, or writing style?

Or, whatever, maybe you’re just lonely?

Hire me today!


What’s The Point of 1 = (SELECT 1) In SQL Server Queries?

In Shorts


That silly looking subquery avoids two things:

  • Trivial Plan
  • Simple Parameterization

I use it often in my demo queries because I try to make the base query that shows some behavior as simple as possible. That doesn’t always work out, but, whatever. I’m just a bouncer, after all.

The problem with very simple queries is that they may not trigger the parts of the optimizer that display the behavior I’m after. This is the result of them only reaching trivial optimization. For example, trivial plans will not go parallel.

If there’s one downside to making the query as simple as possible and using 1 = (SELECT 1), it’s that people get very distracted by it. Sometimes I think it would be less distracting to make the query complicated and make a joke about it instead.

The Trouble With Trivial


I already mentioned that trivial plans will never go parallel. That’s because they never reach that stage of optimization.

They also don’t reach the “index matching” portion of query optimization, which may trigger missing index requests, with all their fault and frailty.

	/*Nothing for you*/
	SELECT *
	FROM dbo.Users AS u
	WHERE u.Reputation = 2;

	/*Missing index requests*/
	SELECT *
	FROM dbo.Users AS u
	WHERE u.Reputation = 2
	AND 1 = (SELECT 1);
[Screenshot: the two query plans side by side]

Note that the bottom query gets a missing index request, and doesn’t get simple parameterized. The only reason the first query takes ~2x as long as the second query is that the cache was cold. In subsequent runs, they’re equal enough.

What Gets Fully Optimized?


Generally, things that introduce cost based decisions, and/or inflate the cost of a query beyond the Cost Threshold for Parallelism.

  • Joins
  • Subqueries
  • Aggregations
  • Ordering without a supporting index

As a quick example, these two queries are fairly similar, but…

	/*Unique column*/
	SELECT TOP 1000 u.Id --Not this!
	FROM dbo.Users AS u
	GROUP BY u.Id;

	/*Non-unique column*/
	SELECT TOP 1000 u.Reputation --But this will!
	FROM dbo.Users AS u
	GROUP BY u.Reputation;

One attempts to aggregate a unique column (the pk of the Users table), and the other aggregates a non-unique column.

The optimizer is smart about this:

[Screenshot: the two query plans]

The first query is trivially optimized. If you want to see this, hit F4 when you’re looking at a query plan. Highlight the root operator (select, insert, update, delete — whatever), and look at the optimization level.

[Screenshot: query plan properties showing the optimization level]

Since aggregations have no effect on unique columns, the optimizer throws the group by away. Keep in mind, the optimizer has to know a column is unique for that to happen. It has to be guaranteed by a uniqueness constraint of some kind: primary key, unique index, unique constraint.

The second query introduces a choice, though! What’s the cheapest way to aggregate the Reputation column? Hash Match Aggregate? Stream Aggregate? Sort Distinct? The optimizer had to make a choice, so the optimization level is full.
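If you’d rather check optimization levels in bulk than F4 your way through plans one at a time, you can pull the same property out of cached plan XML. A hedged sketch; the StatementOptmLevel attribute comes back as TRIVIAL or FULL:

```sql
/* Sketch: optimization level for everything in the plan cache. */
WITH XMLNAMESPACES
    (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT st.text,
       qp.query_plan.value
           ('(//StmtSimple/@StatementOptmLevel)[1]', 'varchar(20)')
           AS optimization_level
FROM   sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st;
```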

What About Indexes?


Another component of trivial plan choice is when the choice of index is completely obvious. I typically see it when there’s either a) only a clustered index or b) a covering nonclustered index.

If there’s a non-covering nonclustered index, the choice of a key lookup vs. clustered index scan introduces that cost based decision, so trivial plans go out the window.

Here’s an example:

	CREATE INDEX ix_creationdate 
	    ON dbo.Users(CreationDate);

	SELECT u.CreationDate, u.Id
	FROM dbo.Users AS u 
	WHERE u.CreationDate >= '20131229';

	SELECT u.Reputation, u.Id
	FROM dbo.Users AS u 
	WHERE u.Reputation = 2;

	SELECT u.Reputation, u.Id
	FROM dbo.Users AS u WITH(INDEX = ix_creationdate)
	WHERE u.Reputation = 2;

With an index only on CreationDate, the first query gets a trivial plan. There’s no cost based decision, and the index we created covers the query fully.

For the next two queries, the optimization level is full. The optimizer had a choice, illustrated by the third query. Thankfully it isn’t one that gets chosen unless we force the issue with a hint. It’s a very bad choice, but it exists.

When It’s Wack


Let’s say you create a constraint, because u loev ur datea.

	ALTER TABLE dbo.Users
        ADD CONSTRAINT cx_rep CHECK 
			( Reputation >= 1 AND Reputation <= 2000000 );

When we run this query, our newly created and trusted constraint should let it bail out without doing any work.

	SELECT u.DisplayName, u.Age, u.Reputation
    FROM dbo.Users AS u
    WHERE u.Reputation = 0;

But two things happen:

[Screenshot: the query plan]

The plan is trivial, and it’s auto-parameterized.

The auto-parameterization means a plan is chosen where the literal value 0 is replaced with a parameter by SQL Server. This is normally “okay”, because it promotes plan reuse. However, in this case, the auto-parameterized plan has to be safe for any value we pass in. Sure, it was 0 this time, but next time it could be one within the range of valid reputations.

Since we don’t have an index on Reputation, we have to read the entire table. If we had an index on Reputation, it would still result in a lot of extra reads, but I’m using the clustered index here for ~dramatic effect~

Table 'Users'. Scan count 1, logical reads 44440

Of course, adding the 1 = (SELECT 1) thing to the end introduces full optimization, and prevents this.

The query plan without it is just a constant scan, and it does 0 reads.

Rounding Down


So there you have it. When you see me (or anyone else) use 1 = (SELECT 1), this is why. Sometimes when you write demos, a trivial plan or auto-parameterization can mess things up. The easiest way to get around it is to add that to the end of a query.

Thanks for reading!


Using System Functions As SQL Injection In SQL Server

One More Thing


I always try to impart on people that SQL injection isn’t necessarily about vandalizing or trashing data in some way.

Often it’s about getting data. One great way to figure out how difficult it might be to get that data is to figure out who you’re logged in as.

There’s a somewhat easy way to figure out if you’re logged in as sa.

Wanna see it?

Still Ill


	SELECT SUSER_SID();
	SELECT CONVERT(INT, SUSER_SID());
	SELECT SUSER_NAME(1);

	DECLARE @Top INT = 1000 + (SELECT CONVERT(INT, SUSER_SID()));

	SELECT TOP (@Top)
	    u.Id, u.DisplayName, u.Reputation, u.CreationDate
	FROM dbo.Users AS u
	ORDER BY u.Reputation DESC;
	GO

It doesn’t even require dynamic SQL.

All you need is a user entry field to do something like pass in how many records you want returned.

The results of the first three selects looks like this:

[Screenshot: SQL Server query results]

This is always the case for the sa login.

If your app is logged in using it, the results of the TOP will return 1001 rows rather than 1000 rows.

If it’s a different login, the number could end up being positive or negative, and so a little bit more difficult to work with.

But hey! Things.

Validation


Be mindful of those input fields.

Lots of times, I’ll see people have what should be integer fields accept string values so users can use shortcut codes.

For example, let’s say we wanted someone to be able to select all available rows without making them memorize the integer maximum.

We might use a text field so someone could say “all” instead of 2147483647.

Then uh, you know.

Out comes ISNUMERIC and all its failings.
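If you do need a text field like that, TRY_CONVERT is a much safer bet than ISNUMERIC, which happily accepts things like ‘$’, ‘,’, and ‘1e4’. A minimal sketch, with the shortcut code handled explicitly and everything else either converting cleanly or coming back NULL:

```sql
/* Safer than ISNUMERIC: TRY_CONVERT returns NULL for anything
   that isn't a valid integer, instead of lying or erroring. */
DECLARE @UserInput nvarchar(20) = N'all';

DECLARE @Top int =
    CASE WHEN @UserInput = N'all' THEN 2147483647
         ELSE TRY_CONVERT(int, @UserInput)
    END;

/* NULL here means the input wasn't a valid integer; reject it. */
SELECT @Top AS validated_top;
```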

Thanks for reading!


Join Joe Obbish In NYC On October 4th To Learn Everything About Columnstore

Second To None


Prior to this year’s SQL Saturday in NYC, we’re running one very special precon, with Joe Obbish:

Joe Obbish – Clustered Columnstore For Performance

Buy Tickets Here!

Clustered columnstore indexes can be a great solution for data warehouse workloads, but there’s not a lot of advanced training or detailed documentation out there. It’s easy to feel all alone when you want a second opinion or run into a problem with query performance or data loading that you don’t know how to solve.

In this full day session, I’ll teach you the most important things I know about clustered columnstore indexes. Specifically, I’ll teach you how to make the right choices with your schema, data loads, query tuning, and columnstore maintenance. All of these lessons have been learned the hard way with 4 TB of production data on large, 96+ core servers. Material is applicable from SQL Server 2016 through 2019.

Here’s what I’ll be talking about:

– How columnstore compression works and tips for picking the right data types
– Loading columnstore data quickly, especially on large servers
– Improving query performance on columnstore tables
– Maintaining your columnstore tables

This is an advanced level session. To get the most out of the material, attendees should have some practical experience with columnstore and query tuning, and a solid understanding of internals such as wait stats analysis. You don’t need to bring a laptop to follow along.

Buy Tickets Here!

Wanna save 25%? Use coupon code “actionjoe” at checkout — it’s good for the first 10 seats, so hurry up and get yours today.

Thanks for reading!


Can SQL Server 2019’s Scalar UDF Inlining Fix This Performance Problem?

Snakey TOP

