Starting SQL: How Nonclustered Indexes Work In SQL Server

Facsimile


While clustered indexes or heaps are the table, nonclustered indexes are their own special little individual copies of the table data.

They’re ordered by the key columns that you choose, which should probably differ from your clustered index key column(s). There is, of course, no ordering of included columns.

Nonclustered indexes will also have different statistics, and a different set of underlying pages. If you’ve ever wondered why you can drop or disable a corrupt nonclustered index and recreate it, but you can’t do that with a clustered index, it’s because databases hate you and want you to suffer.

Downsides First


There are, of course, downsides to over-indexing a table. Your clustered index is generally there as a good foundation for everything else, but it can’t help everything.

Adding in nonclustered indexes will give you extra objects to:

  • Write to and lock when you modify data
  • Choices for the optimizer when it comes up with query plans
  • Backup, restore, check for corruption, maintain
  • Read into memory when you need to read them or write to them

Some caution needs to be exercised when creating indexes, of course. You’ll want to avoid overly wide indexes, and indexes that incorporate the same columns over and over again.

This can be complicated when creating wide tables that lack proper normalization. When you create tables that have many columns with a similar prefix, like “customer_”, it may be a sign that you those columns should be in a separate table with a key to connect it to other tables. Another sign is numbered columns, like “phone_1”, “phone_2” etc.

The more columns you add to a single table, the more trouble you’ll get into trying to index it. Users will want to search those tables in all variety of ways and want to return many different columns. It becomes quite difficult to effectively index a single table like that.

There are exceptions, of course. If you have a reporting table that is purposely denormalized, and has no transactional meaning, you can often afford more indexes being added to it, or even use columnstore indexes to aid reporting queries.

Mitigating Downsides


There are a number of things you can do to mitigate some of the issues you can run into with a lot of indexes, depending on what’s available to your version and edition of SQL Server.

For example, if you want to minimize locking issues, you should add NOLOCK everywhere. No but seriously, don’t do that unless you don’t care at all. A much better option is an optimistic isolation level. It’d be great if SQL Server used one by default, but it’s pointless to kick dust now.

Having good hardware, like enough memory to cache your heavily trafficked data, and write-friendly storage can also help with many issues around writes.

Of course, the indexes you need are going to be a personal issue. Some tables, and workloads, will be able to afford more indexing than others. Putting numbers on these things often takes some digging.

Coming back to the wide tables thing, you may find it difficult to stick to 5 or 10 indexes that have 5 or 10 columns in them without having a static group of queries that touches the table, and forget it if you write anything resembling “SELECT *” from a table like that.

Over Under


Deciding if a table is over-indexed comes down to looking at usage. If your server has been up for a month or longer, and you see a bunch of indexes that are totally unused by read queries, or queries that have way more writes then reads, you might wanna think about ditching those.

But always do this part first, because it’ll give you a more clear picture of what you should do with indexes that are leftover.

Other indexes that are safe to mess with are ones that have the exact same key columns. If they have different included columns, you can create one index to include them all.

Just remember to look for stuff in these index definitions like filters or uniqueness — those things can make indexes look a lot different to the optimizer.

Next we’ll start to talk about designing effective nonclustered indexes for your queries. Because that’s what we design indexes for, right? We don’t just make them up.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Starting SQL: Picking Clustered Indexes For Tables

Need’em Got’em


Let’s say you’ve either got a heap, like we talked about yesterday, or you’re plotting a new table.

It’s OLTP, and you want the very best for your transactions. What are some good characteristics of a clustered index?

Here’s an easy to remember acronym!

  • Narrow: Numbers, dates, not string-ish
  • Unique: Identities or sequences are good for this
  • Distinct: If not unique, then as distinct as possible
  • Ever: Increasing (append only is the goal, here
  • Static: Definitely not a column you ever update

Follow that, and you’ll most likely never go wrong. Unless you’ve got GUIDs involved, maybe. But at least you’re trying.

Why do we care about those things?

Avoiding Problems


All of those traits help us avoid problems down the line. like shuffling pages in the middle around, or with lopsided data.

The narrow thing is important because, like I’ve mentioned, nonclustered indexes inherit those clustered index key columns. If you assign wide clustered indexes, either via the number of key columns, or via the data types of those key columns, you can end up with quite bloated nonclustered indexes.

The whole thing can be unpleasant.

The important thing to remember is that clustered indexes are a nearly free way to logically order your table data, and have an index that fully covers all of the columns. This is probably a good time to remind you that clustered indexes are not a copy of the table, but the table data itself ordered logically by the key columns.

The Questions


To BIGINT Or Not To BIGINT?

If you’re going to do what most people do and make your clustered index an identity, do yourself a favor and use a BIGINT.

Some people will hem and haw about the 4 extra bytes, but those 4 extra bytes don’t make too much difference until the table gets to be around 2 billion rows. At that point, the pain of altering an INT to a BIGINT can be pretty severe, with some careful planning.

I’ve seen enough people go through heck with it to tell you to avoid the problem from the get go.

But I Need So Many GUIDs

Okay, cool. I understand. GUIDs are hard to run out of. They can be rough if not generated sequentially, and even then… Well, at least they’re unique. Your nonclustered indexes won’t suffer *too* much by inheriting them.

But this is generally where a different design pattern makes sense:

  • Put a nonclustered primary key on your GUID
  • Add a clustered index to a more sensible column

But hey, businesses seem to get angry when they find out identities can have gaps in values. GUIDs totally don’t have that problem ?

Are Multi-Keys Okay?

I guess, if you need a couple columns to make a unique combination, or if you have a couple columns you’re always searching on.

Just make sure you really try to keep them narrow. Strings usually aren’t your friend here.

Action Indexes


But you know, once you’ve got a clustered index in place, it just kinda sits there.

The real fun comes in figuring out nonclustered indexes

  • How many can I have?
  • How many columns can I put in them?

Let’s talk about those a little next.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Starting SQL: Quick Notes About Heaps In SQL Server (Tables Without Clustered Indexes)

Nein Nein Nein


I know, I said we’d talk about indexes, and heaps are basically the opposite of indexes.

Even if you have a heap with a nonclustered primary key, or nonclustered indexes hanging around, the base table is still a heap.

Why? It’s all about that clustered index.

Do you always need a clustered index? Sometimes not. But if you’re running an OLTP workload, you probably do.

It’s not that heaps can’t be useful, it’s just that their best use-case is still loading the big datas.

Downsides Of Heaps


Heaps have a couple issues that are going to sound a lot like index fragmentation. In fact, they’re a lot like index fragmentation.

  • Forwarded fetches from updates
  • Empty pages from deletes

Forwarded Fetches

Forwarded fetches are a lot like logical fragmentation, which is when pages are out of order. Instead of having clustered index keys to follow, we have pointers from where the row used to be to where it got moved to.

This happens because updates change a value that makes a row not fit on a page anymore.

Heaps compound the issue a little bit though, because you can’t seek into a heap to avoid forwarded fetches (though you can seek into any index created on top of a heap). You can seek into a clustered index to avoid logical fragmentation.

Empty Space

Empty space from deletes goes a little bit further, too. You can end up with entire pages being empty, if queries that delete data don’t successfully escalate locks to the table level. Worse, those empty pages get read when the heap is scanned.

Sure, clustered indexes can end up with a bunch of empty space on pages, but when pages are totally emptied out they will get deallocated. Heaps don’t do that without a table level lock during the delete.

The thing is that both of these only happen to the heap itself. If you’ve got nonclustered indexes on the heap and queries mostly use those, you may not notice either of these problems. Perhaps a little bit if you do bookmark lookups.

When They Might Matter


This is largely a size thing. The bigger your heaps are, the more pronounced these problems can be.

It also depends a bit on what heap tables get used for. Or more importantly, who uses them.

If they’re staging tables that end users don’t touch, fixing them probably won’t solve a big problem for you outside of ETL.

But if they’re transactional tables where end users are complaining about performance, you need to do two things: check the query plans for table scans, and then check the heaps for forwarded fetches and deletes.

To fix them, you can rebuild the heap, or create a clustered index. In either case, be very careful if you’ve got nonclustered indexes on the table, because they’ll need to all get rebuilt to either fix the RIDs in a heap, or add the clustered index keys.

Your next question is: Erik, how do I pick a clustered index? We’ll talk about that tomorrow!

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Starting SQL: A Little TOP Can Have A Big Sort In SQL Server Query Plans

Shortcuts


To illustrate this problem a little bit better, I need to use a slightly different query.

While a tidy group by was nice to illustrate some of the simpler points about lookups, it overly complicates things when we want to involve other columns.

The second we need anything else in the select or order by portions of a query, we need to apply aggregates to them, or include them in the group by.

And you see, once you set up a query to return the TOP N rows, there’s an expectation that users get to choose the order they start seeing rows in. As long as we stick to columns whose ordering is supported by an index, things will be pretty stable.

Once we go outside that, a TOP can be rough on a query.

Order By CreationDate


Even if we order by CreationDate in descending order, with the index created in ascending order, we don’t need an explicit sort operation to put data in order.

SELECT TOP (1000)
       p.CreationDate,
       p.OwnerUserId,
       p.Score
FROM dbo.Posts AS p
WHERE p.CreationDate >= '20131015'
ORDER BY p.CreationDate DESC;

There are some additional possibilities for this kind of thing with multi-key indexes that we’ll talk about later in the series, but for now this is a good enough illustration of indexes putting data in order.

SQL Server Query Plan
big chain

Order By Score


Let’s ask for data in a different order now. Score is a convenient villain, because there are lots of times when you might want to see things by a highest whatever metric is commendable to be high. Like not blood pressure, probably.

SELECT TOP (1000)
       p.CreationDate,
       p.OwnerUserId,
       p.Score
FROM dbo.Posts AS p
WHERE p.CreationDate >= '20131015'
ORDER BY p.Score DESC;
SQL Server Query Plan
blown out

Without an index that has Score in the key, we need to physically put the data in order to fit the requirements of the query. Note that in this case, the optimizer no longer sees any benefit to using our nonclustered index.

Why Do We Care About Sorts?


Without jumping too far ahead, Sorts need extra memory to run.

You know, that stuff you cache data in, and you don’t have enough of already?

Yeah, that memory.

The same memory that gives you a conniption every time PLE fluctuates.

How much they ask for is going to depend on:

  • How much memory you have
  • What max server memory is set to
  • The size of the data you need to sort

If we aren’t able to get enough memory, or if we don’t ask for enough up front, data could spill from memory to disk.

When spills get big enough, they can sometimes cause performance issues.

This is a good time to start talking a little bit more about indexes, so we can understand more about how they work and help us solve problems.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Starting SQL: Why SQL Server’s Optimizer Doesn’t Always Choose The Same Index

What’s With Widths?


We don’t need to add a ton of columns to our query to have index usage change, but we do need to go back in time a little bit.

Here’s our query now, with just a one day difference in the where clause.

SELECT p.CreationDate,
       COUNT_BIG(*) AS records
FROM dbo.Posts AS p
WHERE p.CreationDate >= '20131015'
AND   p.OwnerUserId = 22656
GROUP BY p.CreationDate;


SELECT p.CreationDate,
       COUNT_BIG(*) AS records
FROM dbo.Posts AS p
WHERE p.CreationDate >= '20131016'
AND   p.OwnerUserId = 22656
GROUP BY p.CreationDate;

Not quite Halloween, but I’ll take the spooky factor.

The other difference is that now we’ve got the OwnerUserId column in there, which isn’t in our nonclustered index.

It’s in the where clause, not the select list, but if we added it there it would have a similar effect on the query. Either way, we have to do something with this new column, and we have to get it from somewhere.

CREATE INDEX CreationDate ON dbo.Posts(CreationDate);

Things Are Looking Up


The query plans for these will look a little bit different.

SQL Server Query Plan
what’s it for

If you’re not familiar with what just happened here, a Key Lookup is a trick the optimizer has up its sleeve. It uses two indexes from the same table to satisfy one query.

We find rows in our nonclustered index that qualify for the filter on CreationDate. That’s a smart start!

Then we join the nonclustered index to the clustered index to find any columns we might need. Remember the clustered index has all the table columns in it.

Stuff like this is made possible by nonclustered indexes inheriting clustered index key columns. Crazy, right?

The Point At This Point


There are many internal details to explore around Key Lookups. There are even some interesting things about how clustered index keys get stored in nonclustered indexes.

What you need to know about Lookups right now is what they are (which we talked about), and that they represent a choice the optimizer has when it comes to index usage.

If you create a narrow index, say one that satisfies some part of the query like the join or where clause, but doesn’t fully contain all of the columns referenced in your query, it may not get used reliably. The usage is decided based on cardinality estimates. The more rows SQL Server expects, the less likely it is that your narrow index will get used.

For example, it may only get used when the value for CreationDate is estimated to return a small-ish number of rows. Parameterization and plan re-use can make this even more confusing.

Next, we’ll look at how we can encourage the optimizer to choose narrow indexes, and the problems we might run into.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Starting SQL: Messing Up A SQL Server Query’s Performance

Your Fault


Yesterday we created an index, and things went pretty well. Good for us. By next week we’ll have six pack abs.

Today we’re going to look at things that I see happen pretty often in queries that mess with effciency.

If one were to sit and think carefully about the way B-Tree indexes are implemented, where key columns define order, one may see these problems coming a mile away.

Then again, one might expect a full formed datababby system to be able to figure some of these things out and use indexes appropriately anyway.

General Anti-Patterns


I’ve posted this on here a number of times, but here it goes again.

Things that keep SQL Server from being able to seek:

  • Function(Column) = …
  • Column + Column = …
  • Column + Value = …
  • Value + Column = …
  • Column = @Value or @Value IS NULL
  • Column LIKE ‘%…’
  • Some implicit conversions (data type mismatches)

Seeks aren’t always necessary, or even desirable, and likewise scans aren’t always bad or undesirable. But if we’re going to give our queries the best chance of running well, our job is most often to give the optimizer every opportunity to make the right decisions. Being the optimizer is hard enough without us grabbing it by the nose and poking it in the eyes.

To fix code, or make code that looks like that tolerable to The Cool DBA Kids™, there are some options like:

  • Computed columns
  • Dynamic SQL
  • Rewrites

Different options work well in different scenarios, of course. And since we’re here, I might as well foreshadow a future post: These patterns are most harmful when applied to the leading key column of an index. When they’re residual predicates that follow seek predicates, they generally make less of a difference. But we’re not quite there yet.

The general idea, though, is that as soon as we write queries in a way that obscure column data, or introduce uncertainty about what we’re searching for, the optimizer has a more difficult time of things.

Do It Again


Let’s compare a couple different ways of writing yesterday’s query. One good, one bad (in that order).

SELECT p.CreationDate,
       COUNT_BIG(*) AS records
FROM dbo.Posts AS p
WHERE p.CreationDate >= '20131225'
GROUP BY p.CreationDate;

SELECT p.CreationDate,
       COUNT_BIG(*) AS records
FROM dbo.Posts AS p
WHERE YEAR(p.CreationDate) = 2013
AND   MONTH(p.CreationDate) = 12
AND   DAY(p.CreationDate) >= 25
GROUP BY p.CreationDate;

The mistake people often make here is that they think these presentation layer functions have some relational meaning. They don’t.

They’re presentation layer functions. Let’s see those execution plans. Maybe then you’ll believe me.

SQL Server Query Plan
horse and carriage

Things are not so hot when we pile a Mess Of Functions™ into the where clause, are they?

I mean, our CPUs are hot, but that’s generally not what we’re after.

The Use Of Indexes


We could still use our index. Many people will talk about functions preventing the use of indexes, but more precisely we just can’t seek into them.

But you know what can prevent the use of nonclustered indexes? Long select lists.

Next time, we’ll look at that.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Starting SQL: Measuring A SQL Server Query’s Performance Metrics

Outlaw Buttons


Yesterday, we started with a query, and wanted to add an index to make it faster. Remember, indexes help queries find and relate data by putting it in order.

In the weird old days, we might use SET STATISTICS TIME, IO ON; to measure a query. If you’re still using old (unsupported) versions of SQL Server, I understand if you have to continue to use those.

For those of us living in the present (and near past), we can get the detail we need from actual execution plans.

Look, you’re probably getting those anyway if you’re trying to tune a query with any seriousness. You might as well make the most of them.

Something New


When most people look at execution plans, they neglect one of the most important parts: operator properties.

Once you got a plan in front of you, you can either highlight an operator and hit F4, or right click on one and choose “Properties”. A little window pane should appear over to the right in SSMS, and it’ll have a whole bunch of useful information in it that doesn’t show up in the tool tips you get when you hover over operators.

SQL Server Query Plan
mcgruff

What you can see over there in detail is information about reads and operator time.

SQL Server Query Plan Properties
signs and measures

Interpreting Stuff


Logical reads is the number of 8KB pages the query read through. I used to put a lot of stock in tuning queries to do fewer reads, but I’ve run into too many situations where a faster query did more reads. I no longer consider it to be a great indicator of performance being better.

If logical reads incidentally go down, great. If not, whatever.

The reason this is better to get from execution plans is because if you use the SET STATISTICS command, it only tells you reads at the table level, not at the index level. If you have more complicated query plans that use multiple indexes from the same table, you can’t really tell which did what. Getting it from the query plan is easier to interpret.

Down the bottom is some more useful information about CPU and elapsed time. I know what you’re thinking: that’s incredible, I can see how long each operator runs for. And that’s true; you can. But it’s a little more complicated depending on what kind of plan you’re looking at.

  • If the plan is all row mode operators, time is cumulative going from right to left
  • If the plan is all batch mode, time is per operator
  • If the plan is a mix, then the time will also be mixed

It’s a little confusing, I know. But that confusion is going to become more common as you nice people start using SQL Server 2019 and seeing Batch Mode On Rowstore. Otherwise, you’ll really only see Batch Mode when columnstore is involved.

If you use this feature enough, you’ll undoubtedly run into rowmode query plans where the times are inconsistent, especially around parallel exchanges. Don’t take it personally — accounting for parallel thread and operator times is difficult. Things will most often even out as you read through the plan.

For more detail on timing stuff, check out my videos here and here on it.

For Real Though


Let’s look at our query with and without an index.

CREATE INDEX CreationDate ON dbo.Posts(CreationDate);

SELECT p.CreationDate,
       COUNT_BIG(*) AS records
FROM dbo.Posts AS p
WHERE p.CreationDate >= '20131225'
GROUP BY p.CreationDate;

It’s pretty easy to see what got better by looking at the execution plans.

SQL Server Query Plan
better butter

With an index on CreationDate, we avoid the clustered index scan and the need to go parallel to compensate for not having an opportune index.

We can seek easily to the dates we care about, because they’re ordered in a way that makes it easy to find them.

SQL Server Query Plan Properties
slick

Okay, Great


Now we know how to figure out if we did something right when adding an index.

Tomorrow we’ll look at ways you can write your queries that might interfere with indexes being used efficiently.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Starting SQL: SQL Server Indexes Store Key Column Data In Order

Index Skit (0:39)


The most common types of indexes you’re going to see in your life are of the rowstore variety.

As much as I love columnstore, it’s probably not something you’re going to see a whole lot outside of data warehouses. Sure, some folks will have reporting over OLTP, and might find some utility for them, but they can be tough to manage with all those tiny modifications.

Let’s stick with the stuff that’ll help you the most: clustered and nonclustered rowstore indexes.

There are important things to know about indexes, and we’ll get more in-depth later on. For now, let’s talk about how they can help a query.

Clustered


Our table has a clustered index on it, which is also playing the part of a primary key. The primary key attribute makes it unique, of course. By default, if you create a primary key, it’ll also be used as the clustered index key. If you only create a clustered index, it won’t be unique by default.

Let’s not get bogged down there, though. Here’s our index.

SQL Server Management Studio Table
so clustered

We can identify clustered indexes and which columns are in them pretty easily in SSMS.

To simplify a bit, the clustered index is all these columns:

SQL Server Management Studio Table
every single one

Ordered by this column:

SQL Server Management Studio Table
businessing

That’s Great But


Having the Id column in order doesn’t help us find data in other columns, because they’re not in order.

Let’s say we wanted to find posts by CreationDate. The values for it aren’t in an order that helps us search through them.

SQL Server Management Studio Table
where are they now?

If our query is only concerned with the CreationDate column we can create a single-column index on it. As queries become more complicated and involve more columns, we need to consider wider indexes sometimes so that they stand a better chance of getting used, but we’ll come back to that later.

Here’s our overly-simple query.

SELECT p.CreationDate,
       COUNT_BIG(*) AS records
FROM dbo.Posts AS p
WHERE p.CreationDate >= '20131225'
GROUP BY p.CreationDate;

And here’s how the optimizer decides to answer our query.

SQL Server Query Plan
down and out

We have to scan all of the data pages in the clustered index looking for CreationDates that match our predicate.

Make It Plain


It’s not such a crazy idea to create additional indexes to speed up queries, but how exactly do they do that?

What is it about indexes that magically make queries go faster? According to the title, they put data in order, so let’s go with that.

It’s easy enough to create a helpful index here.

CREATE INDEX CreationDate ON dbo.Posts(CreationDate);

Tomorrow, we’ll look at ways to see if our index gets used, and different ways to measure if it improves our query.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Starting SQL: Written Queries vs Executed Queries In SQL Server (Logical Query Processing)

You Know You Wanna


Every time you push F5, decades of research, code, and math kick in and have their way with your query. No, really. They do.

Your query is a question that the database has to answer, and not only does it have to make sure that your syntax is correct, and the objects you’re asking questions of exist, but it also needs to figure out the cheapest way to come up with that answer.

How you design tables and indexes and constraints, and write your queries, can help better choices get made. But we’re getting ahead of ourselves a little bit, aren’t we?

The first thing we have to talk about is the difference between how you write your query, and how the database sees your query.

It’s Trees All The Way Down


You start your query with a select, list a bunch of columns, maybe calculate some expressions, or do some aggregations. Sometimes you can throw a top or distinct in there for fun.

Then you mosey on to from, maybe with a join or forty, and hopefully a where clause that’ll filter out some rows. Good stuff. I’m proud of you already.

If you’re feeling extra spicy (or if necessitated by the abovesaid aggregation), you might group by some columns. It’s nice to not get error messages, after all.

After that, you’ve got your order by, if you so choose. Or if your query needs data in a particular order. Remember, without an order by, you don’t get data in a special order.

Of course, there are some less common things you may do along the way. You might use cube or rollup, a having clause, or use offset/fetch.

But the point is that the optimizer rearranges things a little bit. That old dog goes from > where > group by > having > select > order by.

Wild, right? But what does that mean for you?

It’s Tables All The Way Down, Too


When you run a query, the results are tabular, but not necessarily a table. That’s why you’re allowed to do certain things when you nest parts of the query that you can’t do without some nesting.

For example, I can reference an alias in the order by, because the optimizer processes the select list first:

SELECT p.PostTypeId,
       COUNT_BIG(*) AS records
FROM dbo.Posts AS p
GROUP BY p.PostTypeId
ORDER BY records DESC;

But I can’t reference that alias in the where clause, because it hasn’t been materialized yet.

SELECT p.PostTypeId,
       COUNT_BIG(*) AS records
FROM dbo.Posts AS p
WHERE records > 1000
GROUP BY p.PostTypeId
ORDER BY records DESC;

I know, I know. This is what having is for. But you’re jumping ahead again.

The point I’m getting to is that with a little nesting, we can take those tabular results and work with them in a way that’s totally tubular.

SELECT *
FROM 
(
    SELECT p.PostTypeId,
           COUNT_BIG(*) AS records
    FROM dbo.Posts AS p
    GROUP BY p.PostTypeId
) AS not_a_table
WHERE not_a_table.records > 1000
ORDER BY records DESC;

Virtually Yours


The thing is, that internal query isn’t physically materialized anywhere. The result set is computed, and the filter is applied against that result.

That means you have to calculate it every single time. The same goes for if you put it in a (non-materialized) view. That view will run the query and calculate the result and return it to the user, but it won’t physically exist anywhere.

For example, let’s look at two logically equivalent queries.

SELECT *
FROM 
(
    SELECT p.PostTypeId,
           COUNT_BIG(*) AS records
    FROM dbo.Posts AS p
    GROUP BY p.PostTypeId
) AS not_a_table
WHERE not_a_table.records > 1000
ORDER BY records DESC;

SELECT p.PostTypeId,
       COUNT_BIG(*) AS records
FROM dbo.Posts AS p
GROUP BY p.PostTypeId
HAVING COUNT_BIG(*) > 1000
ORDER BY records DESC;

When we run them, the optimizer comes up with identical physical implementations of both queries.

SQL Server Query Plan
little red corvette

If I run them both a few times, the operator times will wiggle back and forth some, but the plans don’t change, and they both take in the neighborhood of two seconds each.

The point here is that there’s no performance advantage right now to either query form. So if this is something you need to calculate frequently, you might start looking at ways to physically persist the result. That way queries that need it would have to do less work.

Tomorrow, we’ll start to look at how indexes can help things out.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Starting SQL: A Month Of SQL Server Performance Fundamentals Posts

Unlearninging


I’m going to be taking a break in August, but I still want the Blogs To Flow™ while I’m letting my brain meat get its groove back.

So this month I’m going to run some fundamentals posts that I like teaching people about. Apologies in advance to people who expect a constant flow of more advanced stuff.

Why fundamentals? Because I still run into people struggling with the basics more than I run into people struggling with more advanced stuff.

Lots of people think they have advanced problems, but they really just screwed up something basic.

Anyway, my hope is that if you learn the basic stuff the right way, you won’t have to un-learn a bunch of bad habits later.

To Play Along


You’ll need:

As far as hardware and settings go, I have 64GB of RAM, and 8 cores. That means my settings are:

  • MAXDOP: 4
  • Cost Threshold for Paralelism: 50
  • Max Server Memory: 51200

If your hardware and settings don’t exactly match those, you may get different results. These things matter, apparently ?

Stay Curious


You have everything you need to work along with these posts. If you have questions on anything, you can run your own experiments to try to answer your questions.

It’s not that I don’t want you to comment to ask them, but I’m not going to be checking in on stuff as regularly, and I don’t want you to think I’m ignoring you when you could be learning independently. You don’t need my permission to do that!

Over the course of the month, I’ll be talking about how queries get executed, query plans get made, along with table and index design, wait stats, and more. I hope you’ll stick with me, even this material is stuff you’re already comfortable with.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.