The How To Write SQL Server Queries Correctly Cheat Sheet: Joins

So Many Choices


SQL Server is full of landmine options when you’re writing queries. For most queries, you don’t need much beyond the basics.

Think of your standard CRUD operations. Most don’t even require a join; they’re very straightforward. And hey, if you completely denormalize all your data to one huge table, you’ll never have to think about a lot of this stuff anyway.

It’s only when developers are forced to think about things that things start to go wrong. I don’t mean to pick on developers specifically. It’s the human condition. Thinking often leads to poor choices.

In this post, I’m going to give you some basic guidance on when to use various T-SQL facilities, based on years of finding, fixing, and writing queries.

Some of the details and information may not surprise the more seasoned and spiced of you out there.

Here’s a piece of advice that I give everyone: Always start with a SELECT. I don’t care if the final form of your query is going to be an insert, update, or delete (I do care if it’s going to be a merge, because ew), you should always start off by writing a select, so you can validate query results first. It’s easy enough to change things over when you’re done, but please make sure what you’re changing is what you expect to change. I’d even go one step further and say that the first time you run your modification query, you should do it in a transaction with a ROLLBACK command.

I’ll usually do some variation on this, so I can see inserted and deleted results easily:

BEGIN TRANSACTION
    UPDATE TOP (100)
        u
    SET u.Reputation += 1000
    OUTPUT
        'D' AS d, Deleted.*,
        'I' AS i, Inserted.*
    FROM dbo.Users AS u
    WHERE u.Reputation < 1000
    AND   u.Reputation > 1;
ROLLBACK TRANSACTION;

Anyway, on to the cheat codes.

Inner Joins


Joins combine data horizontally (sideways, for the forgetful). The most basic thing you can do with two tables in a database, really.

The important thing to remember is that in one-to-many and many-to-many relationships, joins will display duplicate matched values.

If you don’t need to show data from another table, don’t use a join. We’ll talk about other options later, but please let this burn into your mind. The number of queries I’ve seen with needless DISTINCT instructions on them is nearing a decent pre-tax cash bonus.
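To see how those needless DISTINCTs sneak in, here’s a quick sketch against the same Stack Overflow database the rest of this post uses (the specific Id is just an example): joining one Users row to its many Badges rows repeats the user once per badge.

```sql
/* One user, many badges: the join repeats u.DisplayName once per
   matched badge. Slapping DISTINCT on this hides the duplication
   without fixing the underlying shape of the query. */
SELECT
    u.Id,
    u.DisplayName,
    b.Name
FROM dbo.Users AS u
JOIN dbo.Badges AS b
  ON b.UserId = u.Id
WHERE u.Id = 22656;
```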

Here’s an example of when a join is necessary. We want to get all of our Users with a Reputation over 500,000, and sum up the Score on all their Posts, plus figure out what kind of Post the points were awarded to.

SELECT
    u.Id,
    u.DisplayName,
    PostType =
        CASE
             p.PostTypeId
             WHEN 1
             THEN 'Question'
             WHEN 2
             THEN 'Answer'
             ELSE 'Other'
        END,
    TotalScore = SUM(p.Score)
FROM dbo.Users AS u
JOIN dbo.Posts AS p
  ON p.OwnerUserId = u.Id
WHERE u.Reputation > 500000
GROUP BY
    u.Id,
    u.DisplayName,
    p.PostTypeId
ORDER BY
    TotalScore DESC;

Because we need multiple columns from the Posts table, we can’t just use a correlated subquery in the select list. Those only allow for one column or expression to be projected from the results.
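To make that limitation concrete, here’s a hedged sketch: a correlated subquery can hand back TotalScore just fine, but there’s no way to get PostTypeId out of the same subquery, so each additional column would mean another subquery and another visit to the Posts table.

```sql
SELECT
    u.Id,
    u.DisplayName,
    TotalScore =
        (
            SELECT
                SUM(p.Score)
            FROM dbo.Posts AS p
            WHERE p.OwnerUserId = u.Id
        ) /* one expression per subquery; a second column
             means a second subquery and a second scan */
FROM dbo.Users AS u
WHERE u.Reputation > 500000;
```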

Since this is an inner join, it restricts the results down only to matching rows. Now, it’s not really possible to get a Reputation over 1 without posting things that other users can vote on, so it doesn’t make sense to use an outer join here.

What if we wanted to find slightly different data?

(Left) Outer Joins


Let’s say we wanted to generate a report of people whose Reputation is sitting at one (the site minimum), to figure out if they’re inactive, unpopular, or if their account has been suspended for some reason.

We could use a query like this to do it.

SELECT
    u.Id,
    u.DisplayName,
    u.Reputation,
    TotalScore = SUM(p.Score),
    c = COUNT_BIG(p.Id)
FROM dbo.Users AS u
LEFT JOIN dbo.Posts AS p
  ON p.OwnerUserId = u.Id
WHERE u.Reputation = 1
GROUP BY 
    u.Id,
    u.DisplayName,
    u.Reputation
ORDER BY
    TotalScore;

Before talking about the logic, it’s important to note that when you’re counting rows from the outer side of a join, you’ll usually wanna specify a non-nullable column to pass into the counting function, rather than (*), so you don’t incorrectly count NULL values.

Primary key columns are your friend for this, but any non-NULLable column will do.
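As a quick illustration of the difference (a sketch against the same tables):

```sql
/* With a left join, COUNT_BIG(*) counts the preserved Users row even
   when nothing in Posts matched, so it can never return 0. Counting
   the non-NULLable p.Id ignores the all-NULL outer rows and correctly
   returns 0 for users who have never posted. */
SELECT
    u.Id,
    rows_counted  = COUNT_BIG(*),
    posts_counted = COUNT_BIG(p.Id)
FROM dbo.Users AS u
LEFT JOIN dbo.Posts AS p
  ON p.OwnerUserId = u.Id
WHERE u.Reputation = 1
GROUP BY
    u.Id;
```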

We need a left join here, because we want everyone with a Reputation of 1, not just those users who have posted. The left join preserves rows from the Users table in that case.

The results we get back find all sorts of interesting things (that I told you we were looking for):

  1. Users who were very active, but then had their accounts suspended
  2. Users who have posted, but were heavily downvoted
  3. Users who haven’t posted at all
[image: sql server query results, bad, ugly, lazy]

I’m not going to talk about right outer joins, because that’s the foolish domain of characterless buffoons who use Venn diagrams to explain join results.

I assume they have good intentions, they just lack the backbone to tell you that there is no natural reason to ever use a right join that isn’t better logically expressed in a different way.

They’re usually trying to sell you something.
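For completeness, here’s the whole trick they’re not telling you, sketched against the same tables: every right join is just a left join with the table order flipped, which is why there’s never a reason to write one.

```sql
/* Logically identical queries: both preserve all Users rows. */
SELECT
    u.Id,
    PostId = p.Id
FROM dbo.Posts AS p
RIGHT JOIN dbo.Users AS u
  ON p.OwnerUserId = u.Id;

SELECT
    u.Id,
    PostId = p.Id
FROM dbo.Users AS u
LEFT JOIN dbo.Posts AS p
  ON p.OwnerUserId = u.Id;
```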

(Full) Outer Joins


In short, these preserve results from both tables, but still with a correlation. I’d nearly put these in the same category as right joins, except they have a couple decent use cases, and aren’t personally offensive to polite society.

Let’s say we want to figure out how many Posts don’t have an associated User, and how many Users don’t have an associated Post all in one query:

SELECT
    PostsWithoutAUser = 
        SUM(CASE WHEN u.Id IS NULL THEN 1 ELSE 0 END),
    UsersWithoutAPost = 
        SUM(CASE WHEN p.Id IS NULL THEN 1 ELSE 0 END)
FROM dbo.Users AS u
FULL JOIN dbo.Posts AS p
  ON p.OwnerUserId = u.Id;

It’s sort of an exception report, to let you know just how much referential integrity your data lacks.

Aside from oddball situations, you shouldn’t have to think much about these in your day to day life.

Cross Joins


Like full joins, I don’t see cross joins used terribly often, though they do have some uses, like populating a grid.

A reasonably worded example would be something like: you have a table of scotch, and a table of glass sizes, and you want to show someone all possible combinations of scotch and glass sizes.
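Sketched out, with dbo.Scotch and dbo.GlassSizes as hypothetical tables (they are, sadly, not in the Stack Overflow database):

```sql
/* No join condition: every scotch pairs with every glass size,
   producing (rows in Scotch) x (rows in GlassSizes) result rows. */
SELECT
    s.ScotchName,
    g.GlassSize
FROM dbo.Scotch AS s
CROSS JOIN dbo.GlassSizes AS g
ORDER BY
    s.ScotchName,
    g.GlassSize;
```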

If you pick a big enough glass, eventually using cross joins in more creative ways will seem like a good idea. One place I’ve been forced to use them is in some of my stored procedures, like sp_PressureDetector.

Here’s one example:

DECLARE
    @cpu_utilization_threshold integer = 50; /* the full procedure supplies this;
                                                declared here (with an arbitrary value)
                                                so the abridged query runs standalone */

SELECT
    sample_time =
        CONVERT
        (
            datetime,
            DATEADD
            (
                SECOND,
                (t.timestamp - osi.ms_ticks) / 1000,
                SYSDATETIME()
            )
        ),
    sqlserver_cpu_utilization =
        t.record.value('(Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]','int'),
    other_process_cpu_utilization =
        (100 - t.record.value('(Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]','int')
         - t.record.value('(Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]','int')),
    total_cpu_utilization =
        (100 - t.record.value('(Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int'))
FROM sys.dm_os_sys_info AS osi
CROSS JOIN
(
    SELECT
        dorb.timestamp,
        record =
            CONVERT(xml, dorb.record)
    FROM sys.dm_os_ring_buffers AS dorb
    WHERE dorb.ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR'
) AS t
WHERE t.record.exist('(Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization[.>= sql:variable("@cpu_utilization_threshold")])') = 1
ORDER BY
    sample_time DESC;

The sys.dm_os_sys_info view returns a single row, with no relation at all to sys.dm_os_ring_buffers, but I need to use the one value in the one column in the one row for every row that the ring buffer query produces, so that I can turn the timestamp column into a human-understandable value.

Here’s another example from the same procedure, slightly abridged:

SELECT
    total_threads =
        MAX(osi.max_workers_count),
    used_threads =
        SUM(dos.active_workers_count),
    available_threads =
        MAX(osi.max_workers_count) - SUM(dos.active_workers_count),
    threads_waiting_for_cpu =
        SUM(dos.runnable_tasks_count),
    requests_waiting_for_threads =
        SUM(dos.work_queue_count),
    current_workers =
        SUM(dos.current_workers_count),
    total_active_request_count =
        SUM(wg.active_request_count),
    total_queued_request_count =
        SUM(wg.queued_request_count),
    total_blocked_task_count =
        SUM(wg.blocked_task_count),
    total_active_parallel_thread_count =
        SUM(wg.active_parallel_thread_count),
    avg_runnable_tasks_count =
        AVG(dos.runnable_tasks_count)
FROM sys.dm_os_schedulers AS dos
CROSS JOIN sys.dm_os_sys_info AS osi
CROSS JOIN
(
    SELECT
        wg.active_request_count,
        wg.queued_request_count,
        wg.blocked_task_count,
        wg.active_parallel_thread_count
    FROM sys.dm_resource_governor_workload_groups AS wg      
) AS wg;

In this case, I keep myself safe from exploding result sets by aggregating all of the selected columns. You may also find that necessary, should you choose to work with data so terrible that it requires cross joins.

One thing to be especially aware of is that cross joins can only be physically implemented in SQL Server with a nested loops join, so the larger your tables get, the worse performance will get.

Beware out there.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Join me at DataTune in Nashville, March 8-9 2024

Spring Training


This March, I’ll be presenting my full day training session The Foundations Of SQL Server Performance Tuning.

All attendees will get free access for life to my SQL Server performance tuning training. That’s about 25 hours of great content.

Get your tickets here for this event, taking place Friday, March 8th through Saturday, March 9th, 2024 at Belmont University – Massey Center, 1900 Belmont Blvd, Nashville, TN 37212.

Here’s what I’ll be presenting:

The Foundations Of SQL Server Performance Tuning

Session Abstract:

Whether you want to be the next great query tuning wizard, or you just need to learn how to start solving tough business problems at work, you need a solid understanding of not only what makes things fast, but also what makes them slow.

I work with consulting clients worldwide fixing complex SQL Server performance problems. I want to teach you how to do the same thing using the same troubleshooting tools and techniques I do.

I’m going to crack open my bag of tricks and show you exactly how I find which queries to tune, indexes to add, and changes to make. In this day long session, you’re going to learn about hardware, query rewrites that work, effective index design patterns, and more.

Before you get to the cutting edge, you need to have a good foundation. I’m going to teach you how to find and fix performance problems with confidence.

Event Details:

Get your tickets here for this event!


What SQL Server’s Query Optimizer Doesn’t Know About Numbers


Video Summary

In this video, I delve into the nuances of SQL Server’s query optimizer and its sometimes surprising behavior when dealing with integer values. Specifically, I explore how the same set of indexes can lead to vastly different execution plans based on slight variations in query syntax—demonstrating that even mature software like SQL Server isn’t always as smart as we might expect regarding simple logical equivalences. Through a series of queries and detailed execution plan analysis, I highlight the importance of being mindful of how you phrase your SQL statements, as seemingly minor differences can significantly impact performance.

Full Transcript

Erik Darling here with Darling Data. My hair has reached a level of absurdity that it has not seen, I don’t know, at least since various times in my life when I’ve tried to cut my own hair. Still waiting for my lovely and talented hair person to get back from their vacation. Before I go on my vacation. Um, was it Tuesday? God. Uh, anyway, um, I had to take last week off from recording. There was like, this advanced form of RSV going through, uh, circulating through my household. And, uh, I don’t know, you can still kinda hear it in my voice. Uh, I don’t know, I just, I sound, I sound like I did in my 20s when I smoked cigarettes and enjoyed life. So that, that was, that, that’s what I’m taking from this anyway. At least, you know, I realized that, that, that, that level of happiness is, is quite possibly achievable again, in my time. We’ll see though. Uh, what else? Um, oh yeah. Uh, if, if anyone out there is, is currently struggling with some advanced form of RSV and, uh, your, your throat hurts constantly, uh, I have some leftover viscous lidocaine gargle.

If you’d like some. Um, I will say that, um, I will say that the viscous lidocaine gargle confirmed, uh, confirmed some things about me. Um, that, uh, I have a petulant gag reflex. Is, is not meant for viscous gargling. That’s, that’s about what I learned about myself last week. That, uh, that, uh, I don’t know.

That, and I have a great immune system. Take the crap out of that. Tank. Absolute unit. All right.

Uh, let’s get back to SQL Server. Apparently that’s where we, that, that’s where we always end up back at SQL Server. And, uh, in today’s video, I want to talk about how, uh, SQL Server’s optimizer isn’t always smart about integers. By that I mean, it does not always infer things from integers that it should when it has plenty of information about the integers.

I don’t know how, I don’t know how the optimizer missed the boat on integers. They’re all over the place. Anyway, uh, I’ve got a couple indexes on a couple tables.

I’ve got, uh, I’ve got an index on the badges table, helpfully called not posts on the name column and the user ID column. And I’ve got an index, helpfully called not badges on the post table on post type ID. Remember, that’s the, it’s a big deal.

It leads on post type ID, uh, has owner user ID as a secondary key column and includes the Score column. I’ve also gone out of my way to create a constraint on the post table to tell SQL Server’s cost-based optimizer that the values in the post type ID column are greater than zero and less than nine. And so it’s the numbers one through eight.

And, uh, I’ve got a couple queries down below and I’ve written, I’ve written these queries. In two different ways. The same query is one slight difference. One teeny, tiny, itty bitty, slight little difference.

And it might, it might shock you to find that there is a huge performance difference. In the way, in, in, in, in the execution plan between one query and another. So let’s, uh, let’s scroll down a little bit and let’s find this hotspot, the hottest spot north of Havana.

And let’s zoom in on it. So in the first query, I have told SQL Server, I want to see all of the post type IDs one and two. This is in one comma two.

Right? So just post type ID one and just post type ID two. And in the second query, I’ve said, I want to see where post type ID is less than three. [inaudible]

And, well, we should know that there’s nothing zero or below in there.

We would think so anyway. But here’s what happens when I run them. And I’ve actually, I pre-ran them because, well, I didn’t want to sit around on camera twiddling thumbs waiting for this thing to execute.

I get docked by beer gut magazine every time twiddling thumbs on camera. It’s a long story. It’s a very complicated contract.

But anyway, here is just, you know, some proof that the only numbers possible in the post-type ID column are between 1 and 8, and that’s what my constraint enforces and acknowledges. These eight digits are all you will ever see in there.

Not a very selective predicate, admittedly. Right? For like a 17 million row table, it’s the same eight numbers repeating over and over again. Not terribly.

Not a good clustered index. Just go out on a limb and say this would not make a good clustered index. So anyway, about those query plans, which you got a shocking sneak preview of a second ago, but maybe I was standing in the way.

Let’s see if we can zoom and focus in here. So if we come way over here, we’ve hit the limits of SQL Server being useful. So let’s drag that over here.

So in this top query where I said in 1, 2, what happens? Well, we get a nice little index seek, and we get a sort operator. We need the sort operator because we don’t have score in the key of the index.

That’s okay. It’s not really hurting us here. We get a lazy table spool, which, you know, I have mixed feelings on it. I’m going to do another video this week.

I have, I don’t know, three or four videos lined up to do this week so I can pad out next week. Next week, I’m on vacation. That’s all I ever wanted.

So I’ve got a video about table spools I’m going to also do this week. Maybe even today since the snow day here in Brooklyn. But this, you know, performs pretty okay for what it does, right?

It’s like performance-wise, it’s all right, right? It’s not great, not awful. Just it’s fine for what it does. This whole thing finishes in about 6.2 seconds.

Big score for us, maybe. I don’t know. I don’t really have big feelings about either of the, well, really, rather, I don’t have big feelings about most of either of these query plans, but the one I’m going to show you next is the one that I have much, much bigger feelings about.

Now, this is the query, just to remind you. This is the one we just looked at, where it was IN (1, 2). The one we’re going to look at next is less than 3.

That is not an emotional heart, all right? No emo, darling data. There’s only goth.

So let’s look at this query plan. Let’s see where this query plan gets offensive, which is something that if you’ve watched enough of my videos, you will immediately recognize and also be offended by.

And it’s when I said in 1.2, SQL Server said, cool, index C, got it, no problem. When I said less than 3, SQL Server said, ah, we’re going to build an index off your index.

Your index needs an index. What’s different about our index? Nothing.

On the same two columns. Oops, got a little weird there. I apologize. Got a little cocky, flew off the handle there.

It’s on the same stuff, right? Post type ID, owner user ID. Who cares, right?

Same index. We output the score column. Same index. SQL Server made a copy of my index. When the query ran, it was the same index. All because I said less than 3.

Maybe that got SQL Server emotional. Maybe I thought, oh, who loves me? I can build an index for it. I don’t know. It’s weird, right?

But it’s just logic that’s not built into SQL Server’s query optimizer. It doesn’t infer that less than 3 is the same thing as being IN (1, 2). Even with an acknowledged, forced, enabled constraint on your table data.

And I think what’s particularly interesting is, actually, no, we should drag both of these way over. Just put the nested loops in the corner there and put the nested loops in the corner there. And the thing that I want to show when we zoom in…

Oops, I don’t need that tooltip. It’ll go away. Weirdo. It’s not even like a cardinality estimation thing. That’s a lot of white space, but whatever.

It’s not even like a cardinality estimation thing. Because if you look at this index seek, right? Look at the number of rows SQL Server estimates from the index seek. Granted, it’s wrong by 338%.

But SQL Server didn’t choose to build a copy of my index when the query ran for this one. The bottom one… I mean, we have the full table cardinality here, which makes sense because we had to read the whole table to make a copy of that index.

But then the estimates from the spool are exactly the same. Excuse me. A bit of last week coming back to haunt me.

It’ll be the only time this happened in this video. But yeah, the index from… So rather, the estimate from here is exactly the same as it was from up here, right? It’s 11635229.

It’s the same 338% that it was off. So, what did we learn? Well, maybe the optimizer isn’t as smart as we give it credit for.

Granted, it’s got a lot to deal with. Having seen your queries, I know just how much it has to deal with. But really, sometimes the way you phrase your queries makes a very, very big difference for performance.

Now, we’ve talked about that a lot from a lot of different angles. SARGability and exists versus joins and other things like that. But this is a weird one, sort of admittedly, because you would think that a mature, like 30-year-old piece of software would be able to figure out that 1 and 2 are between 0 and 3.

Whether you say IN (1, 2), or you just say less than 3. So, be careful out there. When you’re writing your queries, sometimes little changes make a big difference.

Anyway, before any more of my lung attempts to present itself to you, I’m going to say thank you for watching. If you enjoy this sort of SQL Server content, don’t forget to like the video. If you want to be alerted every time, well, not every time my lungs try to present themselves to you, but every time I post this sort of insightful SQL Server commentary, go ahead and subscribe to the channel.

Coming up on 3,000 subscribers. Pretty happy about that. I don’t get anything at 3,000.

3,000 is actually a pittance of a number compared to people who play video games. Maybe I should start playing video games. I don’t know.

But yeah. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And remember, if you need any viscous throat gargle (medicinally, I mean), it’s lidocaine, viscous lidocaine.

If you need any of that, let me know. I’ll send you whatever I have left. Thank you for watching.


Indexing SQL Server Queries For Performance: Fixing Unpredictable Search Queries

It’s My Blog And I’ll Blog What I Want to


When I sit down to write any blog post as a series, I do these things first:

  • List out topics – it’s cool if it’s stuff I’ve covered before, but I want to do it differently
  • Look at old posts – I don’t want to fully repeat myself, but I write these things down so I don’t forget them
  • Write demos – some are easier than others, so I’ll jump around the list a little bit

Having said all that, I also give myself some grace in the matter. Sometimes I’ll want to talk about something else that breaks up the flow of the series. Sometimes I’ll want to record a video to keep the editors at Beergut Magazine happy.

And then, like with this post, I change my mind about the original topic. This one was going to be “Fixing Predicate Selectivity”, but the more I looked at it, the more the demo was going to look like the one in my post in this series about SARGability.

That felt kind of lame, like a copout. And while there are plenty of good reasons for copouts when you’re writing stuff for free, even I felt bad about that one. I almost ended the series early, but a lot of the work I’ve been doing has been on particularly complicated messes.

So now we’re going to talk about one of my favorite things I help clients with: big, unpredictable search queries.

First, What You’re (Probably) Not Going To Do


There’s one thing that you should absolutely not do, and one thing that I’ll sometimes be okay with for these kinds of queries.

First, what you should not do: A universal search string:

WHERE (p.OwnerUserId LIKE N'%' + @SearchString + N'%' OR @SearchString IS NULL)
OR    (p.Title LIKE N'%' + @SearchString + N'%' OR @SearchString IS NULL)
OR    (p.CreationDate LIKE N'%' + @SearchString + N'%' OR @SearchString IS NULL)
OR    (p.LastActivityDate LIKE N'%' + @SearchString + N'%' OR @SearchString IS NULL)
OR    (p.Body LIKE N'%' + @SearchString + N'%' OR @SearchString IS NULL);

The problem here is somewhat obvious if you’ve been hanging around SQL Server long enough. Double wildcard searches, searching with a string type against numbers and dates, strung-together OR predicates that the optimizer will hate you for.

These aren’t problems that other things will solve either. For example, using CHARINDEX or PATINDEX isn’t a better pattern for double wildcard LIKE searching, and different takes on how you handle parameters being NULL don’t buy you much.

So like, ISNULL(@Parameter, Column) will still suck in most cases.

Your other option is something like this, which is only not-sucky with a statement-level OPTION(RECOMPILE) hint at the end of your query.

WHERE  (p.OwnerUserId = @OwnerUserId OR @OwnerUserId IS NULL)
AND    (p.CreationDate >= @CreationDate OR @CreationDate IS NULL)
AND    (p.LastActivityDate < @LastActivityDate OR @LastActivityDate IS NULL)
AND    (p.Score >= @Score OR @Score IS NULL)
AND    (p.Body LIKE N'%' + @Body + N'%' OR @Body IS NULL)

This departs from the universal search string method, and replaces the one string-typed parameter with parameters specific to each column’s data type.

Sure, it doesn’t allow developers to be lazy sons of so-and-so’s in the front end, but you don’t pay $7000 per core for them, and you won’t need to keep adding expensive cores if they spend a couple hours doing things in a manner that resembles a sane protocol.

The recompile advice is good enough, but when you use it, you really need to pay attention to compile times for your queries. It may not be a good idea past a certain threshold of complexity to come up with a “new” execution plan every single time, minding that that “new” plan might be the same plan over and over again.
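If it’s not obvious what “statement-level, at the end of your query” looks like, here’s a sketch (the surrounding query is just an example shape):

```sql
SELECT
    p.Id,
    p.Score
FROM dbo.Posts AS p
WHERE (p.OwnerUserId = @OwnerUserId OR @OwnerUserId IS NULL)
AND   (p.Score >= @Score OR @Score IS NULL)
ORDER BY
    p.Score DESC
OPTION (RECOMPILE); /* statement-level: applies to this whole
                       statement, and goes after everything else */
```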

Second, What You’re Eventually Going To End Up With


SQL Server doesn’t offer any great programmability or optimizer support for the types of queries we’re talking about. It’s easy to fall into the convenience-hole of one of the above methods.

Writing good queries means extra typing and thinking, and who has time for all that? Not you. You’re busy thinking you need to use some in-memory partitioning, or build your own ORM from scratch, no, migrate to a different relational database, that will surely solve all your problems, no, better, migrate to a NoSQL solution, that’ll do it, just give you 18-24 months to build a working proof of concept, learn seven new systems, and hire some consultants to help you with the migration, yeah, that’s the ticket.

You can’t just spend an hour typing a little extra. Someone on HackerNews says developers who type are the most likely to be replaced by AI.

Might as well buy a pick and a stick to DIY a grave for your career. It’ll be the last useful thing you do.

Rather than put 300 lines of code and comments in a blog post, I’m storing it in a GitHub gist here.

What I am going to post in here is the current list of variables, and what each does:

  • @Top: How many rows you want to see (optional, but has a default value)
  • @DisplayName: Search for a user’s display name (optional, can be equality or wildcard)
  • @Reputation: Search for users over a specific reputation (optional, greater than or equal to)
  • @OwnerUserId: Search for a specific user id (optional, equality)
  • @CreationDate: Search for posts created on or after a date (optional, greater than or equal to)
  • @LastActivityDate: Search for posts created before a date (optional, less than)
  • @PostTypeId: Search for posts by question, answer, etc. (optional, equality)
  • @Score: Search for posts over a particular score (optional, greater than or equal to)
  • @Title: Search for posts with key words in the title (optional, can be equality or wildcard)
  • @Body: Search for posts with key words in the body (optional, can be equality or wildcard)
  • @HasBadges: If set to true, get a count of badges for any users returned in the results (optional, true/false)
  • @HasComments: If set to true, get a count of comments for any users returned in the results (optional, true/false)
  • @HasVotes: If set to true, get a count of votes for any posts returned in the results (optional, true/false)
  • @OrderBy: Which column you want the results ordered by (optional, but has a default value)
  • @OrderDir: Which direction you want the results sorted in, ascending or descending (optional, but has a default value)

To round things up:

  • There are nine parameters in there which will drive optional searches
  • Seven of the nine optional searches are on the Posts table, two are on the Users table
  • There are three parameters that drive how many rows we want, and how we want them sorted
  • There are three parameters that optionally hit other tables for additional information

Indexing for the Users side of this is relatively easy, as it’s only two columns. Likewise, indexing for the “Has” parameters is easy, since we just need to correlate to one additional column in Badges, Comments, or Votes.

But that Posts table.

That Posts table.

Index Keys Open Doors


The struggle you’ll often run into with these kinds of queries is that there’s a “typically expected” thing someone will always search for.

In your case, it may be a customer id, or an order id, or a company id… You get the point. Someone will nearly always need some piece of information for normal search operations.

Where things go off the rails is when someone doesn’t do that. For the stored procedure linked above, the role of the “typically expected” parameter will be OwnerUserId.

The data in that column doesn’t have a very spiky distribution. At the high end, you have about 28k rows, and at the low end, well, 1 row. As long as you can seek in that column, evaluating additional predicates isn’t so tough.

In that case, an index like this would get you going a long way:

CREATE INDEX
    p
ON dbo.Posts
    (OwnerUserId, Score DESC, CreationDate, LastActivityDate)
INCLUDE
    (PostTypeId)
WITH
    (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
GO

Since our stored procedure “typically expects” users to supply OwnerUserId, and has a default sort on Score, the optional CreationDate and LastActivityDate predicates can act as residual predicates without a performance tantrum being thrown.

And since PostTypeId is one of the least selective columns in the whole database, it can go live in the basement as an included column.

Using dynamic SQL, we don’t have to worry about SQL Server trying to re-use a query execution plan compiled for one combination of parameters when a different combination is passed in. We would have to worry about that with some other implementations.
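The pattern, roughly: only predicates for supplied parameters make it into the query text, so each combination compiles its own appropriate plan. A minimal sketch with hypothetical parameter handling (the linked procedure is far more thorough):

```sql
DECLARE
    @OwnerUserId integer = 22656,
    @CreationDate datetime = NULL;

DECLARE
    @sql nvarchar(MAX) = N'
SELECT
    p.Id,
    p.Score,
    p.CreationDate
FROM dbo.Posts AS p
WHERE 1 = 1';

IF @OwnerUserId IS NOT NULL
BEGIN
    SET @sql += N'
AND   p.OwnerUserId = @OwnerUserId';
END;

IF @CreationDate IS NOT NULL
BEGIN
    SET @sql += N'
AND   p.CreationDate >= @CreationDate';
END;

/* Different parameter combinations produce different query text,
   so each combination gets its own cached plan. Passing unused
   parameters to sp_executesql is harmless. */
EXEC sys.sp_executesql
    @sql,
    N'@OwnerUserId integer, @CreationDate datetime',
    @OwnerUserId,
    @CreationDate;
```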

Here, the problem is that some searches will be slow without supporting indexes, and not every slow query generates a missing index request.

/*NOPE THIS IS FINE NO INDEX COULD HELP*/
EXEC dbo.ReasonableRates
    @CreationDate = '20130101',
    @LastActivityDate = '20140101',
    @HasBadges = 1,
    @HasComments = 1,
    @HasVotes = 1,
    @Debug = 1;
GO

As an example, this takes ~10 seconds, and results in a where clause that an index could perfectly well help with, but no direct request for one is made.

Of course, there’s an indirect request in the form of a scan of the Posts table.

sql server query plan
dirty looks

So, back to the struggle, here:

  • How do you know how often this iteration of the dynamic SQL runs?
  • Is it important? Did someone important run it?
  • Is it important enough to add an index to help?

And then… how many other iterations of the dynamic SQL need indexes to help them, along with all the other questions above.

You may quickly find yourself thinking you need to add dozens of indexes to support various search and order schemes.

Data Access Patterns


This is the big failing of Row Store indexes for handling these types of queries.

CREATE INDEX
    codependent
ON dbo.Posts
(
    OwnerUserId,
    /*^Depends On^*/
    Score,
    /*^Depends On^*/
    CreationDate,
    /*^Depends On^*/
    LastActivityDate,
    /*^Depends On^*/
    PostTypeId,
    /*^Depends On^*/
    Id
)
INCLUDE
    (Title)
/*^Doesn't depend on anything. It's an Include.^*/
WITH
    (MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

In general, if you’re not accessing index key columns starting with the leading-most key column, your queries won’t be as fast (or may not choose to use your index, like in the plan up there), because they’d have to scan the whole thing.
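For example, with the index above in place, a hypothetical query that skips the leading OwnerUserId column has no useful access path into the key:

```sql
/* No predicate on OwnerUserId, the leading key column, so the
   optimizer can't seek -- its options are scanning this index
   or the clustered index to evaluate these predicates. */
SELECT
    p.Id,
    p.Title
FROM dbo.Posts AS p
WHERE p.CreationDate >= '20130101'
AND   p.LastActivityDate <  '20140101';
```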

For queries like this, nonclustered column store indexes are a way hotter ticket. Columns can be accessed independently. They may get abused by modification queries, and they may actually need maintenance to keep them compressed and tombstone-free, but quite often these tradeoffs are worth it for improving search queries across the board. Even for Standard Edition users, whom Microsoft goes out of their way to show great disdain for, it can be a better strategy.

Here’s an example:

CREATE NONCLUSTERED COLUMNSTORE INDEX
    nodependent
ON dbo.Posts
    (OwnerUserId, Score, CreationDate, LastActivityDate, PostTypeId, Id, Title)
WITH(MAXDOP = 1);

With this index in place, we can help lots of search queries all in one shot, rather than having to create a swath of sometimes-helpful, sometimes-not indexes.

Even better, we get a much less woolly guarantee that the optimizer will choose Batch Mode, rather than leaving it to heuristics.

Two Things


I hope you take two things away from this post:

  • How to write robust, readable, repeatable search queries
  • Nonclustered columnstore indexes can go a lot further for performance with unpredictable predicates

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Recent Updates To sp_QuickieStore, sp_HealthParser, And A New Contributing Guide

Work Work Work


If you’re the kind of person who needs quick and easy ways to troubleshoot SQL Server performance problems, and you haven’t tried my free scripts yet, you’re probably going to keep having SQL Server performance problems.

I don’t get a lot of visitor contributions to my code (and here I thought it was just because it’s perfect), but I had a couple cool recent additions to sp_QuickieStore, my free tool for searching and analyzing Query Store data.

First, Ben Thul did a great job of simplifying the process of searching only for queries that run during configurable business hours. I had gone through a whole process of creating a lookup table with times and a bunch of other nonsense. Ben, being smart, converted that over to just using parameters with a time type, so it doesn’t matter if you use 12- or 24-hour time. Thank you, Ben.
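Conceptually, the time-typed approach boils down to something like this (a sketch only; sp_QuickieStore’s actual parameter names and filtering logic may differ):

```sql
/* time-typed parameters accept '09:00' and '9:00 AM' alike,
   so 12- vs 24-hour input stops mattering. */
DECLARE
    @start_time time(0) = '09:00',
    @end_time   time(0) = '17:00';

SELECT
    qsrs.plan_id,
    qsrs.last_execution_time
FROM sys.query_store_runtime_stats AS qsrs
WHERE CONVERT(time(0), qsrs.last_execution_time)
          BETWEEN @start_time AND @end_time;
```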

Second, Bill Finch dropped a really interesting pull request on me that allows for searching for query text that includes square brackets. I had no idea that didn’t work, but apparently I don’t go searching for Entity Framework created query text all that often. Very cool stuff, and a thank you to Bill as well!

Third, since I keep running into databases where Query Store is in a weird state, I added an initial check to see if it’s read only, if the desired and current state disagree with each other, or if auto-cleanup is disabled. Of course, I haven’t run into that since. Lucky me.

Fourth, Cláudio Silva added a new parameter to search Query Store for only plans that have hints (2022+, probably whatever Azure nonsense). An idea so cool, I expanded on it to also allow searching for queries with feedback and variants (also 2022+, probably whatever Azure nonsense).

Fourth Part Deux, I made a few tweaks to sp_HealthParser:

  1. Numbers are now nicely formatted with commas, so it’s easy to identify the precise scale of misery you’re experiencing.
  2. A Friend At Microsoft told me that wait durations should already be in milliseconds in the system health extended event, and that I didn’t need to divide those numbers by 1000 to convert them from microseconds. This change is somewhat experimental, because some awfully big numbers show up. If you happen to know better, or feel like testing to verify the change, give the latest version a run.
  3. If you’re searching for warnings only, I added a parameter (@pending_task_threshold) to reduce the number of warning lines in the cpu task details results. You’ll get a warning there even if there’s only one pending task, which isn’t very useful. You usually want to find when LOTS of pending tasks were happening. The default is 10.
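A usage sketch for that last one (the @pending_task_threshold name comes from the list above; the @warnings_only parameter name is an assumption on my part):

```sql
/* Only report cpu task detail warnings where 25 or more
   tasks were pending, instead of the default of 10. */
EXEC dbo.sp_HealthParser
    @warnings_only = 1,
    @pending_task_threshold = 25;
```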

Finally, I added a contributing guide. It’s not very extensive (which prevents it from being exhausting); the main point I’m trying to get across is that forks and pull requests must be made from and to the dev branch only. Committing directly to main is verboten. Totes verbotes, as they say in Germany and surrounding German-speaking countries, I’ve been informed by Reliable Sources.

If you have questions, run into bugs, or think about adding some code to any of my procedures, open up an issue. I don’t do support via email or blog comments.

Thanks for reading!


Indexing SQL Server Queries For Performance: Fixing Windowing Functions

I’m The Face


A lot of the time, the answer to performance issues with ranking windowing functions is simply to get Batch Mode involved. Where that’s not possible, you may have to resort to adding indexes.

Sometimes, even with Batch Mode, there is additional work to be done, but it really does get a lot of the job done.

In this post I’m going to cover some of the complexities of indexing for ranking windowing functions when there are additional considerations for indexing, like join and where clause predicates.

I also want to show you the limitations of indexing for solving performance problems for ranking windowing functions in Row Mode. This will be especially painful for developers forced to use Standard Edition, where Batch Mode is hopelessly hobbled into oblivion.

At some point, the amount of data that you’re dealing with becomes a bad fit for ranking windowing functions, and other approaches make more sense.

Of course, there are plenty of things that other varieties of windowing functions do, that simple query rewrites don’t cover.

Here are some examples:

sql server windowing functions
playing favorites

I realize that aggregate and analytic functions have many more options available, but there are only four ranking functions, and here at Darling Data, we strive for symmetry and equality.

It would be difficult to mimic the results of some of those, particularly the analytic functions, without performance suffering quite a bit, or without resorting to complicated self-joins, etc.

But, again, Batch Mode.

Hey Dude


Let’s start with a scenario I run into far too often: tables with crappy supporting indexes.

These aren’t too-too crappy, because I only have so much patience (especially when I know a blog post is going to be on the long side).

The index on Posts gets me to the data I care about fast enough, and the index on Votes allows for easy Apply Nested Loops seeking to support the Cross Apply.

There are some unnecessary includes in the index on Votes, because the demo query itself changed a bit as I was tweaking things.

But you know, if there’s one thing I’ve learned about SQL Server, there are lots of unnecessary includes in nonclustered indexes because of queries changing over the years.

CREATE INDEX
    p
ON dbo.Posts
    (PostTypeId)
INCLUDE
    (Score)
WITH
    (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

CREATE INDEX 
    v 
ON dbo.Votes
    (PostId) 
INCLUDE 
    (UserId, BountyAmount, VoteTypeId, CreationDate) 
WITH
    (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

Now, the query I’m using is quite intentionally a bit of a stress test. I’m using two of the larger tables in the database, Posts and Votes.

But it’s a good example, because part of what I want to show you is how larger row counts can really mess with things.

I’m also using my usual trick of filtering to where the generated row number is equal to zero outside the apply.

That forces the query to do all of the window function work, without having to wait for 50 billion rows to render out in SSMS.

SELECT
    p.Id,
    p.Score,
    v.VoteTypeId,
    v.LastVoteByType
FROM dbo.Posts AS p
CROSS APPLY
(
    SELECT
        v.*,
        LastVoteByType = 
            ROW_NUMBER() OVER
            (
                PARTITION BY
                    v.VoteTypeId
                ORDER BY
                    v.CreationDate DESC
            )
    FROM dbo.Votes AS v
    WHERE v.PostId = p.Id
    AND   v.VoteTypeId IN (1, 2, 3)
    AND   v.CreationDate >= '20080101'
) AS v
WHERE p.PostTypeId = 2
AND   v.LastVoteByType = 0;

If you’re curious about why I wrote the query this way, watch this YouTube video of mine. Like and subscribe, etc.

Assume that the initial goal is to get rid of the ~4.2GB memory grant this query acquires to Sort data for the windowing function, and to create an index that solves for that.

Dark Therapy


The query plan isn’t too bad, but like we looked at in the post in this series about fixing sorts, there is a bit of a sore spot.

sql server query plan
get in line

Now, it has been blogged about many times, so I’m not going to belabor the point too much: the columns that need sorting are the ones in the partition by and order by of the windowing function.

But the index needs to match the sort directions of those columns exactly. For example, if I were to create this index, where the sort direction of the CreationDate column is stored ascending, but the windowing function asks for descending, it won’t work out.

CREATE INDEX 
    v 
ON dbo.Votes
    (PostId, VoteTypeId, CreationDate) 
INCLUDE 
    (UserId, BountyAmount) 
WITH
    (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE, DROP_EXISTING = ON);

In fact, it’s gonna slow things down a bit. Score another one for crappy indexes, I suppose.

sql server query plan
30 love

The reason why this one is so much slower is because of the Seek. I know, I know, how could a Seek be bad?! Well, it’s not one seek, it’s three seeks in one.

Time spent in each of the Row Mode operators in both of the plans you’ve seen so far is nearly identical, aside from the Seek into the Votes index. If we compare each tool tip…

sql server query plan
one seek vs three seeks

The plan properties for the Seek are only interesting for the second query. It’s not very easy to see from the tool tips above, because Microsoft is notoriously bad at user experience in its products.

sql server query plan
threefer

It is somewhat easier to see, quite verbosely, that for each PostId, rather than a single seek and residual predicate evaluation, three seeks are done.

But, anyway, the problem we’re aiming to solve persists — the Sort is still there — and we spend about 4.5 seconds in it.

Your Best Won’t Do


With a similar index, the best we can do is get back to the timing of the original query, minus the sort.

The index we created above was useless for that, because we were careless in our specification. We created it with CreationDate sorted in ascending order, and our query uses it in descending order.

CREATE INDEX 
    v 
ON dbo.Votes
    (PostId, VoteTypeId, CreationDate DESC) 
INCLUDE 
    (UserId, BountyAmount) 
WITH
    (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE, DROP_EXISTING = ON);

Now, we’ve gotten rid of the sort, so our query is no longer asking for 4.2GB of RAM, but the runtime is only roughly equivalent to the original query.

sql server query plan
i see london, i see france

A bit amusing that we were better off with a query plan where the sort spilled to disk, but what can you do? Just marvel at your luck, sometimes.

Improving Runtime


The sort of sad thing is that the cross apply method is purely Row Mode mentality. A bit like when I poke fun at folks who spend a lot of energy on index fragmentation, page splits, and fill factor as having a 32-bit mentality, modern performance problems often require a Batch Mode mentality.

Query tuning is often about trade-offs, and this is no exception. We can reduce runtime dramatically, but we’re going to need memory to do it. We can take this thing from a best of around 15 seconds, to 2-3 seconds, but that Sort is coming back.

Using the normal arsenal of tricks, getting Batch Mode on the inner side of a cross apply doesn’t seem to happen easily. A rewrite to get Batch Mode for a cross apply query is not exactly straightforward.

SELECT     
    p.Id,
    p.Score,
    v.VoteTypeId,
    v.LastVoteByType
FROM dbo.Posts AS p
CROSS APPLY
(
    SELECT 
        v.* 
    FROM 
    (
        SELECT
            v.*,
            LastVoteByType = 
                ROW_NUMBER() OVER
                (
                    PARTITION BY
                        v.VoteTypeId
                    ORDER BY
                        v.CreationDate DESC
                )
        FROM dbo.Votes AS v
    ) AS v
    WHERE v.PostId = p.Id
    AND   v.VoteTypeId IN (1, 2, 3)
    AND   v.CreationDate >= '20080101'
) AS v
WHERE p.PostTypeId = 2
AND   v.LastVoteByType = 0
OPTION(RECOMPILE);

Let’s change our query to use the method that I normally advise against when working in Row Mode.

SELECT
    p.Id,
    p.Score,
    v.VoteTypeId,
    v.LastVoteByType
FROM dbo.Posts AS p
JOIN
(
    SELECT
        v.*,
        LastVoteByType = 
            ROW_NUMBER() OVER
            (
                PARTITION BY
                    v.VoteTypeId
                ORDER BY
                    v.CreationDate DESC
            )
    FROM dbo.Votes AS v
    WHERE v.VoteTypeId IN (1, 2, 3)
    AND   v.CreationDate >= '20080101'
) AS v
  ON v.PostId = p.Id
WHERE p.PostTypeId = 2
AND   v.LastVoteByType = 0;

In Row Mode, this sucks because the entire query in the derived join needs to be executed, producing a full result set of qualifying rows in the Votes table with their associated row number. Watch the video I linked above for additional details on that.

However, if we have our brains in Batch Mode, this approach can be much more useful, but not with the current index we’re using that leads with PostId.

When we used cross apply, having PostId as the leading column allowed for the join condition to be correlated inside the apply. We can’t do that with the derived join, we can only reference it in the outer part of the query.

Tweaking Indexes


An index that looks like this, which allows for easily finding the rows we care about in the derived join, makes far more sense.

CREATE INDEX 
    v2 
ON dbo.Votes
    (VoteTypeId, CreationDate DESC, PostId) 
INCLUDE 
    (UserId, BountyAmount) 
WITH
    (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

With all that done, here’s our new query plan. Something to point out here is that this is the same query plan as the more complicated rewrite that I showed you in the last section, with the same memory grant. Some of these memory grant numbers are with memory grant feedback involved, largely shifting numbers downwards, which is what you would expect to see if you were doing this in real life.

sql server query plan
daydreamer

It could be far less of a concern for concurrency to grant out ~2GB of memory for 2 seconds, than for 15-20 seconds.

Even in a situation where you’re hitting RESOURCE_SEMAPHORE waits, it’s far less harmful to hit them for 3 seconds on average than 15-20 seconds on average. It’s also hard to imagine that you’re on a server where you truly care about high-end performance if 2GB memory grants lead you to RESOURCE_SEMAPHORE waits. If you have 128GB of RAM, and max server memory set to 116-120GB, you would be able to run ~80 of these queries concurrently before having a chance of a problem hitting RESOURCE_SEMAPHORE waits, assuming that you don’t get Resource Governor involved.

Tweaking The Query


Like I said early on, there’s only so good you can get with queries that use windowing functions where there are no alternatives.

Sticking with our Batch Mode mindset, let’s use this rewrite. It’s not that you can’t cross apply this, it’s just that it doesn’t improve things the way we want. It takes about 5 seconds to run, and uses 1.3GB of RAM for a query memory grant.

SELECT
    p.Id,
    p.Score,
    v.VoteTypeId,
    v.LastVoteByType
FROM dbo.Posts AS p
JOIN
(
    SELECT
        v.PostId,
        v.VoteTypeId,
        LastVoteByType =
            MAX(v.CreationDate)
    FROM dbo.Votes AS v
    WHERE v.VoteTypeId IN (1, 2, 3)
    AND   v.CreationDate >= '20080101'
    GROUP BY
        v.PostId,
        v.VoteTypeId
) AS v
  ON v.PostId = p.Id
LEFT JOIN dbo.columnstore_helper AS ch
  ON 1 = 0 /*This is important*/
WHERE p.PostTypeId = 2
AND   v.LastVoteByType >= '99991231';

Note that I don’t naturally get batch mode via Batch Mode On Row Store. I’m using a table with this definition to force SQL Server’s hand a bit, here:

CREATE TABLE
    dbo.columnstore_helper
(
    cs_id bigint NOT NULL,
    INDEX cs_id CLUSTERED COLUMNSTORE
);
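The trick generalizes: bolting the same no-op join onto nearly any query makes it eligible for Batch Mode, because at least one table with a columnstore index is present in the plan. A sketch:

```sql
/* ON 1 = 0 guarantees no rows ever match, so results are
   unchanged -- but the columnstore-indexed table's presence
   makes the whole plan eligible for Batch Mode operators. */
SELECT
    u.Id,
    u.Reputation
FROM dbo.Users AS u
LEFT JOIN dbo.columnstore_helper AS ch
  ON 1 = 0
WHERE u.Reputation > 100000;
```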

But the result is pretty well worth it. It’s around 1 second faster than our best effort, with a 1.6GB memory grant.

sql server query plan
waterfall

There may be even weirder rewrites out there in the world that would be better in some way, but I haven’t come across them yet.

Coverage


We covered a number of topics in this post, involving indexing, query rewrites, and the limitations of Row Mode performance in many situations.

The issues you’ll see in queries like this are quite common in data science, or data analysis type workloads, including those run by common reporting tools like PowerBI. Everyone seems to want a row number.

I departed a bit from what I imagined the post would look like as I went along, as additional interesting details came up. I hope it was an enjoyable, if somewhat meandering, exploration for you, dear reader.

There’s one more post planned for this series so far, and I should probably provide some companion material for why the multi-seek query plan is 2x slower than the seek + residual query plan.

Anyway, I’m tired.

Thanks for reading!


Join me at DataTune in Nashville, March 8-9 2024

Spring Training


This March, I’ll be presenting my full day training session The Foundations Of SQL Server Performance Tuning.

All attendees will get free access for life to my SQL Server performance tuning training. That’s about 25 hours of great content.

Get your tickets here for this event, taking place Friday, March 8th-9th 2024 at Belmont University – Massey Center 1900 Belmont Blvd, Nashville, TN 37212

Here’s what I’ll be presenting:

The Foundations Of SQL Server Performance Tuning

Session Abstract:

Whether you want to be the next great query tuning wizard, or you just need to learn how to start solving tough business problems at work, you need a solid understanding of not only what makes things fast, but also what makes them slow.

I work with consulting clients worldwide fixing complex SQL Server performance problems. I want to teach you how to do the same thing using the same troubleshooting tools and techniques I do.

I’m going to crack open my bag of tricks and show you exactly how I find which queries to tune, indexes to add, and changes to make. In this day long session, you’re going to learn about hardware, query rewrites that work, effective index design patterns, and more.

Before you get to the cutting edge, you need to have a good foundation. I’m going to teach you how to find and fix performance problems with confidence.

Event Details:

Get your tickets here for this event!


Indexing SQL Server Queries For Performance: Fixing A Sort

Orderly


Ordered data is good for all sorts of things in databases. The first thing that may come to mind is searching for data, because it’s a whole lot easier to get what you need when you know where it is.

Think of a playlist. Sometimes you want to find a song or artist by name, and that’s the easiest way to find what you want.

Without things sorted the way you’re looking for them, it’s a lot like hitting shuffle until you get to the song you want. Who knows when you’ll find it, or how many clicks it will take to get there.

The longer your playlist is, well, you get the idea. And people get all excited about Skip Scans. Sheesh.

Anyway, let’s look at poor optimizer choices, and save the poor playlist choices for another day.

A Normal Query


This is a query that I know and love.

SELECT   
    p.*
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
  ON p.Id = v.PostId
WHERE p.PostTypeId = 2
AND   p.CreationDate >= '20131225'
AND   v.VoteTypeId = 2
ORDER BY 
    p.Id;

I love it because it gets a terribly offensive query plan.

sql server query plan
ban me

Look at this monstrosity. A parallel merge join that requires a sort to enable its presence. Who would contrive such a thing?

A Sidebar


This is, of course, a matter of costing. For some reason the optimizer considered many other alternatives, and thought this one was the cheapest possible way to retrieve data.

For reference, the above query plan has an estimated cost of 2020.95 query bucks. Let’s add a couple hints to this thing.

SELECT   
    p.*
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
  ON p.Id = v.PostId
WHERE p.PostTypeId = 2
AND   p.CreationDate >= '20131225'
AND   v.VoteTypeId = 2
ORDER BY 
    p.Id
OPTION
(
    HASH JOIN, 
    USE HINT('DISALLOW_BATCH_MODE')
);

Using this query, I’m telling SQL Server to use a hash join instead of a merge join. I’m also restricting batch mode to keep things a bit more fair, since the initial query doesn’t use it.

Here’s the execution plan:

sql server query plan
hard to explain

SQL Server’s cost-based optimizer looks at this plan, and thinks it will cost 13844 query bucks to execute, or nearly 7x the cost (13844 ÷ 2020.95 ≈ 6.9) of the merge join plan.

Of course, it finishes about 5 seconds faster.

Like I end up having to tell people quite a bit: query cost has nothing to do with query speed. You can have high cost queries that are very fast, and low cost queries that are very slow.

What’s particularly interesting is that on the second run, memory grant feedback kicks in to reduce the memory grant to ~225MB, down from the initial granted memory of nearly 10GB.

The first query retains a 2.5GB memory grant across many executions, because sorting the entire Votes table requires a bit of memory for the effort.

But This Is About Indexes, Not Hints


With that out of the way, let’s think about an index that would help the Votes table not need sorting.

You might be saying to yourself:

SELECT   
    p.*
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
  ON p.Id = v.PostId /*We have to sort by this column for the merge join, let's put it first in the index*/
WHERE p.PostTypeId = 2
AND   p.CreationDate >= '20131225'
AND   v.VoteTypeId = 2 /*We can put this second in the index so we don't need to do any lookups for it*/
ORDER BY 
    p.Id; /*It's the clustered primary key, so we can just let the nonclustered index inherit it*/

Which would result in this index:

CREATE INDEX
    v   
ON dbo.Votes
    (PostId, VoteTypeId)
WITH
    (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

And you’d be right this time, but you wouldn’t be right every time. With that index, this is the plan we get:

sql server query plan
job well done

The optimizer chooses apply nested loops, and seeks both to the PostIds and VoteTypeIds that we care about.

That Won’t Always Happen


Sometimes, you’ll need to reverse the columns, and use an index like this:

CREATE INDEX
    v2   
ON dbo.Votes
    (VoteTypeId, PostId)
WITH
    (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

This can be useful when the where clause predicate is really selective, and the join predicate is less so. We can still get a plan without a sort, and I’ll talk about why in a minute.

For now, let’s marvel at the god awful query plan SQL Server’s optimizer chooses for this index:

sql server query plan
daffy duck

I think if I ever got my hands on the SQL Server source code, I’d cost merge joins out of existence.

But anyway, note that there’s no sort operator needed here.

Before I explain, let’s look at what the query plan would look like if SQL Server’s optimizer didn’t drink the hooch and screw the pooch so badly.

sql server query plan
how nice of you to join us

It’s equally as efficient, and also requires no additional sorting.

Okay, time to go to index school.

Index 101


Let’s say we have an index that looks like this:

CREATE INDEX
    whatever_multi_pass
ON dbo.Users
(
    Reputation,
    UpVotes,
    DownVotes,
    CreationDate DESC
)
INCLUDE
(
    DisplayName
);

In row store indexes, the key columns are stored in sorted order to make it easy to navigate the tree and efficiently locate rows, but they are not stored or sorted “individually”, like they are in column store indexes.

Let’s think about playlists again. Let’s say you have one sorted by artist, release year, album title, and track number. Who knows, maybe someone (like DMX) released two great albums in a single year.

You would have:

  • The artist name, which would have duplicates across each release year (if it’s DMX), each album title, and then unique track numbers
  • The release year, which may have duplicates across albums (if it’s DMX), and then unique track numbers
  • The album title, which would have duplicates across its unique track numbers

But for each of those sets of duplicates, things would be stored in order.

So, going back to our index, conceptually the data would be stored looking like this, if we ran this query:

SELECT TOP (1000)
    u.Reputation,
    u.UpVotes,
    u.DownVotes,
    u.CreationDate
FROM dbo.Users AS u
WHERE u.Reputation IN (124, 125)
AND   u.UpVotes < 11
AND   u.DownVotes > 0
ORDER BY
    u.Reputation,
    u.UpVotes,
    u.DownVotes,
    u.CreationDate DESC;

I’ve cut out some rows to make the image a bit more manageable, but here you go:

sql server query results
storage!

For every row where reputation is 124, upvotes are sorted in ascending order, and then for any duplicates in upvotes, downvotes are stored in ascending order, and for any duplicate downvotes, creation dates are stored in descending order.

Then we hit 125, and each of those “reset”. Upvotes starts over again at 1, which means we have new duplicate rows to sort downvotes for, and then new duplicate rows in downvotes to sort creation dates in.

Going back to our query, the reason why we didn’t need to sort data even when PostId was the second column is because we used an equality predicate to find VoteTypeIds with a value of 2. Within that entire range, PostIds were stored in ascending order.
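To see that guarantee at work: with the v2 (VoteTypeId, PostId) index from earlier, a hypothetical query like this can come back in PostId order with no Sort operator, since every row in the seek range shares a single VoteTypeId value:

```sql
/* Equality on the leading key column means the second key
   column is already stored in ascending order within that
   range, so no Sort operator is needed for the ORDER BY. */
SELECT
    v.PostId
FROM dbo.Votes AS v
WHERE v.VoteTypeId = 2
ORDER BY
    v.PostId;
```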

Understanding concepts like this is really important when you’re designing indexes, because you probably have a lot of complicated queries, with a lot of complicated needs:

  • Multiple where clause predicates
  • Multiple join columns to different tables
  • Maybe with grouping and ordering
  • Maybe with a windowing function

Getting indexes right for a single query can be a monumental feat. Getting indexes right for an entire workload can seem quite impossible.

The good news, though, is that not every query can or should have perfect indexes. It’s okay for some queries to be slow; not every one is mission critical.

Making that separation is crucial to your mental health, and the indexing health of your databases.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

A Little About Nested Loops, Parallelism, and the Perils of Recursive Common Table Expressions



How can I optimize a recursive CTE inside an inline table valued function (iTVF)?

Video Summary

In this video, I dive into some interesting aspects of SQL Server query execution plans, focusing on nested loops joins and the peculiarities of recursive common table expressions (CTEs). Despite my current predicament of needing a haircut, which I can’t get for two more weeks, I share insights that might save you from similar delays in understanding complex query behaviors. I explore how parallelism works with nested loops joins, particularly highlighting the importance of one-row guarantees and the concept of “parallel skew.” By demonstrating practical examples, I illustrate why even a small tweak like adding a `TOP` clause can significantly impact performance and plan execution.

Full Transcript

Erik Darling here with Darling Data, for as long as you’ll have me. Despite the fact that I’m desperately in need of a haircut, and I can’t get one for like two weeks, I don’t want you to have to wait two weeks to learn about SQL Server stuff, so I want to talk today a little bit about nested loops, specifically parallel nested loops, and the perils of recursive CTEs. And, you know, you don’t need specifically a recursive CTE to see the kind of performance stuff I’m going to talk about, but they do make a rather clever foil in that regard. And, well, I answered a question recently about them, and this is a typical recursive CTE plan. So usually when you use a recursive common table expression, the recursive part of the query, in this case that’s most of the query, most of the query plan rather, is not eligible for parallelism. SQL Server just doesn’t go for it. It will force a serial zone in your query plan. And also, as sort of an aside, I believe batch mode is also ineligible for the inner side of a recursive CTE plan, which kind of makes sense, because there’s really not a lot of operators in here that are eligible for batch mode anyway. There are nested loops joins, spools, and I suppose compute scalars are eligible, I think concatenation is too, but like, what’s the point of doing that, right? Not a lot of it.

Now, you can sometimes use CROSS APPLY in some ways to change that. So in this query here, we’re cross applying, but SQL Server still chooses a single threaded execution plan. For this one down here, we do things a little bit differently, rather than looking for just a single post Id. Now, remember, the Id column in the Posts table is the clustered primary key.

So looking for one value means one row is going to come out of it. This is a particularly interesting value, because it actually has a lot of answers and comments associated with it. If I tried to run this for real, it would run for a long time, and it would fail without the MAXRECURSION 0 hint. A lot of fun background there, right? Life-changing stuff. In this version of the query, though, we are looking for… and I don’t know why IntelliSense is crapping all over this thing and telling me that the Posts table doesn’t exist. Let’s get rid of that; I don’t need your IntelliSense today anyway. These objects clearly exist. It’s not like the video where I was telling you to maybe pretend to select from an object that doesn’t exist so that you don’t cache query plans, but whatever. This query changes a little bit, and we’re going to talk about why in a minute, but we’re also going to talk a little bit about parallel nested loops, because there’s a lot of stuff about them that a lot of people kind of don’t understand.
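Since the on-screen code isn’t reproduced here, this is a hedged sketch of the pattern: a hypothetical inline table valued function (the name `dbo.get_post_hierarchy` is made up) wrapping the recursive CTE, correlated on the clustered primary key of `dbo.Posts`. The `Id = @PostId` predicate is what later supplies the one-row guarantee:

```sql
/* Hypothetical inline TVF wrapping the recursive lookup.
   Seeking on the clustered primary key means each invocation
   starts from exactly one row. */
CREATE OR ALTER FUNCTION dbo.get_post_hierarchy (@PostId integer)
RETURNS table
AS
RETURN
WITH PostChain AS
(
    SELECT p.Id, p.ParentId
    FROM dbo.Posts AS p
    WHERE p.Id = @PostId

    UNION ALL

    SELECT p2.Id, p2.ParentId
    FROM dbo.Posts AS p2
    JOIN PostChain AS pc
      ON p2.ParentId = pc.Id
)
SELECT pc.Id, pc.ParentId
FROM PostChain AS pc;
GO

/* Called via CROSS APPLY, passing the clustered PK in.
   MAXRECURSION has to be hinted on the outer query, since
   inline TVFs can't carry OPTION clauses themselves. */
SELECT
    ph.*
FROM dbo.Posts AS p
CROSS APPLY dbo.get_post_hierarchy(p.Id) AS ph
WHERE p.OwnerUserId = 22656
OPTION (MAXRECURSION 0);
```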

This whole query now is eligible for parallelism, and there are some rules and some reasons and some things to watch out for. So let’s hide those; I don’t want to ruin the big surprise. I’m going to scroll up here a little bit, where I’ve written some things about parallel nested loops that you should know. And there we go, that’s a bit better on the Zoom. I’ll just have to move this so my big head isn’t covering up anything important. And let’s talk a little bit about parallel nested loops.

It’s only chosen for the inner side when the outer side has a one-row guarantee. So in the function that I was running, even when I was searching for OwnerUserId 22656, which has about 27 or 28,000 rows associated with it, what we’re passing into the inline table valued function that locates a post is the Id of the table, right? Again, the clustered primary key on the table. So even though 27 or 28,000 rows will eventually be taken and looped over, what SQL Server cares about is that what we’re finding each time we do that loop has a one-row guarantee. And that correlation on the clustered primary key of the Posts table is what gives it that. The costing stuff doesn’t consider the inner side of the nested loops join when SQL Server is figuring out if it’s going to give you a parallel plan or not. It only cares about making the outer part of the nested loops cheaper. And coming back over here, what I mean by that is that only this, this is all SQL Server cares about. If SQL Server thought that it wouldn’t be any more costly using a single threaded execution plan, then it would have chosen a serial plan again, like we saw in the other ones. The parallel plan for this has an estimated subtree cost of 135 query bucks and change, and SQL Server thought this was a cheaper option than scanning the Posts table using a single thread. So none of the stuff on this side of the nested loops join is considered when costing a query for parallelism, because, get ready, because this is a wild one: what’s running in here in parallel is not exactly parallel. It’s running DOP copies of the same query across DOP threads, and each one is sort of like a serial plan inside, running individually. Which, conceptually, is kind of like what happens with other parallel join types, like hash joins and merge joins and stuff like that.
But parallel merge joins were a mistake, and we’re not going to talk about those in this fine video. So, with all that out of the way, I’m going to talk a little bit about the setup for the query plans that we’re about to look at, and the function that they use.

And then I’m going to show you why parallel nested loops joins can be just terrifically sensitive to parallel skew. And by parallel skew, I mean how many rows end up on each thread, and a neat way that you can fix those issues if you’re having them. So we have a couple of indexes here to help our demo along: one on the Posts table and one on the Comments table. The columns in use here make total sense for what the function is doing. In the function, here’s the anchor part of the recursive CTE, where we look up a single post Id. And I have littered this function with FORCESEEK hints, because SQL Server, despite my fantastic indexes, was choosing to take those indexes and build nearly identical indexes from them using an eager index spool, which was making everything terrifically slow. So lesson one: if you have a good index on your table and SQL Server is making an eager index spool from that index, use a FORCESEEK hint to make your life easy.
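For reference, here’s a minimal, hypothetical sketch of what that hint looks like, assuming the Stack Overflow `dbo.Comments` table with an index leading on `UserId`:

```sql
/* Hypothetical sketch: the FORCESEEK table hint pins SQL Server
   to seeking the existing index, instead of building a nearly
   identical copy of it at runtime with an eager index spool. */
SELECT
    c.Score
FROM dbo.Comments AS c WITH (FORCESEEK)
WHERE c.UserId = 22656;
```

If no index supports a seek for the predicate, FORCESEEK makes the query fail outright rather than fall back to a scan, which is part of why it works as a guardrail.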

We have had a lot of instances lately, helping clients out, where SQL Server was choosing daft execution plans, and FORCESEEK hints were quite useful in addressing those issues. Now, down here, you have the anchor and then the hierarchy-building part of your common table expression. There’s a word for it that’s slipping my mind right now; maybe I should have done some research before I started yacking away. So here’s where we go find things in the Posts table. What we found up top was an answer, right? Because that’s what the clustered primary key finds: an answer that we care about. And then in here is where we find the questions, or rather, answers to the question. And then down here is where we find comments on the question or answer. So we find all the stuff that we can in here, and then we select stuff out of the recursive CTE. This is still inside the function. And then here are a couple of queries that demonstrate really well what I meant by parallel nested loops being very sensitive to skew. So I’m going to make sure that I hit R and not E, so I don’t try to run this whole demo file over again. And we’re going to look at a couple of portions of these query plans.

Now, notice that the only difference between these two queries is that here we just do a straight select from the cross apply, where p.CreationDate is greater than or equal to 2013-03-06. And in here, we have a select count from, and then we have a TOP in here with a big top number, so that we don’t have to worry about maybe not getting enough rows back, right? We definitely have fewer than 2.1 billion rows in the table. And then we cross apply to the hierarchy function outside of that. So this top query takes just about a full minute to run. If we look at the final operator in the query plan, that’s 58,443 milliseconds. And there’s not a lot of weird mumbo jumbo in here where SQL Server messes up query operator times, because it does that quite a bit, too. But really, what’s important here is that if we right-click on this little line and look at the actual number of rows for all executions, we are going to see quite a bit of skew. Let me actually clear that out and just highlight the whole thing. Some threads, like here and here and here, end up with a lot more rows on them than the other threads do. The way SQL Server figures out which rows are going to go on which thread is it uses some modulus-ish math, and this thing called the parallel page supplier starts dividing rows up, depending on how many threads are involved, how many rows there are, and other stuff. Those things get split up, and you hope that they end up getting split up evenly, but that doesn’t always happen, depending on how the values that come out of the table end up hashing out. If you have values that are all even, then whatever threads the odd modulus values map to are going to be screwed, right?

Because they’re not going to find anything. You could end up with a million rows, zero rows, a million rows, zero rows, a million rows, zero rows across all the threads in your query, which would be a bad situation. This is nearly as bad a situation, just not with the million/zero pattern; it’s pretty close, with 1.4 million, 1.4 million, 1.6 million, and a bunch of others that have 10 to 12,000 rows on each thread. So that’s a bad time, and this query ends up running for just about a full minute. We can call it a full minute; I feel like it’s a full minute. It felt like a full minute of my life, anyway. And then we have this query down here, which finishes twice as fast. The final timing on this is just about 30 seconds, close enough to 30 seconds for me, anyway, and if it’s close enough for me, then it’s close enough. And the reason why is because, and I wish this thing would stop jumping around and reframing, it’s pretty annoying: when we have a TOP in a query, that TOP, under most circumstances, unless it’s on the inner side of a nested loops join, will force a serial zone in your query plan. It does not force the whole query to be serial, which is the way the crappy Microsoft documentation about parallelism makes it sound.
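Since the demo file isn’t shown, here’s a hedged before/after sketch of the two query shapes being compared, assuming a hypothetical inline TVF `dbo.get_post_hierarchy` that wraps the recursive hierarchy lookup. The big TOP (2,147,483,647, the `int` maximum, comfortably above the table’s row count) is what buys the serial zone and, crucially, the Distribute Streams operator that rebalances rows across threads afterwards:

```sql
/* Skew-prone shape: the parallel scan feeds the nested loops
   join directly, so whatever the parallel page supplier hands
   each thread is what that thread is stuck with. */
SELECT
    ph.*
FROM dbo.Posts AS p
CROSS APPLY dbo.get_post_hierarchy(p.Id) AS ph
WHERE p.CreationDate >= '20130306';

/* Rebalanced shape: the oversized TOP forces a serial zone
   (Gather Streams -> Top), and the Distribute Streams operator
   that follows deals rows back out to threads evenly. */
SELECT
    COUNT_BIG(*) AS records
FROM
(
    SELECT TOP (2147483647)
        p.Id
    FROM dbo.Posts AS p
    WHERE p.CreationDate >= '20130306'
) AS p
CROSS APPLY dbo.get_post_hierarchy(p.Id) AS ph;
```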

I’ve had three or four people be confused about that in recent memory; it’s just badly documented. So the TOP forces a serial zone in the plan up here. But the overall plan is parallel, right? This is all parallel, and this is all parallel, and all of this is obviously parallel. We know that because we have our fun little racing stripes here.

SQL Server needs to split those rows back out after the serial zone. All right, so if we… oh boy, this SSMS window is really, really fighting me today. What SQL Server does to accomplish that is it uses a Distribute Streams operator. So we have a parallel scan, we have a Gather Streams, and then we have a Distribute Streams, because this top has no racing stripes. Slow top, right? Single threaded top. I’m not saying it’s slow; it’s 750 milliseconds, who cares? But what this Distribute Streams operator forces us to do is redistribute those rows, and what it gives us the opportunity to do is distribute those rows evenly. Boy, you’re a real jerk. So if we look at this and we go to the properties and we look at the actual number of rows, the same thing happened here, right? These numbers are all over the place. I mean, they’re not exactly the same, but they’re definitely all over the place. But if we look at the row distribution after that Distribute Streams, going into the nested loops join, these are all very, very even. We have 2, 4, 3, 2, 4, 3, and then 2, 4, 2. So we have a much better balance of rows on our parallel threads because of that. And because I’ve talked long enough, and because I have a dinner reservation soon and I want to go eat. I’m hungry; I haven’t eaten all day, for various reasons. I’ve been very busy. I know I don’t look like the type of person who skips meals, but dinner is the most important drink of the day.

I thought breakfast was the most important meal of the day. I don’t know; one of those things. Maybe I’ll write an article about what the most important drink of the day is for BeerGut magazine, see what happens there. So anyway: recursive CTEs, kind of a pain in the butt. They can be very slow, because oftentimes queries aren’t written in a way where they can engage a parallel plan. And even when they do engage a parallel plan, you have to be very, very careful about how that parallel plan is written, because you could end up in a situation with incredibly skewed rows across parallel threads, making your query very slow. In this case, it’s a 30 second versus full minute query, and I will take 30 seconds over 60 seconds when query tuning any day. So, that being said, thank you for watching. I hope you learned something, and I hope you enjoyed yourselves. If you like SQL Server performance tuning content like this, for free, from young, handsome men who need haircuts, feel free to subscribe to my channel. If you like this video, well, give it the old thumbs up, because that’s pretty much the only thing that brings me joy these days, especially before I’ve had the most important drink of the day. So, I’m gonna get going now.

Thank you for watching, and I hope you also enjoy your most important drink of the day, if you’re the type of person who partakes in important drinks. So, thank you.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

When SQL Server Isn’t Smart About Aggregates Part 2

Keep It A Buck


Here are the missing indexes that SQL Server wants for our aggregation queries from yesterday:

CREATE INDEX
    p2   
ON dbo.Posts
    (OwnerUserId, Score)
WITH
    (MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);


CREATE INDEX
    c2
ON dbo.Comments 
    (UserId, Score)
WITH
    (MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

I’ve taken a small bit of artistic license with them.

The crappy thing is… They really do not help and in some cases things get substantially worse.

Original Query


The original query plan is still awful. It is using both of our new indexes.

sql server query plan
oh okay

No early aggregation whatsoever. Yesterday’s plan takes 23 seconds and today’s takes 22 seconds; I’d hardly call us indexing victors for that improvement.

Rewrite #1: Manually Aggregate Posts


This one eats it the hardest, again, using both of our new indexes.

sql server query plan
we gotta talk.

If one were to appreciate any aspect of this query plan, it’s that the optimizer didn’t choose a parallel merge join plan. Parallel merge joins were a mistake, and have driven me closer to alcohol induced comas than the Red Sox in the 90s.

The total runtime for this query shoots up to about 8 seconds. The biggest change, aside from a serial execution plan being chosen, is that only the Hash Match operator at the very end runs in Batch Mode. All other operators execute in Row Mode.

Rewrite #2: Manually Aggregate Comments


We go back to a parallel plan, but again, in Row Mode. This query now takes 2x as long as yesterday’s Batch Mode plan.

sql server query plan
try, try again

Again, both new indexes are in use here. This one is the most disappointing.

Rewrite #3: Manually Aggregate Both


The fun thing about all of these plans is that, aside from the things I’ve been talking about, they all have the same problem as yesterday’s plans: Unless we tell SQL Server to aggregate things, it’s not trying to do that before the joins happen.

sql server query plan
wrongo boyo

Again, the entire plan runs in Row Mode, using both new indexes. Most of the operators are ineligible for Batch Mode, but the hash operations are eligible; the optimizer just doesn’t use it.

It’s not the end of the world for this query. It runs within a few milliseconds of yesterday’s with the old indexes. It’s just disappointing generally.

Rewrite #4: Manually Aggregate Both, Force Join Order


I’m going through the motions a touch with this one, because unlike yesterday’s plan, this one uses the forced join order naturally. It ends up in a similar situation as the above query plan though.

sql server query plan
e-mo-shuns

Again, both indexes are in use, but just not helping.

It Seems Strange


Why would SQL Server’s query optimizer decide that, with opportune indexes, Batch Mode just wouldn’t be useful?

Regardless of key column order, the same number of rows is still in play in all of my examples, with or without aggregations. In many cases, the new indexes are scanned to acquire all of the rows, and even the seek operators end up having to acquire all of the rows.

There’s no WHERE clause to help things, and only one of the Row Mode queries uses a Bitmap operator, which could be used to filter some rows out of the joined table early.

Quite a strange brew of things to consider here. But the bottom line is, additional indexes are not always helpful for aggregation queries like this, and may result in really weird plan choices.

If you’re dealing with queries that aggregate a lot of data, and SQL Server isn’t choosing early partial or full aggregations before joining tables together, you’re probably going to have to roll up your sleeves and do it yourself.
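As a hedged illustration of doing it yourself (the actual rewrites aren’t reproduced in this post), pre-aggregating in derived tables before the join means each join input carries at most one row per user, instead of every post and comment row:

```sql
/* Hypothetical sketch: aggregate dbo.Posts and dbo.Comments down
   to one row per user first, so the joins only ever see the
   pre-aggregated results rather than the full tables. */
SELECT
    u.Id,
    p.SumPostScore,
    c.SumCommentScore
FROM dbo.Users AS u
JOIN
(
    SELECT
        p.OwnerUserId,
        SUM(p.Score) AS SumPostScore
    FROM dbo.Posts AS p
    GROUP BY p.OwnerUserId
) AS p
  ON p.OwnerUserId = u.Id
JOIN
(
    SELECT
        c.UserId,
        SUM(c.Score) AS SumCommentScore
    FROM dbo.Comments AS c
    GROUP BY c.UserId
) AS c
  ON c.UserId = u.Id;
```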

Thanks for reading!
