Sancerrely Yours
Video Summary
In this video, I delve into the fascinating world of parallelism in SQL Server queries, specifically focusing on how to identify and address issues related to skewed parallelism. By running two queries—one at MAXDOP 8 and another at MAXDOP 1—I demonstrate how to analyze query performance metrics such as duration versus CPU time. When these values are nearly equal but the query runs in a parallel fashion, it often indicates that something is amiss with the parallel execution plan. I walk through an example where a parallel query uses only a fraction of its allocated threads, resulting in inefficient use of resources and potentially slower performance compared to its serial counterpart. To illustrate this point, I also show how changing a nested loops join to a hash join can significantly improve query efficiency by distributing rows more evenly across available threads. This video is packed with insights for anyone looking to optimize their SQL Server queries and understand the nuances of parallel execution plans.
Full Transcript
This is the last demo that I have right now in this series. And it is, I think, one of my favorites, because it deals with interesting problems in SQL Server regarding parallelism. So what I’m going to do is run two queries. The first one is going to run at MAXDOP 8. The second one is going to run at MAXDOP 1. And I’m going to kick this off, and then I’m going to start babbling for a little bit, because these things don’t finish terribly quickly. So one thing that I always look for when I’m tuning queries is the duration of the query versus the CPU that the query used, which can be instructive in a few different ways. If duration is much longer than CPU time, it can indicate that the query was blocked or had something else weird going on with it, because we did not burn CPU the entire time the query was running.
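The two test queries aren’t spelled out in the transcript, but the setup is just the same query run twice with different degree-of-parallelism hints. A minimal sketch, assuming hypothetical tables in the style of the Stack Overflow sample database (`dbo.Posts`, `dbo.Votes` are illustrative, not the exact demo query):

```sql
/* Hypothetical repro: the same join run at two degrees of parallelism.
   Table and column names are assumptions, not the video's actual query. */
SELECT COUNT_BIG(*)
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
    ON v.PostId = p.Id
OPTION (MAXDOP 8); /* parallel plan */

SELECT COUNT_BIG(*)
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
    ON v.PostId = p.Id
OPTION (MAXDOP 1); /* forced serial plan */
```

Comparing the CPU time and elapsed time of these two runs is the whole diagnostic trick the rest of the demo builds on.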
So what’s very interesting is if CPU time is equal to duration, but the query went parallel. Well, that just shouldn’t be, because under normal circumstances, wall clock time should be some divisor of CPU time. Or some product. I suck at math. Remainder? I have no idea. But let’s just say that we had a query run at MAXDOP 4 for 20 seconds, and it burned 20, 40, 60, 80 seconds of CPU across those threads. That would make a lot of sense to me. But if we have a query run at MAXDOP 4 for 20 seconds, and then at MAXDOP 1 for 20 seconds? Well, that tells me something is amiss with the parallelism. So what I’m going to do is go over to the execution plans and show you a little something. We have the parallel query, which ran for 19.6 seconds. And we can validate that, we can make sure that our eyes are not deceiving us, by going to the query time stats. And we see CPU time of 19,603 milliseconds, which is 19.6 seconds, and elapsed time of 19,601.
Not so good. The serial version of that query runs for 18.9 seconds. And we can again validate that by going to the properties and looking at the elapsed time: CPU time of 19,606 and elapsed time of 18,967. So that adds up. What does not add up is that this query ran at DOP 8, had three parallel branches, reserved 24 threads, and only used 17 of them. So we had 24 threads reserved, but we only ended up using 17 threads. And I don’t think we really went all that parallel. Let’s start over here.
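You can hunt for this pattern across the plan cache, too. As a rough sketch (not shown in the video), `sys.dm_exec_query_stats` reports worker time and elapsed time in microseconds, so CPU-heavy queries where the two are nearly equal are candidates for a closer look at the plan:

```sql
/* Rough plan-cache sweep for queries whose CPU time roughly equals
   their elapsed time despite burning a lot of CPU, which can hint at
   skewed parallelism. Thresholds are arbitrary; tune to taste. Note
   that these columns are totals across executions, so divide by
   execution_count if you want per-run numbers. */
SELECT TOP (20)
    qs.total_worker_time / 1000  AS cpu_ms,
    qs.total_elapsed_time / 1000 AS elapsed_ms,
    qs.execution_count,
    st.text                      AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE qs.total_worker_time >= 5000000               /* at least 5s of CPU */
  AND qs.total_worker_time <= qs.total_elapsed_time * 1.1 /* CPU ~ duration */
ORDER BY qs.total_worker_time DESC;
```

This doesn’t prove a plan is parallel or skewed, it just flags the suspects; you still need to open the actual plan and look at per-thread row counts, like the rest of this walkthrough does.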
All the way to the right: this is serial, and this is serial, and this is serial. Now we finally go and distribute streams, but we distribute streams for one row. So if we look at what went on here: in round robin fashion, we took one row and put it on thread one, and we had threads two, three, four, five, six, seven, eight entirely empty. And we sent that into a nested loops join. And we still have the same thing. I’m going to keep this open. Actually, we went into a nested loops join where thread one had an eight-digit row count. That’s almost 11 million rows on one thread. And we stayed on one thread there, and it’s one thread there, and, oh, 27,000 rows, but still on one thread, and 27,000 rows still on one thread. And over here, oh, we’re down to 21 rows on one thread, and 21 rows on one thread. And, oh, okay, we repartitioned streams and things got a little better: one, five, six, 11, 12, 14, 20.
Maybe I added that up wrong somewhere, but it should be 21, right? Six, eight, nine, 14, 15, 20, 21. Okay, I was off by one. Most common error in all of computing: off by one. But yeah, so we had this massively, I mean, not massively massively, but we had a good chunk of parallelism for this query.
And we used none of it. Oh, it just makes you want to quit. This makes you want to quit. And so the serial version of the query actually finishes a little bit faster, because we don’t have all the overhead of going parallel.
We don’t have to distribute streams, or repartition streams, or gather streams. We don’t have to do any of that. We just use one thread the whole time, and it kind of doesn’t matter, right? Use one thread here; there’s only one thread to be used. Who cares?
Like this. And we get a much more reasonable thread reservation of one. There’s no need to have threads sitting around here. We have a serial query; we’re only going to use one thread.
Ain’t that something? So that’s a bummer. That’s a big, big bummer. Now, this problem does get exacerbated quite a bit by the nested loopiness of the joins.
So what I want to show you is what happens if we change this to be a hash, hag, hag join. I wish we had a hag join. I would love to join a hag.
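Forcing the join algorithm is just a hint. A sketch using the same hypothetical tables as before:

```sql
/* Same illustrative query, but forcing a hash join so rows get
   hash-partitioned across threads instead of piling up on one
   thread behind a nested loops join. Note that a join-level hint
   like INNER HASH JOIN also forces the join order as written. */
SELECT COUNT_BIG(*)
FROM dbo.Posts AS p
INNER HASH JOIN dbo.Votes AS v
    ON v.PostId = p.Id
OPTION (MAXDOP 8);

/* A query-level hint leaves join order to the optimizer instead:
   ... OPTION (HASH JOIN, MAXDOP 8); */
```

Either form is a blunt instrument; it’s used here as a diagnostic demonstration that the skew came from the loop-shaped plan, not as a blanket recommendation.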
If we run this, it finishes very quickly. That finishes in one second. And we don’t have all of the same weird loopy issues that we had with the other plan. If you look at the properties here, the actual number of rows is a pretty even spread.
Not perfect, but I’m willing to accept that over having 11 million rows on a single thread. And, I don’t know, this looks a lot better to me. So, if you have parallel plans, and you’re looking at the execution plan and you notice that CPU time and duration are just about even, you might want to go digging through your execution plan a little bit and start looking for skewed parallelism.
It’s not always going to be that every single row ends up on a single thread, but quite often it does end up that way. That problem will get exacerbated tremendously by looping constructs: nested loops joins, key lookups, things like that. Because you’re basically sending one thread to do a whole lot of work over and over and over again.
And the more rows get involved, the worse it is. For queries with smaller row counts, that may not flare up so terribly. But, gosh darn it, for queries that have to process a lot of rows, you can certainly run into some pretty nasty performance issues.
So, that’s my advice there. If you have a parallel query, and CPU time and duration are neck and neck, start digging.
Again, you might find that hinting the query to MAXDOP 1 gives you just about the same execution time. And if you see a lot of looping constructs in the plan, try saying: I want a hash join instead. Because sometimes, SQL Server just chooses poorly.
And when it does, we get yelled at. This is the sort of stuff that we have to resort to: looking at query plans.
Hash joins. Hints. Oh. Anyway, thank you for watching. And I’ll see you some other time. I’m going to take a little break from recording things.
It’s been a long two days of recording stuff. And, yeah, that’s all. Anyway, thank you for watching.
I hope you enjoyed most, if not all, of these videos. And sorry if some of them only come out of one ear. I apologize for my unruly children.
I’m going to go start drinking, because it’s daylight. Goodbye.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 25% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.