Indexing SQL Server Queries For Performance: Eager Index Spools

Indexing SQL Server Queries For Performance: Eager Index Spools



Thanks for watching!

Video Summary

In this video, I delve into the intricacies of eager index spools in SQL Server query plans, particularly focusing on their appearance and impact. I explore why these spools often arise, especially when nested loops joins are involved, and how they can significantly degrade performance without any accompanying missing index requests or optimizer hints. By examining specific examples, including a detailed walkthrough of a problematic query, I highlight the inefficiencies and potential pitfalls of eager index spools, offering practical advice on how to identify and mitigate them for better query performance.

Full Transcript

Erik Darling, right here in the flesh with Darling Data. Darling Data, of course, has no flesh except me. I am the only flesh. In today’s video, we are going to talk more about how to fix query performance issues with indexes. And in this one, we’re going to talk about when an eager index spool arises in your query plan. Now, I’ve talked about eager index spools many times. You’ll find other videos on this channel talking about them. But the things that I want to outline about them, that I find interesting about them, are they are not accompanied by a missing index request of any variety. There’s not going to be any green text in your query plan. And there’s not going to be anything in the DMV saying, hey, I created an index for you while you’re query ran, you should create this index. That doesn’t happen. Two, they are very, very likely to show up on the inner side of nested loops joins when you don’t have a good index, but sometimes even when you do. Three, even if they are building in a parallel plan, the index, the table that you read from to build the index, and building the index only happens on a single thread. So there’s a lot of wasted CPU threads in that whole thing.

And that can also make the index, make the table read in the index build take a very long time. Four, three, five and a half? B, Q, I don’t know, whatever. In parallel plans, eager index spool builds are accompanied by a weight called execsync, E-X-E-C-S-Y-N-C. But in a serial plan, a single threaded plan, they emit no weight, even though we did, we would be waiting on that spool to build. And that was something else. Oh yeah, building spools is highly inefficient. It is essentially a row by row build of the object. And Microsoft has made absolutely no progress in improving that scenario over the years. So there’s a lot of bad stuff about eager index spools, especially when they are built off very large queries.

So if you’re working with a query and you have big tables in your database and you see a nested loops join, and on the inner side of that nested loops join, you see an eager index spool, not a lazy table spool, an eager index spool, you can be pretty sure that either you don’t have a good index to support whatever predicate search you’re doing, whether it’s via join or where clause or a combination of both, or the optimizer is once again taking its nose beers and not using the index that you have. And I’m going to show you an example of that.

So with that out of the way, let’s talk about how you can buy me some regular beers, because I have enough nose problems without anything else. I don’t need your help. If you would like to support my videos, you can sign up for a membership. It is as low as $4 a month. That’s not even enough to get another knuckle tattoo.

You can join the 30 or so other people who have done so and who care enough about my endeavors and my mental well-being to pay me $4. There’s a link right there in the video description for you to do that. How convenient. Make it very convenient for you to hand over money. If you have spent your $4 on beers of any variety, you may like, you may comment, you may subscribe.

I would suggest waiting until you’re sober to do so, though, because the only thing worse than comments are intoxic comments. We don’t want those, do we? Because comments are forever. If you need help with SQL Server, if you feel like SQL Server has indulged in too much of any substance and it is just not working out well for you, I am the best SQL Server consultant in the world, according to BeerGut Magazine, of course.

And if you need any of this stuff done, I am just the nose to do it. And as always, my rates are reasonable. If you would like some very high quality, very sober training, sobering training, maybe.

It’s like these videos. It’s fun. About SQL Server performance, you can get all of mine for about $150 a month. That is the rest of your life.

You don’t need to resubscribe or rebuy. That’s about it. That’s the URL. That’s the discount code. Of course, there is a fully assembled URL for you in the show video description. The show description.

Is this a show? I don’t know. I am showing you things, but… Now, you show up on your screen. It’s not a podcast because I’m not just like, you know, vocal frying into a microphone about dumb stuff. Anyway, more about upcoming events in 2025.

For now, I don’t know. Enjoy your alone time. With that out of the way, let us talk about eager index pools. Now, probably the most fascinating thing to me about eager index pools is how often the optimizer will put them into query plans, often to the detriment of performance.

In a sane world, and by sane, I mean like in a world in which the optimizer kept up with the times, and the optimizer was more aware of current storage trends. I realize that storage in the cloud is probably as awful as the old spinning disk storage that you used to have on your, you know, on-prem SQL servers.

But, you know, at the same time, it would be nice if the optimizer were a little bit more aware of these things and didn’t feel the way it feels about IO patterns sometimes. There’s very little difference between sequential and, or rather, like when you think about like old hard drives, there’s very little difference between like doing this on an SSD and doing this on an SSD. Right?

Because SSDs don’t do this like old hard drives do. Everything is kind of randomly scattered about. Oh, there’s memory sticks anyway. There’s very little like logical order in there. So the optimizer, not really up to date on those things.

And the optimizer will frequently pick nested loops joins plans and put an eager index pool on the inner side of the plan. The outer side is up here. Then there’s the loop join.

And then you come down here. And this is where the spool goes. It’s not a very good time for your queries. In a sane, rational world, like the one that I was trying to describe there, the optimizer would just choose a merge join. Maybe not a merge join.

A hash join. Let’s say the optimizer would opt for a hash join and just throw a missing index request and move on. But no, SQL Server has to do something totally different. Now, there have been a couple weird occasions in my life where I have written queries in a way specifically to get an eager index pool.

And not just for the sake of a demo. There have been times in my life when having an eager index pool built was actually a better thing for the query than anything else. And the way that I ended up doing stuff like that was you can see this join up here.

It’s just a regular old equality predicate joining post to votes. But down here, I have written this as two inequality predicates. A greater than, equal to, and less than, equal to.

And since merge and hash joins require at least one equality predicate in order to work, this gets us a change in the execution plan. Now, I don’t recommend it for this particular query.

For this particular query, it is an absolute disaster. But this is the much better plan where we just have the single equality predicate. But if we rewrite that single equality predicate as two inequality predicates, all of a sudden SQL Server says the only thing I can do is use a nested loops join and have this on the inner side of the join.

So this is what I was talking about. Like in a sane and rational world, SQL Server would encounter a situation where it may have considered this in the past, and it would just do this instead.

Just do a hash join. Say there’s a missing index. Fine. Anyone can get on board with that.

What’s hard to get on board with is SQL Server building a 52 million row index in the middle of your query execution. Again, there were some very special edge cases in which I did that on purpose because it suited the workload better. In this case, not good.

Not a good time. So if I were to let that run and get an actual plan, that would run for about almost two minutes. So we don’t want to do that.

But what’s very funny, and I’m going to explain this query a little bit. Oops, what did I do? Oh, there’s already an object called P1. All right, well, we’re just going to call you P11. We’re not going to stop and mess with that. I guess I was practicing this one too much.

But we have this query here. And what I want to tell you about this query is that this is not a natural join in the Stack Overflow database. The users table, the ID column in the users table is a clustered primary key.

The ID column in the post table is also the clustered primary key. That is not how those tables relate. These tables relate by a column in the post table called owner user ID. The funny thing is that the ID column on the post table, like I said, is the clustered primary key.

So there is an index on it. So when I’m saying, like, joining one clustered primary key to another clustered primary key, you would think SQL Server would say, oh, I can do this in, like, a number of different ways. I can seek.

I can do a loop join and seek to everything that I care about. I could just scan it once and do a hash join. It could also scan it once and do a merge join, but we don’t talk about merge joins. But it doesn’t choose to do that.

It chooses to do this instead. And it chooses to build an eager index pool off of the clustered primary key here. This takes about, well, there’s four and a half seconds here. And this is where I’m going to show you a couple of the things that I mentioned about eager index pools.

One of them is that they read from this table and build the index pool on a single thread. So even though this is a DOP8 query, we ain’t doing that at DOP8, are we? We are doing that at DOP1, 17 million rows.

Two is that, like, so this is in a parallel zone and that happens. So if we take the four and a half seconds it took to read from this table single threaded and subtract it from this, we end up with, what, like 15 seconds or so of time building the eager index pool.

There is no missing index request up here saying, hey, we could use an index. And I forget if exec sync shows up in the query weights. It does.

There it is. This is the exec sync weight that I was talking about that shows up in parallel execution plans. If this query ran at DOP1, there would be no weight indicating that an eager index pool was built. We only get this in the parallel version of the plan.

So this is all highly inefficient, highly ugly. And the worst part is that we already have an index here. So sometimes when I’m tuning queries and I see an eager index pool, I’m like, dummies didn’t make an index.

And then I go look and I’m like, this is a perfectly good index. Eagle server. Toot toot. So sometimes what you have to do is you have to tell SQL Server, no, no, no. Trust me, buddy.

Put the spoon down. We should be seeking instead. So before I move on, I just want to say this query runs for 22.171 seconds in total. But if we run this query with the four secant, this should only take a couple seconds.

Oh, yeah, there’s already an object called P2. We’re going to make it P2.2 then. Again, I practice these too well. Sometimes I’m just too good at my job.

And this takes 1.5 seconds. And it turns out without adding an index or doing anything, we had a perfectly seekable index for SQL Server to use. Look at that.

We have a clustered index seek. Before we had a scan and we had a spool. SQL Server. Your trip to the powder room was ill-advised.

So in a lot of cases, eager index pools are really just irritating missing index requests. Sometimes they are optimizer deficiencies and shortcomings. But the grand scheme of things, I don’t know.

What do I prefer? Four secants or having to wait and sit there and wait while an index creates? I’ll take that four secant any day. So if you’ve already done your job and you’ve already indexed your SQL Server queries for performance and you’re not getting performance, take a look at those query plans.

You might see something awful like that. Now, let’s get this thing running. And we’re going to talk a little bit about this.

So a lot of the queries where I show off bad things that happen, I use either cross-apply or outer-apply because the inner side of nested loops is an interesting place for the SQL Server’s optimizer. It does all sorts of goofy things in there. Parallel nested loops are especially strange birds.

I thought for a bit about doing some videos about parallel nested loops, but I don’t think I could… I don’t think that even with all the time in the world in, like, Paul White’s notes and the SQL Server’s source code that I could do adequate justice to how weird parallel nested loops are. There is simply too much going on there.

But we have this query, and this is another situation where you might see an eager index pool arise. Like, just have a little subquery like this with an aggregate in it. And if we look at this query plan, and let’s just drag this way over here, something quite similar happened, right?

Here is our eager index pool. Here is our post table. So essentially the same thing happened.

Now, this is the index that we currently have on the post table, right? It’s on post type ID, then parent ID, then owner user ID. The part of the query where we got an eager index pool was on this.

So SQL Server is essentially complaining that we don’t have an index that helps with this. And again, for whatever reason, SQL Server could have just chosen a hash join or a join. But it didn’t.

It chose a nested loops join. It told us nothing about an index that might be missing. It just created that index down here. Now, the normal way of solving this problem is, or, you know, mostly what I do is I right-click on the eager index pool. And I look at two sections.

I look at the seek predicates. And this is what I would create as the key of an index to fix this problem. In this case, SQL Server wants an index on parent ID, right? We can see the parent ID column reference here, which makes sense because we are correlating the outer query with the inner query on parent ID, right?

Look at that. How nice. And the other thing you want to look at is, let’s go back to the query plan, and let’s get those things back up.

The other thing you want to look at is the output list, because the output list will tell you what columns you will probably need to include in your index. So look at the seek predicate. That’s the key of your index.

Look at the output columns. Those are the include columns in your index. 99% of the time, this will be good enough. You may have to think about things a little bit differently if you have, like, a top one with an order by or something, because the order by column might need to go in the key of the index, too. But for most cases, this will be good enough for SQL Server, at least to use the index.

I don’t know if that’s going to make your query as fast as you want it to be, but that’s what you could do. Now, there are ways that we could go about fixing this, of course. Let’s say we wanted to create this index to fix it.

We could say create this index on parent ID and then score, right? And that’s essentially what SQL Server was asking for in the eager index pool plan. It wanted a key column on parent ID.

Fine. It wanted us to include score and ID. Well, we have to explicitly reference score, because score is just a regular column in the table. We don’t need to explicitly reference ID in here, because ID is the clustered primary key on the table.

So that automatically gets built into any nonclustered indexes that we create. And now if we run this query, let’s just look at the estimated plan real quick. Make sure that we are spool free.

Look at that. We have no more eager index spool. And now if we run this query for real, it will not take 20 or 30 odd seconds to run. Oh, it should be a lot faster.

I don’t know what this thing is doing now. Ah, there we go. For some reason, that took 10 seconds. I’m going to have to have a talk with SQL Server. Did some…

There was some… Oh, see? This is the problem. This is the problem. This is why I hate… This is why I hate parallel merge joins. Look at that. Isn’t that ugly?

Look at this. I bet this repartition streams has an order by, and we have introduced intratread dependencies into our parallel plan. This is a terrible idea for a SQL Server.

If you see situations like this, where you have anything that preserves or requires ordered operators around SQL Server and parallel execution plans and your SQL Server queries, run screaming. So let’s look at…

This one has an order by on. Does this one have? It does. We have two order preserving exchanges in a row. Under worst circumstances, this could have been really ugly. We could have had exchange spills.

We could have seen spills here and here and maybe… Actually, no, that’s about it. I doubt that one would have, but you could have seen these spill and have real big problems. This one probably came very close to it.

So be very mindful of that sort of thing. I’m going to go a little off script here. And what I’m going to do is just to see if my hypothesis is correct. I’m going to throw a hash join hint on this query because…

Actually, hash join loop join because I have a feeling SQL Server would tell me it couldn’t create a query plan just doing hash joins here. But let’s just see if this goes any better.

Maybe it will. Maybe it won’t. We’re going to find out together. All right. So yeah, this one took five seconds. All right. Notice that we don’t have terribly painful things happening in here anymore.

And now we get a very helpful bitmap up here. And this, rather than taking almost 10 seconds, this is down to five seconds. So a little off script, but, you know, if I don’t do that sometimes, I don’t know, I might appear a bit stodgy here.

But anyway, that’s one way of dealing with it. Another way of thinking about this is let’s actually let’s drop this index because what I want to do is show you something a little bit different than just indexing.

So let’s think about what this query is doing. We’re looking for this, right? The thing is we’re looking for answers.

But in the query logic, we care about showing you back up here. We care about where the answer score is greater than the max score on this. The thing is, the thing is we know something SQL Server doesn’t.

We know something about our data. And what we know about our data is that only queries with a post type ID of two can be an answer.

And we already have an index where post type ID is the leading key column. So if we were to rewrite our query to do this instead, we wouldn’t need that other index, right?

So if we actually take a minute to consider our query a little bit, this looks like a very similar plan to the one we had before just without the spool down in here. Now we have another index seek in there. So if we run this query and we wait, I can’t promise that this is going to be as good and fast as our hash join query.

It’s probably about a second slower. But we look at the query plan. So this finished in about 6.2 seconds. The hash join plan was about five or so seconds. But we really do improve this area in here, right?

There’s no more eager index pool. And this time we took advantage of our index on post type ID and parent ID. And we have sort of a double hop seek where we have parent ID and post type ID being seeked into.

So sometimes when you’re trying to fix eager index pools, if you don’t have a good index in place at all for the query, you may need to create one. If you already have a good index in place, you probably need to use a force seek hint.

And if you have an index that like a nonclustered index that’s like is just a little bit off from what you’re trying to do, pay attention to the way that your query is written because you might be missing some valuable information to give SQL Server in order to be able to use that index more effectively. In this case, we used our domain knowledge about the data in the stack overflow database to say, hey, we only care about comparing answer scores to other answer scores. We need to just find answers in the inner side of that aggregate that we’re comparing to.

So with that out of the way, I guess that’s the whole video. Cool. It only took 22 minutes and some odd seconds.

Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. I hope to see you. Well, I hope you see me in the next video. If you don’t, well, it was nice. You might be dead.

That’s real sad. Yeesh. I don’t like the sound of that. Anyway, thank you for watching. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.