Dangarang
Video Summary
In this video, I delve into a unique scenario involving eager index pools in SQL Server and how they can occur even when an appropriate index seems to be in place. Specifically, I explore why these index pools might form despite having indexes that should theoretically work well for the query at hand. Using a real-world example where a `cross apply` is used to fetch data from another table, I illustrate how SQL Server’s decision-making process can lead to suboptimal performance due to the order of columns in an index and the lack of efficient seek predicates. By walking through this case study, I highlight the importance of carefully considering index design when optimizing queries, especially those involving complex joins and correlated subqueries.
Full Transcript
Howdy folks, Erik Darling here with Erik Darling Data, and I wanted to record a quick video about eager index pools and a reason why they may occur even though you’ve created an index that is perhaps nearby what your query is asking for. Now, I’ve vlogged before about why eager index pools might pop up if you’ve got no useful index or even if you have a very index that the optimizer ignores. But this is kind of a weird third case. And I’ve seen things like this happen maybe when someone listens to C1 of SQL Server’s missing index requests and the order of the columns in the key of the index which are just in which are only supplied to the missing index request by the ordinal position in the table might not be the most efficient, effective, happy index request. So, in this case, I have a query that is selecting data from the users table and then cross applying that cross applying to the badges table. And we want to get the top end per group. This is what we’re doing in here. This little chunk of query. So we’re selecting the top one badge name from badges correlated on user ID ordered by date descending. And this is an OK query, but it’s not just in the top one.
But it’s not really a great index because we lead our index with name, then user ID, then date descending. If we look at the badges table, you know, we might see, oh, well, you know, I don’t know, maybe SQL Server gave us a stem missing index request and now we didn’t make our query any better. So, what you might see here is because we have to correlate on user ID in order by date descending, but name is the first column in the index, we’re kind of buried, these two columns are kind of buried behind it. We don’t have an equality predicate on name. We had, if we, our where clause was also like, and badge name equals happy camper, then we might, then we could seek to here and then seek to here and then we would have this in order, but we don’t.
So we can easily display this, but it’s not helpful as a first column in the index. Now, why this is kind of funny is because we have this index and SQL Server uses the index that we created on this index over here called squirrel in order to feed into this index. So it’s basically taking this index and rearranging the columns in it.
If we zoom in a little bit and we look at what it’s doing, it creates that index keyed on user ID and then it has name and date in the included columns. Eager index pool structures are effectively clustered indexes, but you can think of them the same way as like a nonclustered index where the seek predicates are key columns and the output list are includes. It’s just like if you created a clustered index on user ID, name and date would technically be includes and that index.
So SQL Server does this down here because we have a nested loops join here. SQL Server is estimating that we would have to loop 13,659 times and SQL Server does not want to take 13,569 rows from here and then scan the entire badges table that many times. So it scans this index on the badges table once.
We have one number of execution, one scan. And just like in other times when we create an eager index pool, even though the plan says it’s parallel, all the rows end up on a single thread, which is no bueno as far as I’m concerned. These eager index pools always build serially.
So we build that index, which allows SQL Server to seek into this index 13,659 times, do a quick top one sort and then return data out. So the reason why I write a lot of queries that show stuff like this using cross supply is because cross supply most often optimizes as a nested loops join. Because it optimizes as a nested loops join, we kind of get the effect in our query plan where SQL Server is going to do something repetitive down here.
And SQL Server uses spools to sort of mitigate the effect of repetitive behavior. So eager index spools, table spools, stuff like that. All those things come into play on the inner side of nested loops.
And it’s just a lot easier to get SQL to say, I’m going to use a nested loops join when I use cross supply. It’s simply a demo writing effect. It’s not because cross supply is bad.
It’s not because nested loops join is bad, even though it kind of is. I’m kidding. Nested loops join is fine. Fine. All you nice OLT people out there with your nested loops joins. It’s just to kind of show you that a lot of times on the inner side of nested loops, in other words, on this side of nested loops, a lot of awkward things can happen.
In this case, SQL Server took an index that we thought might be okay, or rather, maybe we got a missing index request that said name, user ID, date. And we were like, ah, we’ll just create this index blindly, and all our queries will be faster. And then this query got slower because we forgot a semicolon.
So eager index spools may also happen just because you made a bad index or because you made an inopportune index for a specific query. In this case, it would make total sense if we just reorganized this index a little bit. If we took name from here, and we stuck it over here, and then we said, oh, I don’t know, what’s that thing with, I don’t have SQL prompt over here, so you’ll have to excuse the crappy typing.
Drop existing equals on. And we reorganized this index a little wee little bit. And you see that that took four seconds, which is a lot faster than the 18 seconds that it took for SQL Server to create that index pool.
If we reorganized our index a little bit, we can avoid minimizing SQL prompt. We can avoid the index pool altogether and have a much faster query. Anyway, just a quick example of how SQL Server may rearrange a nonclustered index.
It doesn’t always have to just get everything from a clustered index to feed into an eager index pool. So thank you for watching. Thank you for bearing with me as I messed up several things in there and had some incomplete thoughts and blabbered a little bit like I’m doing right now.
So I’m going to cut this short and get ready to record another video. Thank you for watching, and I will see you in the next one, assuming that you can still tolerate me after this. Goodbye.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 25% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.