Down With Big Query
Video Summary
In this video, I dive into the challenges of working with large and complex SQL queries, particularly those that are overly nested or contain numerous CTEs (Common Table Expressions). I share a specific example where I had to optimize a query that was running for about 30 seconds. By breaking down the query into smaller, more manageable parts and using temporary tables, we were able to significantly reduce execution time—from around 15 seconds to just under two seconds. This process not only improved performance but also made it easier to analyze and tune each component of the query independently. The video emphasizes the importance of avoiding overly complex queries and suggests that breaking down large queries into smaller, logically separated parts can lead to more efficient and maintainable code.
Full Transcript
Howdy folks, Erik Darling here with Erik Darling Data. Still, apparently, I guess. I guess we’re cool for the summer. And I realized that I had been writing and working with and talking a lot, and I even have a couple blog posts scheduled about how much I hate people who write big queries because they’re very misguided. People think, a lot of people who I’ve talked to, are like, well, you know, I have a CTEof this view or this drive table and SQL Server should cache that result and then do something else. And that’s just not what happens. And kind of like the more you just string together these, these little constructs, or nest them as deeply as your heart desires, or, you know, keep sort of tagging joins onto things. You know, it, it, it just doesn’t, just doesn’t scale well. I’ll say that. At some point, it makes sense both from a performance perspective and from an understandability and tunability perspective to break your query up at sort of logical stopping points to just not keep adding things on to your query to give you the results you want or give you some new piece of information. It’s not a good idea.
So I have this, this query in particular that I wrote from scratch. And while I was writing it and running it and everything, SQL Server came up with a number of missing index suggestions for me. It actually came up with six. And I’ve thoughtfully labeled them missing index one through six down here. And I’ve added all these already. I’m not going to sit here and make you watch me add these indexes. But they’re all stuff that SQL Server was like, hey, if you add this, it’ll make the query faster. And to some extent, they did. But this query still runs for about 30 seconds. So we have this question post thing up here that basically finds questions with some filters on them to make sure they have like a positive score and they’ve accepted an answer and they haven’t been closed or community owned. And sort of the same deal with answers down here. I just make sure they have a positive score and that they’re not community owned. And then what we do is join those two tables together on the parent ID of the answer equaling the ID of the question to make sure that we don’t have someone who’s self answered. Right? So that’s the only point of that.
And then we go and hit the votes table to see if there are any bounties assigned to those questions or answers. And then we come out of there and we do some sort of annoying, complicated stuff to join back to the users table, get some information about comment scores. But this is, you know, I know it’s not fun looking at code like this. You don’t have to understand it all. You just have to understand that this is the kind of code that I see when I work with people that I end up doing this exact same process that I’m going to show you. So let’s look at the query plan real quick. This inserts 2,000, 29,380 rows and it runs for 30 seconds right there. We zoom in on this. That’s 30 seconds of wall clock time that we spend running this query.
If we zoom out and we look at the query plan, zoom to fit, I want nothing to do with this. I will not sit there and try to troubleshoot this plan as it is. That is completely misguided. It’s not a way to, not a good way to do anything. What I do when I see a query like that is I think, well, that query is too big for me and it’s probably too big for the optimizer too. I bet the optimizer is not having a very good time with that. So what I do mentally is I start looking for breaking points in the query and scroll down a little bit. Let’s say we can take these first two CTE, right? These first two, the answer post thing and the question post thing.
And let’s just control, let’s highlight the right part first and hit control L and SQL Server thinks that about 184,000 rows are going to come out of there and that about 773,000 rows are going to come out of there. And if we go the whole nine with that and we look, SQL Server thinks that about 8,000 rows are going to come out of that. That might be a good guess. It might not be. We don’t, we’re only looking at the estimated plan, so we can’t figure that out. So if we wanted to, if we wanted to do something smart with this, we could take the results of that initial set of joins and just dump those into a table, ensuring that there is no collusion with, no self-collusion on here.
Whenever there’s self-collusion, I get worried that there are other perversions afoot. So let’s run this query. Let’s get query plans turned on. Let’s run this and see what happens. Okay. That takes about, let’s go to the execution plan, about three seconds.
Okay. That’s fair. Three seconds is totally fair because we’re, we’re hitting not, if we look at the, the estimates and the actuals over here, zoom in on that. Wait, no, sorry. Zoom in on, zoom in on this one. Here we go. SQL Server thought that 8,300 rows are going to come out of there, but we ended up with 2.05 million rows.
So let me ask you a question right from the get-go. If SQL Server is making a bad guess by that much here, how much do you think that bad guess hurts us downstream? How much do you think SQL Server being off by, oh, 2 million rows is hurting the other query?
I would guess a pretty good amount. I would also guess that it would be really, really tough to track down that poor estimate if you were just staring at that great big lump of query plan. Staring at this much smaller lump of query plan, it’s a lot easier to figure these things out.
I’m not going to dig in on why this takes two, three seconds and we get the estimate wrong by that much. I’m just going to say, okay, we have, we have that set of data in a temp table. This is a pretty self-contained issue.
We can come back to this later. All right. So now let’s look at bounties, right? So this is the second, well, this is technically the fourth CTE in there. We had the first two that joined to each other, right, and that third CT. So this is number four.
This is bounties. So let’s run this and let’s see how long bounties takes. Let’s see how we do with bounties. Bounties sure is dragging on for a long time, isn’t it? Bounties is, oh boy, bounties is chugging. Okay.
So that took a while, right? Let’s look at the query plan. And this is just about 16 seconds. Yowza. 15.7 seconds. What are we going to do about that? Well, this is a really strange looking execution plan to me. If it’s not strange looking to you, I don’t know.
Perhaps you’re from another planet. But we have these, if we look over here, let’s start with the right because everyone tells you to read from right to left. So let’s start with the right. We have these constant scans. And they concatenate together. And they compute a scalar apparently. And I think it’s very funny that we go from these thick arrows to these thin arrows to this thick arrow to this thin arrow to this thick arrow.
It’s just like thick, thin, thick, thick, thick, thick, thick, thick, thick, thick. A lot of thicks. A lot of switches from thick to thin in there. It’s worse than me. I’m balloon, yo-yo dieting. But where these constant scans come from is a little tough to track down, especially in older versions of SSMS, where you don’t have these row counts automatically by operators. But if you look at this arrow here, we have that 2.05 million number, right? And if we go look here, and that’s the same number. We look here, that’s the same number. So these two constant scans are actually emitted from here. And what they’re doing is they’re trying to make sense of this join on an or clause. Sometimes you’ll get that constant scan thing. Other times you’ll get a table spool. Depends on how SQL Server is feeling that day. But it does something really goofy.
It takes these 2 million rows and these 2 million rows, and it turns them into 410 million rows. And then it sorts 410 million rows, right? Oh, it’s top end sort, right? 410 million rows. Attempts to merge them together, but doesn’t because we still get 410 million rows on the other side.
So this entire thing was an exercise in futility. Then we seek into this index 410 million times. Okay. I’m starting to understand why this thing is slow. And if you’ve read my blog at any point in the past, I don’t know, week or so, week and a half, you’ve probably seen that I hate joins with or clauses.
So let’s not do this. Let’s get rid of this. Let’s do this differently. So what I’m going to show you is what happens when we just union all those two things together. Right? So exact same query, just we have one join here and one join here, and I union all them together. Right? Because if I use union, SQL Server is going to try to make a distinct result set. And if I try to make a distinct result set, that can slow things down.
So if we look at this execution plan, we are down to about two seconds just by splitting up that join with the or clause in it. Right? Did two separate queries, yet somehow it was way, way faster. Right? It was like 15 seconds down to 1.9 seconds. I’m pretty cool with that. So since I’m so cool with that, what I’m going to say is we’re going to take that and dump it into a temp table on its own.
All right? And that will still be pretty quick. That should be about two seconds. Yeah, 1.8 seconds there. Not bad. Not too shabby. Right? And so now all we have to do is look at how this last part of the query performs with those temp tables rather than with all those CTE chained together.
And this finishes very quickly too. This finishes in about 2.2, well, 2.3 seconds. Right? So that finishes quick. So now we have three queries essentially. Right? We have that first CTEthat we stick in a temp table. The second CTEthat we stick in a temp table.
And now this third query that just hits that last temp table. So we’ve broken the query down into three steps. And the sum of those three steps is much, much shorter than all of those steps put together into one big chained together CTEquery. All right? So now because I want to show you something cool, what I’m going to do is take all of these.
I’m going to take all of that and I’m going to do it all in one go. I’m going to drop out those temp tables that I created before. And I’m going to do the exact same thing. So have this here.
And that’s going to stick into a temp table. And then this is going to go and stick into a temp table. And then this is going to go and run from the final temp table. And then we’re just going to check to make sure that we have the same number of rows going. Now this should take, if I turn the query plans off, it takes around about five seconds. With the query plans turned on, it’s a little bit longer.
All right. Eight seconds. I’ll live with that. But now what we have, and I think this is really the whole point of this exercise, what we have is three queries that we can try to figure out that are all much, much smaller in size. We have, you know, I don’t know.
We already added six indexes for this thing. There’s another missing index request there and another one there and another one there. Maybe they’d help. Maybe they wouldn’t. I don’t know. But we have three distinct queries now that we can work with. All right. We have, you know, we can investigate why each one of these is slow individually. And that’s much, much less difficult than it is to figure out why those chained together CTE and other nonsense are all performing poorly.
So we could go further from here. We could keep going. We could keep trying to tune things further. But I’m pretty happy going from roundabout 30 seconds, roundabout.
Well, I mean, that was eight seconds with the query plans. Oops. I have to get rid of those temp tables first. So if we turn query plans off and look at this, it should be a little bit quicker. But, you know, it’s nice to be able to test things in different ways and get your results back and everything.
But anyway, we went from, oh, it took seven seconds that time. Good for us. Anyway. So anyway, the moral of the story is when you get called in to tune a very, very big query like that, generally speaking, it’s not a good idea to try to consume that query as a whole.
The first steps I always take are just like what we saw here. I start looking at individual parts of the query, individual statements, how they tie together. And I start breaking them up into logical points like that, where I’ll stick one set of data into a temp table, work off that set of data. Just because, you know, we could totally try indexing temp tables and other stuff.
But I just find that, you know, materializing results, which CTE, derived tables, views don’t do, materializing results and giving SQL Server a known quantity to work off of, to build stats on, to, you know, figure out relationships with, is usually much, much better than asking it to do that over just the results of a whole bunch of queries, kind of tacked and hammered and duct taped and glued and, I don’t know, bound and gagged.
Oh, it’s getting, oh, it’s getting, sorry, I didn’t mean to bring my personal life into this. But it’s like kind of like lumping them all together into one sort of crazy mess. So anyway, that’s sort of the moral of this story.
Stop writing big queries. Unless you really know what you’re doing. If your last name is Mechanic or White, not Ozar, if your last name is Obish, yeah, then you can write big queries. Otherwise, I don’t want to hear about it.
Anyway, I hope you enjoyed watching this. I hope you learned something. I hope you take what you learned here and start fixing some problems with it. And I don’t know. I will apparently see you in the next video. Goodbye.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.
Great video Erik. I’ve seen some mammoth queries, especially data warehouse queries and the query plan looks like the London underground map!