Faster Better Wronger

Video Summary

In this video, I delve into an often-overlooked aspect of query tuning: ensuring that the results are correct and sensible. Erik Darling from Darling Data highlights how optimizing a query for speed can sometimes lead to incorrect or nonsensical results if the underlying logic is flawed. He demonstrates this with a real-world example, showing how a poorly written query can produce absurdly large numbers due to incorrect joins, leading to misleading data. The video walks through fixing these issues by rewriting the query and emphasizes the importance of validating both the correctness and sensibility of the results, especially when working on complex queries that affect multiple tables.

Full Transcript

Erik Darling here with Darling Data. And we’re going to talk today about an important aspect of query tuning that does not get enough attention. And that is making sure that the, not even necessarily, I mean, of course, necessarily the results of the query that we made faster, like making sure that those are correct. Like maybe comparing those to the results of the original query, but also making sure that the results of the original query that doesn’t really make any sense whatsoever. Because despite all of the good intentions that exist in the world, people may write queries with incorrect logic. They might write queries in a way that does not return sensible data. We all need sensible data in our lives. I’m not talking, of course, about no-lock hints, which only dangerous and irresponsible people use in their production queries. We’re talking about just general logic and correctness. So we’re going to talk about that a little bit today.

Down in the video description, you will see all sorts of helpful links. You can hire me for consulting by my training, become a supporting member of the channel, be an official Darling Data supporter. And you can also find links to Ask Me Office Hours questions, which are, of course, free. Yay! And then, the usual, if you enjoy this content in any way, shape, or form, however minor, please do like, subscribe, tell a friend so I can get a bigger YouTube or something.

Okay. Seattle. I’ll be in it. November. With Kendra Little doing two days of the most fantastical T-SQL pre-cons that the world has ever known. I’ll have all sorts of the swaggerific new stuff there, stickers and t-shirts and whatnot.

And I will also be giving all attendees access to all of the T-SQL training. So you’ll have that going for you, too. So think of all the wonderful things you’ll get for free just by showing, by having your employer pay for you to show up. Right? It’s a good exchange. This is absolutely bribery.

So let’s go talk about query correctness stuff. Now, that’s the wrong thing. Cool. All right. Let’s pretend that didn’t happen. And let’s go over here and let’s look at this query. This query is slow. Right? If we look at the execution plan for this query, it ran for one minute and 23 seconds.

Not a good time that right. Took a long time to run. We can see all sorts of strange things happen in this query that we don’t like. Right? All sorts of things that we would never want to see. Right? We have bad estimates and we have long running nested loops joins because of those bad estimates.

And everything just goes to hell in a handbasket. But what really goes to hell in a handbasket are the results. Someone wrote this query here. Now, we could do a bigger top here, but we would be here for a long time.

This took a minute and 23 seconds for one user. If we had more users, well, by God, by the time this thing finished, I’d probably be packing it up to go to Seattle, which is in about a month now. Actually, exactly one month now. I should probably book that travel.

But someone wrote this then, you know, we get the top one user and we select from that CTE and we join to posts and then we join to comments. But the way that these joins occur, we have this sort of incorrect multiplicative effect on things. Right? Look at the numbers we get back. John Skeet, despite being an absolutely prolific human being, does not have 236 million total posts and comments.

And these posts and comments scores are also very large numbers. These are like almost foreign phone number long. Right? They are crazy. So how did this happen? Well, let’s say that we wanted to rewrite this query in a way to make it faster. Right?

That would also sort of show us why we have this bad multiplicative effect on things. Right? And why we got such large numbers. So if we rewrite the query like this, right, we’re going to do the same thing here and we’re going to, but we’re going to separate the join to posts out. We’re going to pre-aggregate that.

And then we’re going to cross join to comments. Right? And we’re going to correlate the cross join to the, the user’s CTE down here. So let’s run this query and let’s, let’s, well, first we’re going to note that it is much faster. Right? That finishes instantly, not in a minute and 23 seconds. Good for us. And we get the same results back.

But the only way that we can get the same results back here is by giving like, again, the multiplicative effect. We would have to multiply post score by the total number of comments. And we would have to multiply the comment score by the total number of posts.

And when we would have to multiply the comment count times the post count in order to get the numbers to match. And that’s probably not right. Right? Because we know John Skeet does not have 236 million posts and comments.

And we can validate that. Right? If we go back to the source data and we say, hey, John Skeet, how many posts and comments do you have? Well, these are, these are the numbers that live in the actual tables. Right?

We have about 331,000 post score. We have about 14 and a half thousand posts. We have the comment score of 46,000 and we have a comment count of about 16,000. So like when you look at all this, you might wonder, well, how did we end up with those crazy numbers? Right?

And it’s really because of the way that the two inner joins work. Right? They’re both one to many relationships. And when we join the one to many relationships together, we get many more relationship. Right? Like if we come back to the original query, we joined, where was it? Oh, how far? Oh, those are indexes.

There we are. We joined that CTE to posts. Right? And on owner user ID to ID, that’s a one to, that’s a, well, I guess then backwards, that’s a many to one relationship. But then joining to comments, that’s also a many to one relationship. So we really get sort of this, I don’t, I don’t want to use the word Cartesian product, but I do want to say it sort of explodes the results in an incorrect way.

Where joining posts and comments on these things in the same sort of set of joins, it blows the results up and gives us incorrect numbers. Right? That like these sums and counts here, we’re just way, way, way out of proportion. So if we wanted to rewrite the query in a way to give us just what we wanted back, you know, we could, we could rewrite it like this. Right?

And we could just stick the select top one and join to the pre-aggregation on posts here and then join to the aggregation on comments here. And this would not explode the results. We would still have to add post count and comment count together in order to get the total like posts and comments. But that’s not so big a deal. And this number is correct. And these numbers match what we had in the source table for that one user.

We could also just have taken our original query and not blown things up with the multiplicative effect that we saw from the, the two jo- the two joins that we did that exploded our results earlier. And we would get, of course, correct results back from this. So when you’re writing queries like this, it’s so for me, as a consultant, it’s hard.

And this is why I do like prefer to like do things with like on zoom with like people who have more domain knowledge. So like when we’re reviewing results of things, we can sort of like, you know, like sanity check stuff on our own and be like, like, I can say, hey, like, I don’t know this data very well. You probably know it better than I do. Is there any way that like these numbers are accurate?

Cause this looks funny to me. Like, like, like, you know, like I usually like on a sales call, someone would be like, we’re a blah, blah dollars a year company. And I’m like, cool. Want to pay me more? Like, oh, you, so you have money. Great.

So like, but you know, like then like you might see some reports come back and like, you know, you’ll see like one customer has spent like $10 billion in the last month. And you’re like, well, you sure? Cause that sounds, sounds a little wrong, but these are important things for you to spot check and validate as you are making queries faster. Is that the, not only are the, are the results correct in that, like, maybe like they match whatever previous thing, but also that the results make sense.

Right. Cause you don’t want to give people data that is not sensible back. Right. Doesn’t do you any good to write a faster query that still gives you crappy results. Anyway, that was all I had to say today.

All right. Mind your data. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I think this is dropping on a Friday. So I will see you next week with the office hours, Monday video.

All right. Thank you.

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.