What SQL Server’s Query Optimizer Doesn’t Know About Numbers

What SQL Server’s Query Optimizer Doesn’t Know About Numbers


Video Summary

In this video, I delve into the nuances of SQL Server’s query optimizer and its sometimes surprising behavior when dealing with integer values. Specifically, I explore how the same set of indexes can lead to vastly different execution plans based on slight variations in query syntax—demonstrating that even mature software like SQL Server isn’t always as smart as we might expect regarding simple logical equivalences. Through a series of queries and detailed execution plan analysis, I highlight the importance of being mindful of how you phrase your SQL statements, as seemingly minor differences can significantly impact performance.

Full Transcript

Erik Darling here with Darling Data. My hair has reached a level of absurdity that it has not seen, I don’t know, at least since various times in my life when I’ve tried to cut my own hair. Still waiting for my lovely and talented hair person to get back from their vacation. Before I go on my vacation. Um, was it Tuesday? God. Uh, anyway, um, I had to take last week off from recording. There was like, this advanced form of RSV going through, uh, circulating through my household. And, uh, I don’t know, you can still kinda hear it in my voice. Uh, I don’t know, I just, I sound, I sound like I did in my 20s when I smoked cigarettes and enjoyed life. So that, that was, that, that’s what I’m taking from this anyway. At least, you know, I realized that, that, that, that level of happiness is, is quite possibly achieving. Again, my time. We’ll see though. Uh, what else? Um, oh yeah. Uh, if, if anyone out there is, is currently struggling with some advanced form of RSV and, uh, your, your throat hurts constantly, uh, I have some leftover viscous lidocaine gargle.

If you’d like some. Um, I will say that, um, I will say that the viscous lidocaine gargle confirmed, uh, confirmed some things about me. Um, that, uh, I have a petulant gag reflex. Is, is not meant for viscous gargling. That’s, that’s about what I learned about myself last week. That, uh, that, uh, I don’t know.

That, and I have a great immune system. Take the crap out of that. Tank. Absolute unit. All right.

Uh, let’s get back to SQL Server. Apparently that’s where we, that, that’s where we always end up back at SQL Server. And, uh, in today’s video, I want to talk about how, uh, SQL Server’s optimizer isn’t always smart about integers. By that I mean, it does not always infer things from integers that it should when it has plenty of information about the integers.

I don’t know how, I don’t know how the optimizer missed the boat on integers. They’re all over the place. Anyway, uh, I’ve got a couple indexes on a couple tables.

I’ve got, uh, I’ve got an index on the badges table, helpfully called not posts on the name column and the user ID column. And I’ve got an index, helpfully called not badges on the post table on post type ID. Remember, that’s the, it’s a big deal.

It leads on post type ID, uh, has owner user ID as a secondary key column and includes the column. Score. I’ve also gone out of my way to create an index on the post table to tell SQL Server’s cost-based optimizer that the values in the post type ID column are between zero, greater than zero, less than nine. And so it’s the numbers one through eight.

And, uh, I’ve got a couple queries down below and I’ve written, I’ve written these queries. In two different ways. The same query is one slight difference. One teeny, tiny, itty bitty, slight little difference.

And it might, it might shock you to find that there is a huge performance difference. In the way, in, in, in, in the execution plan between one query and another. So let’s, uh, let’s scroll down a little bit and let’s find this hotspot, the hottest spot north of Havana.

And let’s zoom in on it. So in the first query, I have told SQL Server, I want to see all of the post type IDs one and two. This is in one comma two.

Right? So just post type ID one and just post type ID two. And in the second query, I’ve said, I want to see where post type ID is less than three. And the other μέor strip that person has the opposite reality to tens of tens of tens of thousands of tens of thousands of tens of tens of tens of tens of tens of tens of tens of tens of tens of tens of tens of tens of tens of tens of tens of tens of tens of tens of tens of tens of tens of tens of tens.

I mean, four seventies. And for veryーム build on the currentiffe dizendo Houston, KL passé, you could be operational temp base. And, well, we should know that there’s nothing zero or below in there.

We would think so anyway. But here’s what happens when I run them. And I’ve actually, I pre-ran them because, well, I didn’t want to sit around on camera twiddling thumbs waiting for this thing to execute.

I get docked by beer gut magazine every time twiddling thumbs on camera. It’s a long story. It’s a very complicated contract.

But anyway, here is just, you know, some proof that the only numbers possible in the post-type ID column are between 1 and 8, and that’s what my constraint enforces and acknowledges. These eight digits are all you will ever see in there.

Not a very selective predicate, admittedly. Right? For like a 17 million row table, it’s the same eight numbers repeating over and over again. Not terribly.

Not a good clustered index. Just go out on a limb and say this would not make a good clustered index. So anyway, about those query plans, which you got a shocking sneak preview of a second ago, but maybe I was standing in the way.

Let’s see if we can zoom and focus in here. So if we come way over here, we’ve hit the limits of SQL Server being useful. So let’s drag that over here.

So in this top query where I said in 1, 2, what happens? Well, we get a nice little index seek, and we get a sort operator. We need the sort operator because we don’t have score in the key of the index.

That’s okay. It’s not really hurting us here. We get a lazy table spool, which, you know, I have mixed feelings on it. I’m going to do another video this week.

I have, I don’t know, three or four videos lined up to do this week so I can pad out next week. Next week, I’m on vacation. That’s all I ever wanted.

So I’ve got a video about table spools I’m going to also do this week. Maybe even today since the snow day here in Brooklyn. But this, you know, performs pretty okay for what it does, right?

It’s like performance-wise, it’s all right, right? It’s not great, not awful. Just it’s fine for what it does. This whole thing finishes in about 6.2 seconds.

Big score for us, maybe. I don’t know. I don’t really have big feelings about either of the, well, really, rather, I don’t have big feelings about most of either of these query plans, but the one I’m going to show you next is the one that I have much, much bigger feelings about.

Now, this is the query, just to remind you. This is the one we just looked at, where it was in 1.2. The one we’re going to look at next is less than 3.

That is not an emotional heart, all right? No emo, darling data. There’s only goth.

So let’s look at this query plan. Let’s see where this query plan gets offensive, which is something that if you’ve watched enough of my videos, you will immediately recognize and also be offended by.

And it’s when I said in 1.2, SQL Server said, cool, index C, got it, no problem. When I said less than 3, SQL Server said, ah, we’re going to build an index off your index.

Your index needs an index. What’s different about our index? Nothing.

On the same two columns. Oops, got a little weird there. I apologize. Got a little cocky, flew off the handle there.

It’s on the same stuff, right? Post type ID, owner user ID. Who cares, right?

Same index. We output the score column. Same index. SQL Server made a copy of my index. When the query ran, it was the same index. All because I said less than 3.

Maybe that got SQL Server emotional. Maybe I thought, oh, who loves me? I can build an index for it. I don’t know. It’s weird, right?

But it’s just logic that’s not built into SQL Server’s query optimizer. It doesn’t infer less than 3 is the same thing as being in 1.2. Even with an acknowledged, forced, enabled constraint on your table data.

And I think what’s particularly interesting is, actually, no, we should drag both of these way over. Just put the nested loops in the corner there and put the nested loops in the corner there. And the thing that I want to show when we zoom in…

Oops, I don’t need that tooltip. It’ll go away. Weirdo. It’s not even like a cardinality estimation thing. That’s a lot of white space, but whatever.

It’s not even like a cardinality estimation thing. Because if you look at this index seek, right? Look at the number of rows SQL Server estimates from the index seek. Granted, it’s wrong by 338%.

But SQL Server didn’t choose to build a copy of my index when the query ran for this one. The bottom one… I mean, we have the full table cardinality here, which makes sense because we had to read the whole table to make a copy of that index.

But then the estimates from the spool are exactly the same. Excuse me. A bit of last week coming back to haunt me.

It’ll be the only time this happened in this video. But yeah, the index from… So rather, the estimate from here is exactly the same as it was from up here, right? It’s 11635229.

It’s the same 338% that it was off. So, what did we learn? Well, maybe the optimizer isn’t as smart as we give it credit for.

Granted, it’s got a lot to deal with. Having seen your queries, I know just how much it has to deal with. But really, sometimes the way you phrase your queries makes a very, very big difference for performance.

Now, we’ve talked about that a lot from a lot of different angles. Sorgability and exists versus joins and other things like that. But this is a weird one, sort of admittedly, because you would think that a mature, like 30-year-old piece of software would be able to figure out that 1 and 2 are between 0 and 3.

Whether you say in, 1, 2, or you just say less than 3. So, be careful out there. When you’re writing your queries, sometimes little changes make a big difference.

Anyway, before any more of my lung attempts to present itself to you, I’m going to say thank you for watching. If you enjoy this sort of SQL Server content, don’t forget to like the video. If you want to be alerted every time, well, not every time my lungs try to present themselves to you, but every time I post this sort of insightful SQL Server commentary, go ahead and subscribe to the channel.

Coming up on 3,000 subscribers. Pretty happy about that. I don’t get anything at 3,000.

3,000 is actually a pittance of a number compared to people who play video games. Maybe I should start playing video games. I don’t know.

But yeah. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And remember, if you need any viscous throat gargle, admit it medicinally, it’s lidocaine, viscous lidocaine.

If you need any of that, let me know. I’ll send you whatever I have left. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.