Bit Obscene: SQL Server 2025 IQP Features


Video Summary

In this video, I dive into the world of SQL Server 2025’s intelligent query processing, discussing it alongside my co-hosts Prop Joe Obbish and Sean Gallardy. We explore topics like AI-driven cardinality estimates, which can sometimes lead to unpredictable performance issues, as well as the challenges with parameter-sensitive plans and abort query execution hints. We delve into how these features might impact real-world scenarios for DBAs, especially those working with third-party applications where they have limited control over queries and indexes. The conversation highlights both the potential benefits and the pitfalls of relying on such advanced features in SQL Server, offering insights from practical experience to theoretical considerations. Join us as we dissect these complex topics and share our thoughts on how Microsoft can better support DBAs in their daily tasks.

Full Transcript

This computer. All right, welcome back to the Bit Obscene radio program. I have two candidate temporary co-hosts with me today, Prop Joe Obbish and Sean Gallardy. This episode of the Bit Obscene radio program is brought to you by Meridomo 100% Cacao, Sin Azucar, apparently. Brand new sponsor. They sent me this box. It’s lovely. You can eat it and your face gets a little numb and you feel a little sweaty and you just want to talk to people. So I like it. So I took a bite of that so I could talk to these two. Today we’re going to be talking about SQL Server. Actually, we’re going to do five episodes in a row. Short episodes, hit and runs about SQL Server 2025. The first topic is going to be, you can see the anticipation on Joe’s face. It’s going to be intelligent query processing. I don’t like how you committed us to doing five because now we have to do five. Whereas you had been like ambiguous, then we could have. The ambiguous number five. Well, we’re doing, we’re doing five. And if we do more, then they’re called bonus episodes.

Oh, we have to do, oh man. And if we do, if we do fewer, not less, we do fewer, Joe. Excuse me. Yeah. Okay. So have a little respect. If we do fewer, then it doesn’t matter because I will do two by myself and just pretend that I fired both of you. Okay. All right. So we’re good to go. Start us off, Joe. What do you, how do you feel about intelligent query processing, the state of intelligent query processing in SQL Server 2025? So the way I feel about it is a long time fans of the pod will know that I strongly believe in fixing a query performance issue once, and then you fix it forever and you never have to look at it. This is different from the consultant approach of wanting as many billable hours as possible, but I’m not a consultant. So I can get away with that. And, you know, the thing about intelligent is kind of means unpredictable.

Hmm. Right. So erratic even. I’m going to show you my pets. So this is my, my pet rock. It is very predictable. It, it, it, uh, never poops anywhere. It shouldn’t. It never chews on wires. Very house trained. It is perfect for me. And this is how I want the query automation to be. I don’t want any surprises. I don’t want any. Very, very intelligent. Um, Oh, Joe, you were born just a little too late.

Changes. Right. Oracle had that. They called them rules. If you, uh, like if, like if we just get into the, um, well, you can have cardinality estimates that change through AI or something, right? Like what does it even mean? Um, like I had a performance issue yesterday. I tried to reproduce it.

Today. I’m getting a new AI cardinality estimate. I can’t reproduce the issue or something’s working today. It gets worse in the future. Cause as we all know, all of us other than Sean, sometimes better estimates can lead to worse performance. That is true. I’m sure you’ve seen it. That is true.

So like with respect to the entire feature, uh, I guess, I guess, I guess it’s a bit of a, uh, design choice. Cause I remember someone, maybe it was, uh, Pablo Blanco told me that the, the, uh, legacy CE. See, there are a lot of like special cases Microsoft developed over the years.

Presumably this was back when, uh, you know, Connect actually worked. So, so people would, would complain about their issues and Microsoft would hear their complaints and fix it. And part of the, the new CE, new being 11 years ago, is they got rid of a lot of those special cases, for better and worse. And I do wonder if this AI cardinality estimate, like if that fits in that model or not. Um, so to, to give an example of something I saw recently, as we all know, other than Sean maybe, if you’re filtering on a column that stores, like, when the row was created, let’s say you do like, uh, creation time greater than GETDATE(), you expect there to be very few rows returned by that query.

Yeah. Yeah.

Yeah. Yeah. It was greater than GETDATE() AT TIME ZONE.

Anything. Hmm. Suddenly that’s an unknown inequality. Pretty good. Hmm.

We don’t know anything about that. So now we’re in the, uh, 30% default guess situation. And I think the biggest delta you can get for AT TIME ZONE is 26 hours or so. Uh, you know, in real life, it’s probably going to be only a few hours.

Uh, so from the point of view of like, you know, I want my query optimizer to know things. Like it could constant fold AT TIME ZONE, but it doesn’t. Or it could treat AT TIME ZONE as not really being there.

Cause you’re generally shifting by a couple hours. Personally speaking, I wouldn’t expect to get a totally different cardinality estimate just because I’m shifting a parameter by a couple hours. But like, that feels like the kind of thing, which would be, um, that might be a special case in legacy CE.
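Joe’s example can be sketched roughly like this. This is a sketch: the dbo.Posts table is borrowed from the Stack Overflow demo database mentioned later in the episode, and the specific time zone names are just for illustration.

```sql
-- A plain GETDATE() comparison constant folds, so the optimizer can
-- use the histogram on CreationDate and expects very few rows:
SELECT COUNT(*)
FROM dbo.Posts AS p
WHERE p.CreationDate >= GETDATE();

-- Wrap the same value in AT TIME ZONE and it no longer folds to a
-- constant; the filter becomes an unknown inequality and falls back
-- to the default-style guess Joe describes:
SELECT COUNT(*)
FROM dbo.Posts AS p
WHERE p.CreationDate >= GETDATE() AT TIME ZONE 'UTC'
                                  AT TIME ZONE 'Central European Standard Time';
```

The shift between the two predicates is at most a day or so of data, but the second one is estimated as if nothing were known about the column.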

Maybe, I don’t know, but it’s certainly not in the new CE. And, you know, if we’re talking about all of this AI stuff, it kind of feels like those types of fixes might be dead, because why do special cases when you can just have AI do everything? Yeah.

Right. I mean, in the specific case of AT TIME ZONE, I have never trusted that function in a query, uh, aside from maybe, unless I need to, unless I’m using it, like, in the select list to figure things out. Uh, most of the time, if I need to do anything with time zones, I’m not slapping AT TIME ZONE on anything.

I’m figuring out like what the difference in hours is with that time zone and then doing date math based on that difference. Like I, which is probably more accurate to be honest. Cause yeah.

Cause yeah. Cause like then you constant fold whatever that one scalar value is and you’re good to go. So AT TIME ZONE can screw itself. I hate that thing.
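Erik’s workaround, computing the offset once and doing plain date math with it, might look something like this. It is a sketch; the table and time zone are assumptions, and the OPTION (RECOMPILE) at the end is one way to make the computed value visible to the optimizer.

```sql
-- Work out the target time zone's current offset in minutes.
-- CONVERT(datetime2, ...) keeps the local wall-clock value, so the
-- difference against UTC is the zone's offset (DST included):
DECLARE @offset_minutes int = DATEDIFF(
    MINUTE,
    GETUTCDATE(),
    CONVERT(datetime2,
        GETUTCDATE() AT TIME ZONE 'UTC'
                     AT TIME ZONE 'Central European Standard Time')
);

-- Now the predicate is plain date math in the column's own type,
-- with no AT TIME ZONE expression in the WHERE clause. RECOMPILE
-- defers compilation so the variable's value can be sniffed:
SELECT COUNT(*)
FROM dbo.Posts AS p
WHERE p.CreationDate >= DATEADD(MINUTE, @offset_minutes, GETUTCDATE())
OPTION (RECOMPILE);
```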

Uh, it ruined three of my nights when I was working on stuff in Query Store. So I don’t, I don’t even want to go there. Um, as far as like intelligent query processing stuff in general goes, uh, you know, like I, I appreciate the, uh, the attempts and efforts to make things better, but it all just feels a bit bolted on at times. And it feels like all this stuff where it’s like, we’re going to do this heuristically and we’re going to like figure it out.

And like, you know, whatever, like, like, so like the two things that I have the biggest, uh, gripes with so far are, uh, batch mode on rowstore. Um, there is at least a newly discovered hint, from our Russian friend, to override batch mode heuristics. So we have that going for us.
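The hidden hint alluded to here isn’t documented, but the documented ways to steer batch mode on rowstore look like this (a sketch; dbo.Posts is again the demo table):

```sql
-- Turn batch mode on rowstore off for the whole database:
ALTER DATABASE SCOPED CONFIGURATION SET BATCH_MODE_ON_ROWSTORE = OFF;

-- Or steer it per query with the documented USE HINT names:
SELECT COUNT(*)
FROM dbo.Posts AS p
OPTION (USE HINT('DISALLOW_BATCH_MODE'));   -- forbid batch mode here

SELECT COUNT(*)
FROM dbo.Posts AS p
OPTION (USE HINT('ALLOW_BATCH_MODE'));      -- ask for batch mode here
```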

No, granted there are a million other ways to override batch mode heuristics anyway, but the details of that stink. But, um, the other one is the parameter sensitive plan optimization. Uh, the way that it, it gauges skewness and worthiness of kicking in is nonsense.

And the way that it buckets things is absolute garbage. And I, I wish that like anyone in the world with like half, uh, half a sense would, would take another look at how that, how that thing works and fix it. Because it makes no sense to have like a bunch of like middle values get bucketed because those middle values are not the highest value or the lowest value.

So like you’ll have stuff with like 10 million rows sharing a plan with stuff that has like a hundred thousand rows because they’re not the highest or the lowest. And I feel like that whole scheme is just one of the dumbest things that I’ve seen choice wise in SQL Server in a long time. Um, DOP feedback is another one that belongs in the garbage bin.
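For anyone who hits the bucketing problem described above, the feature can be switched off wholesale or per query; these knobs are documented as of SQL Server 2022, and the query below is a sketch where @PostTypeId stands in for a stored procedure parameter:

```sql
-- Disable parameter sensitive plan optimization for the whole database:
ALTER DATABASE SCOPED CONFIGURATION
    SET PARAMETER_SENSITIVE_PLAN_OPTIMIZATION = OFF;

-- Or opt a single problem query out while leaving the feature on:
SELECT COUNT(*)
FROM dbo.Posts AS p
WHERE p.PostTypeId = @PostTypeId  -- procedure parameter, for illustration
OPTION (USE HINT('DISABLE_PARAMETER_SENSITIVE_PLAN_OPTIMIZATION'));
```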

Oh, wait, wait, hold on. What do you, how do you, why don’t we talk about parameter sensitive, whatever, as a group? Okay. Before, before we go into that, before we go into that. Okay.

Yeah. It seems like Joe, you’re, you’re almost asking, uh, for the old Oracle style rules where you say, join this at 10%? Do a nested loop join. Always do this, always run it this way.

And you kind of lock it in because you know better. Oh no. That’s what it sounds like. So I’m asking for a way to be able to communicate possible query optimization improvements to a top Microsoft man like yourself. And then maybe there’s a small chance we get that in the product.

Um, like the AT TIME ZONE example is entirely, like, theory based in terms of like, you know, like what’s reasonable. Let me word it this way. The more information you have during query optimization, the better in general, right?

Generally, AT TIME ZONE isn’t some scalar UDF black box where we have no idea what the output is going to be. It’s a very well-defined thing.

It has a well-defined range and how it’s going to change your data. If Microsoft wanted to, they could enhance, uh, constant folding, or just cardinality estimation in general, to work better with AT TIME ZONE.

Could. For sure. Now I’m not saying that they should, but it feels like, you know, that type of tweak is maybe going by the wayside.

I’m not looking for anything like, like, like Oracle. Oracle is way too expensive. Oracle is its own set of problems. Um, so with respect to.

We can get to PSP, but I wanted to also get your feeling on abort query execution. Man, we’re just jumping around and I don’t have an opinion on that. Do you have an opinion on that?

All right, go ahead. Oh, I have, I have a strong one. Oh, go ahead. Oh man. Wait, we’re going to mix everything up and our poor radio viewers are going to be confused. Maybe we should go one at a time. So, uh, the, uh, the abort query thing is a major whiff by all of Microsoft, the entire organization, top to bottom.

Um, the fact that all it does is kill a query if it starts running is absolute gobbledygook. Like that thing, if, if there were any, again, if there were any sense to anyone, then that thing would have a number that you could supply to it.

That would tell a query when to time out. Cause like most of the time, if you’re a DBA, it’s not like, oh, I never want this query to run. You’re like, this query has five seconds to finish.

And if it doesn’t finish, it’s over. Like, like just having a query get killed immediately is, I mean, I suppose there are some circumstances where you would put zero in and be like, no, you, you don’t get to run or whatever. But like, for the most part, you know, like this, like that, that hint is absolutely gorgeous for off the rails reporting tools like Looker and other crap like that.

Um, and like, you want to give those queries a chance, but if they, if they miss their window, then they, the query just gets executed. Um, there was something like that in resource governor, but you need a trace flag for it. It was like the max CPU time thing.
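The Resource Governor option mentioned here is REQUEST_MAX_CPU_TIME_SEC; by default it only raises a warning, and the trace flag being referred to (2422, documented for SQL Server 2016 SP2 and later) makes it actually abort the query once the limit is hit. A sketch, with the workload group name made up:

```sql
-- Make REQUEST_MAX_CPU_TIME_SEC a hard abort instead of just a warning:
DBCC TRACEON (2422, -1);

-- Give a class of queries a CPU-time budget instead of zero seconds:
CREATE WORKLOAD GROUP reporting_group
WITH (REQUEST_MAX_CPU_TIME_SEC = 30);  -- 30 seconds of CPU, then abort

ALTER RESOURCE GOVERNOR RECONFIGURE;

-- A classifier function is still needed to route the reporting tool's
-- sessions into reporting_group; omitted here for brevity.
```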

But like, unless you have a trace flag that doesn’t really pan out. And I just want to point out too, that the use, the use cases, not necessarily for abort query execution, but for a lot of this stuff can be, uh, you know, Joe and yourself are doing a lot of actual query tuning system tuning. You know, design, design, design help and work and stuff like that.

A lot of poor DBAs. And I, I was one for a long time, got handed very crappy applications bought from some third party company, God knows what. And it said, you can’t touch it.

You can’t do anything. Also, we’re going to call you when everything breaks and run slow. Yep. And I’m like, what do you want me to do with it? I can’t, you’re telling me I can’t even, I can’t even make a, uh, you know, a plan override for any of these.

Like you’re, I can’t do a plan. I can’t do anything because I, you know, void our warranty on, on the app. So some of this stuff doesn’t make sense for people who are actually tuning queries, getting stuff done. But for those people, you know, have to deal with some of this crap that comes through and it, it, it’s not ideal, but it’s better than nothing.

Uh, I mean, sure. But like, like, I think Eric’s point is, it seems very similar in engineering complexity to have the feature as is compared to having the feature where you can, like, set a number of seconds. Because the feature as is, is effectively killing after zero seconds.

Maybe. And Eric just wanted to be able to tweak that one number. Is that accurate? That is correct. I, I think I can see both sides.

Like to Sean’s point, that was my interpretation of the use case too. Where like, if you find yourself needing it, cause you could just do it yourself if you had to, right? Like, well, we have the query timeout, right?

You can set a query timeout, but it’s already run for that long and used those resources. Well, in the application you can, but like, yeah, well, I thought we were talking about being locked out of a third party application. You can’t.

Oh yeah. Go ahead. Because I’m sure people have done things like I’ll throw an agent job up every minute. And if I see queries running longer than X seconds, if there’s a certain pattern, then, I don’t know, kill those queries. So instead of doing that disaster, you could use this if you’re okay with just not letting them run at all.

But you know, like that’s, that’s evaluating the feature from maybe too narrow of a use case. Whereas Eric is dreaming bigger and greener and with more rainbows. We’ll wash that.

I mean, my question here is how many, how many queries in, okay, let’s just say like, we’re in Sean’s world from when he had a real job. And like, you have these crappy third party applications, you’re not allowed to change a query, an index, a setting, anything like that. All you can do is like, I don’t know, like restart SQL Server if everything locks up.

Okay, there you are. And like, you know, you’re, you’re, you’re, you’re in that world, you’re in that place in time. And Microsoft is like, here’s this hint.

And you’re like, cool. Now I can just have these application queries not run. And now all of a sudden, you know, end user Bob and end user Sally are like, well, it’s time to go to work and do my job. And they, they click execute.

And it just comes back with an error. That’s like, your query has been killed. Like, what can you reasonably block in an application that’s not going to mess up an end user’s day? Like, you have to give it a chance to run.

And if it can’t run, you say like your query timed out. That’s a much, that’s a, that’s like nice and friendly. If someone just like, you know, hits whatever button to like, you know, like, like submit something, and it hangs out for a little bit. And then like, it just immediately returns, like, you’re not allowed to run that query.

And they’re like, oh, cool. I can’t do my job. Who do I talk to now? Like, it’s just, it’s an, it’s, to me, it’s just wholly unrealistic to have that situation where you’re like, like this, like, and, like, unless, and look, and, and end users are annoying. Right?

Because like, you might have like a bunch of like, like business intelligence people who are going to be running stuff through management studio. And they can easily change like one or two little things about a query and have it completely bypass whatever you set up for that abort thing. So like, like, what’s the point?

Like, like, get, like, give the query a chance. If the query can’t finish within like, I don’t know, five, 10, 30 seconds, kill it off. Like, if it used the resources that caused a problem, it spiked things up for that time. Fine.

But then now it’s over. Right? Like, you’re done. I have an example for you actually from, from real life even. Oh boy. Like a developer messes up a query, which is like checking a queue. Uh huh.

And instead of running once per minute, it runs a billion times a day. Mm-hmm. Which is something I’ve seen. Yeah.

So like for that case, you don’t want your billion times per day query to even run for like, well, for like one second. Okay. Fine. Like, like, like, I would view it as like, you’re picking the lesser of two evils. Like, you know, the entire application is slow and down versus, well, we can turn off this, this queue thing for a couple hours and we can process the data manually later.

And then you just might not care that much about this thing, which is suddenly running a billion times a day when it wasn’t supposed to be. Okay. But if you’re, if one of your developers messes up a query like that and you turn off the queue service and you say, hey, developer, make a hot fix with a WAITFOR DELAY in there.

Like, like, like to me, like, like, like, you’re not like, you’re probably not going to find an enterprise application that screws up that badly. Like, granted. Okay.

Like there, there are cracks that everything can fall through, but like for the most part, you’re not going to run into a situation where you’re just like, this query can never run. So you’ve never used Teams. No, you’re right.

I specifically have not using Teams in my consulting contract. So you’re effectively basically saying the same thing, right? Where you’re viewing the kill-at-zero-seconds thing as too narrow and in your, in your technicolor dreams, you want to be able to change that from zero to X. And that’s going to cover more use cases, which I don’t think anyone would disagree with.

But I mean, you know how it is. Everything Microsoft does, it’s never done right the first time, right? Yeah.

But the problem, the bigger problem for me is that there’s often not a second time with a lot of these things. So you’re just stuck with things in V1 and you’re like, cool. Oh, never going to use that now. Like just, perhaps we can add some intelligence to the development process for intelligent query processing.

Perhaps, perhaps we could. You could call it IIQP. But, but how do you know, but how do you know how to intelligently fix a query if it’s not allowed to run?

Like, you know, you, you, you, you’ve got to give it a chance. You could even have like a whole thing. You could even have a whole thing that kicks in and it’s just like, like this query has been aborted because it didn’t finish in 10 seconds, like a hundred thousand times.

Like, let’s really think about this one. Let’s really do our best to make this one better. Speaking of doing our best.

Yeah. I believe we agreed to a specified length of the video and we’re well over it. But with respect to parameter sniffing, your favorite thing in the world, my understanding of the, what’s this thing even called?

Parameter sniffing. Plan optimization. The parameter sensitive, parameter sensitive plan.

Oh, parameter. Yeah. Parameter sensitive plan optimization. Do you know if it still only works for equality predicates? Yes.

Only equality predicates. There, there are a number of strange rules. That one feels weird to me. Like maybe my world view is too narrow, but. I don’t think it is. You got the gloves on.

That’s why it feels weird. More fake news from Microsoft. Right. Most of the parameter sensitive issues I see are relating to inequality predicates. Yep. On dates, where you know they’ve passed in today, or a year, or 10 years from now. Yep.

And it’s not so much equality predicates, uh, at least in the workloads that I know of. So the whole thing kind of feels not exciting.

Yeah. No, I mean, like in, in my, you know, consultant demo world, it’s very easy to find equality predicate issues in the Stack Overflow database. You have like the post type ID and vote type ID stuff.

And there are wild swings in the number of post and vote type IDs. So it’s very easy to show, A, it not working when it should with those, and B, the bad bucketing that happens even when it does work. Um, but I think generally you’re right in that, uh, you know, a lot of the problems that I see with, uh, parameter sensitivity are with, um, bounded date ranges where, you know, for the majority of the queries that come in, they’re looking at the last hour or day of data or something.

And, or like the current working day of data, whatever it is. And then every once in a while, someone will come along and say, oh, well, I need to see six months of crap now. And then that plan just, you know, bites it.

Yeah. So like, yeah. So like, I, I feel like there should be something like, rather, Microsoft has enough crazy rules around date things in query plans, like GetRangeWithMismatchedTypes and GetRangeThroughConvert, that it feels like a very natural fit for the project.

Parameter sensitive plan optimization to have special rules around dates where it should be able to look at them and be like, oh, like, yeah. Like the last time this ran, it was for the last like four hours of data. And Hey, this person’s looking for a much bigger range.

We should probably figure something else out. Like that seems like a, like a good special case for me, but what do I know? Maybe part of the thing that makes it tricky is you often see, you know, the start date and end date parameters too.

So then it’s like, you have to optimize for the pair of parameters, which I’m sure is way too much. No, you only have to engineer for the distance between them. Well, right.

But if your stored procedure has like a hundred parameters and you know, like you as the, as the expert know, well, these two parameters really matter the most. I, you know, like the problem has shifted from one equality predicate to one inequality predicate to the right pair of inequality predicates. It certainly seems possible, but I mean, I, I still would like them to do inequality first.

Yeah. You know, for, for me, it feels like the optimizer is smart enough to pick up on those patterns when a query is being like optimized. Like it, like it, it can, it, it like, it’s like looking at the parameters and it’s looking at the columns and it’s looking at the way that they’re being assessed.

And it’s pretty easy to infer from all of the various trees and, you know, parsings and bindings and all the other stuff that goes on that like, oh, like, yeah, these, these two pair up together. Like, sure, if you just like saw like a stored procedure and a list of parameters, it might not be obvious at first, even though you might have something called start date and might have something called end date. And maybe some queries use start date and some queries use end date and they don’t pair up like that, fine.

Like, you know, there’s, there’s edge cases, but like, as soon as you start like really getting into looking at a query and trying to figure out a plan, you have a pretty good sense of when they are paired and when it’s probably a good idea to figure something out. Oh, Sean, Eric thinks it’s easy. Will you have good news for us in SQL Server 2029?

Or 2030 or whatever, whatever else. Yeah, whatever. When are they going to have local variable deferred compilation? No idea.

Probably never, because think about how many like behavior changes you get from that. Well, I know, but like it should be, the option, the option should be on the table. We’ll just do that.

Add that to the connection options. Yeah. Option. Eric’s additional option. Everything is just a database scoped config or a query hint. There’s just, there’s probably.

Just look under preview features. Yeah. Speaking of preview features. No, we’re not going to switch topics. We won’t do that. We won’t do that to our nice listeners. We should switch topics.

You could throw a local variable into a one row table variable and then maybe get some deferred compiling on that. But you still don’t have a histogram on that.

Well, they should improve that and add a histogram. Ah, well, then you just have a temp table by a different name. Yeah, but you have a temp table without all the temp table issues. But if it has a histogram, they just, well, I guess if they, they just don’t, if they don’t do the caching.

It’s not a single row, like it’s going to be no problem. Yeah, sure. There, right there. We figured it out. Okay. So table variables can have histograms as long as it’s on a single row. Right?
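The deferred compilation complaint comes down to this: local variables are anonymous at plan compile time, so the optimizer can’t look their values up in the histogram. A sketch against the Stack Overflow Posts table mentioned earlier:

```sql
DECLARE @PostTypeId int = 1;

-- Compiled before @PostTypeId has a known value: the optimizer uses a
-- density-based average estimate instead of a histogram lookup:
SELECT COUNT(*)
FROM dbo.Posts AS p
WHERE p.PostTypeId = @PostTypeId;

-- OPTION (RECOMPILE) defers compilation until the value is known,
-- which is the behavior being asked for by default, at the cost of
-- compiling on every execution:
SELECT COUNT(*)
FROM dbo.Posts AS p
WHERE p.PostTypeId = @PostTypeId
OPTION (RECOMPILE);
```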

Sean, you’re taking notes. Yep. You see my list? Okay. Yeah. Yeah. I saw it. I saw it. How do we feel about DOP feedback? I actually had an opportunity back in 2019 when I was on Microsoft’s campus, a very intelligent gentleman who I respect greatly, but who also has no internet presence.

I’m not going to say his name. He asked me if I wanted to give feedback on the DOP feedback feature. I said, this seems not very useful.

You shouldn’t do it at all. Which was, I guess, not the feedback Microsoft wanted in that case. Because sadly, they still did it.

I looked at it when it was 2022 and thought it was pretty flawed. Should I just go for it? Yeah, go for it.

From what I remember… It’s a long silence. Well, it’s hard to figure out. I’m for it.

I don’t know how to word things sometimes. We’re reading from disk on this one. One of the big problems with… Azure PaaS. …row mode parallelism is you can really get hammered on performance depending on how busy the server is. Yeah.

Way more than batch mode, way more than serial queries. There’s a demo on a blog post on your site where I think the performance difference was something like 25 to 1. Is this the round robin?

Not round robin. It was the… Demand? Well, this is where demand would be way better. Ah, okay.

It was some very simple query with row mode repartition stream operators. And if you had like a single CPU that was very busy. Yep.

The query runtime got like 25 times worse. Yep. Due to the yielding. Mm-hmm. So I don’t know how you can effectively do DOP feedback when you’re not accounting for things like, the server is busy during these hours.

And this is when I really care about performance. Yep. Because the thing is gathering data all the time.

And if you start gathering data like after a server restart, maybe that’s at 2 a.m. Well, all your parallel queries are doing great at 2 a.m. Because there’s still little work on the server.

Yep. So there’s that big issue, which maybe it’s better in 2025. I have no idea.

I think they’re kind of cagey on the details. I don’t. The other thing that bugs me about it is I’ve never personally tuned a query to anything but MAXDOP 1, or really, you know, encouraging parallelism for a query. Yeah.

Like I’m not doing, oh, will this query run at DOP2 and this one run at 4 or this one run at 6 and this one run at 7 because I’m feeling lucky that day. And like even worse for the poor soul who has to do that, this feature isn’t actually respecting that.

Because if you ask me how to design it, if we had to do it, I would say, well, if, you know, if, if, if a person or an AI added like a literal MAXDOP X hint to the plan, presumably someone looked at that query, and that DOP should not be overwritten by whatever automated process this thing is running. But that’s not how it works. I don’t know.

I don’t like it, Eric. I don’t like it either. I, I think it’s goofy because it only adjusts DOP down. It doesn’t adjust DOP up. It would never look at a query running at DOP 4 and be like, oh, DOP 8 is better, like magically.

But I think really the biggest flaw in it for me is that it will never adjust down to MAXDOP 1. It only adjusts down to DOP 2. And I remember hearing at some point that they couldn’t accomplish that without a recompile.

But we all know that’s nonsense because when a SQL Server comes under enough memory, enough CPU pressure, it will run a parallel query plan at DOP 1 internally. So it doesn’t actually need to recompile. And there’s really no reason for it not to get down to DOP 1.
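For what it’s worth, DOP feedback is opt-in via a database scoped configuration, and what it has decided can be inspected in Query Store; both are documented as of SQL Server 2022, and this is just a sketch:

```sql
-- DOP feedback is off by default:
ALTER DATABASE SCOPED CONFIGURATION SET DOP_FEEDBACK = ON;

-- See what the feature has decided so far for cached plans:
SELECT pf.plan_id,
       pf.feature_desc,
       pf.state_desc,
       pf.feedback_data      -- JSON payload with the adjusted DOP
FROM sys.query_store_plan_feedback AS pf
WHERE pf.feature_desc = N'DOP Feedback';
```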

But like, like for me, you know, the, the thing that I would much rather see happen is if like, if we’re talking about like parallelism feedback is like, like fix the parallel page supplier. Like if you have a, if you have a parallel query that’s ending up with incredibly lopsided threads, like, like you, like you have a DMV that shows that, that is, that is detectable by SQL Server. Like it, like you could actually do something meaningful with like whatever algorithm is being used to distribute rows to threads in order to balance that on like a future execution.

And that would be far more meaningful to me than like, you know, you know, some other features sitting in the background being like DOP 8, nope, DOP 6, nope, DOP 4, nope, DOP 2, mm, DOP 2, just leave it. Like it, it, it, I think it’s just a waste of time. You reminded me of another complaint I had, which is like, it seemed very focused on preventing recompiles.

But if we’re talking about tuning these big parallel queries, like I, I’ve never thought, oh man, I have this parallel query that runs for 30 seconds, I have to avoid recompiling it no matter what. Like that’s my number one priority.

No recompiles. Um, that was split. Maybe that was just decided by, uh, cookies and it can’t be overturned, but. I don’t think cookies is on that one.

If you, if you’re able to move past that, I actually had a good idea, I think, just now. Like, what if we had a batch mode heuristic, like IQP, where if you have a row mode query that has terrible distribution on the repartition streams, tons of parallel waits. Yeah.

Next time you execute it, enable it via, uh, batch mode heuristics. Well, that would be nice. That would be, that would be sweet. Joe mode on rowstore. Yeah. That would be way better in my mind than taking a terrible row mode parallel query and maybe downgrading it.

Yeah. Cause like, cause like, if you think about a query that’s running properly, as in it’s doing a batch mode or demand based parallelism.

Yep. I can’t think of a single case where, oh, I have this great batch mode hash join. It’s running at DOP 8. Whoa, whoa, whoa. That’s terrible.

We need to knock that down to four. Like that would never, that would never happen because, well, you know, the internals are so much better suited to it. If you’re running at a DOP that’s too high for batch mode or demand based parallelism or broadcast parallelism.

It doesn’t really matter unless you’re running out of worker threads. But again, if you’re in batch mode, you’re not going to run out of worker threads anyway. Oh, all right, Sean. Eric and I figured it out. Yep. We’ve. Ship it.

You’ve got to ship it. Nothing else to do. It’s all done. Yep. You guys did it. We did. We did. Congratulations. Thank you. You should, we should get hired as co-PMs to just fix SQL Server. Co-PMs.

Yep. Anything else on IQP to make it IIQP in the future? Hmm. Nah. I’ve talked about everything I want to.

Yes. Wait, wait, did you want to complain about the sp_executesql thing? I mean, my, my complaint with that is I don’t understand why you would want a similarly bad behavior to occur.

Compile storms. Yeah. So like I, I’ve had a number of clients where they’ve had stored procedures run and all of a sudden, like one of the queries is compiling a plan and you have compile locks and now you have a bunch of other copies of the stored procedure waiting to compile a plan and are waiting to use the plan that the current stored procedure is compiling.

And I fixed that at quite a few places by using sp_executesql, so that now you just have like a bunch of queries compiling independently. None of them really sit around waiting.

Everyone, everyone’s happy. Like they don’t have like one stored procedure that just locks up a million other copies of the stored procedure. And now that’s all going to get broken, right? Let’s screw up my whole system.
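For context, the pattern being described here, pulling a hot statement out of a stored procedure into parameterized dynamic SQL, looks roughly like this (the table and parameter names are made up for illustration):

```sql
-- Inside a big stored procedure, the hot statement gets handed to
-- sp_executesql instead of running inline. Historically, each session
-- could compile the dynamic statement independently rather than every
-- session queuing on the procedure's compile lock.
DECLARE @sql nvarchar(max) = N'
    SELECT o.order_id, o.order_total
    FROM dbo.orders AS o            /* hypothetical table */
    WHERE o.customer_id = @customer_id;';

EXEC sys.sp_executesql
    @sql,
    N'@customer_id int',            -- parameter definitions
    @customer_id = 42;              -- parameter values
```

The complaint in this episode is that SQL Server 2025 can make these sp_executesql calls serialize on compilation the way stored procedures do, undoing this workaround.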

I had a very good system. I had a very good thing worked out and now I’m getting screwed on it. Now you get more hours. I have a question on that, actually, because I’ll admit I’m not familiar with compile storms, or so-called compile storms.

So is this the way it works: you could have many procedures, there’s no compiled plan. The first one to start compiling gets some type of lock resource. The others wait.

Yep. The first one finishes compiling, there’s now a compiled plan, the other sessions stop waiting and they use the compiled plan from the first one. Correct.

So in what scenarios would that be bad? The one I’m thinking of is maybe the CPU you’re compiling on is like super busy and the first compile takes a long time.

So it holds a lock for a long time. Is that an example? It can be. A lot. Yeah, there’s, there’s two main examples that I’ve dealt with. One of them is, I don’t even know if this is a word at this point, one of them is mitigatable by async stats updates.

Cause like, sometimes the long compile lock is because of a long synchronous stats update. But then you’ll run into other times where it’s just a long query compilation.

And like, for whatever reason that just sucks the life out of the server. Like I said, Joe, as a real-life example, there was a stored procedure that was tens of thousands of lines long, for example, you know, and, uh, it would take minutes sometimes to run the compilation stuff on that.

And the, I don’t know if you remember when, um, microservices was super hot. Everything was a microservice. Still kind of is.

Still kind of is. Yeah. We just don’t, it’s not super hot to say anymore. No. So, uh, we’ve moved on. Yeah. Yeah. CIOs don’t get jacked off on all over the place anymore. So, uh, anyway, so you would have this and then the, you know, microservice, oh, we need to spin up another microservice and it needs to, the query didn’t execute.

So we need to rerun it again. Check it in one second. All right. All right. Now submit it a hundred more times. And all of a sudden you have a 32, 64 core server that has 15,000 new connections getting to it.

All trying to run that query, because it didn’t run the first time, because microservices, and you don’t care. Why are we just fire-and-forgetting and hoping everything goes well? Well, I don’t understand that example, because if it’s taking multiple minutes to compile, like wouldn’t it be worse to have many sessions all doing the multiple-minute compile?

Cause, cause I feel like you’re complaining that, you know, the code’s written very poorly. Well, you need to. Right.

But you do have to pick, pick one or the other. Like if you have the compile storm, you have a bunch of sessions waiting on the lock. If you don’t, you have a bunch of sessions all using CPU and other resources, trying to compile the plan.

Like they’re both bad, aren’t they? Or, or like, I don’t really see why having a bunch of sessions waiting on a lock is worse than having a bunch of sessions all doing the multiple minute compile at once.

I mean, so for that example, a better thing would be, let’s say it only takes 15 seconds to compile. But a lot of these, when I’ve, when I’ve worked on these, it’s been, I mean, thousands of connections all coming in, all running the same query.

And if you would let them all go and run the 15-second compile and do this stuff, then you’ll start getting out of there. Like it’ll, it’ll start making headway.

But if you let them all block up, then they continually block up. I mean, you’re still going to hit a point where you DDoS yourself, but there’s really not, there’s really not a good way, other than not having any. Even if you take out the compilation storm, it’s still not good.

Right. Yeah. All right. So I think you’re saying there’s some internals reasons as to why getting those hundreds or thousands of compile locks at the same time is going to gum things up.

Like not to, not to blame you guys in this case, cause it sounds like a pretty annoying problem, but like there’s something about the lock manager, or whatever’s even doing it, which makes it not able to efficiently untangle all those thousands of compile locks.

Well, just think about if you have 20,000 connections all sleeping, waiting on one lock, and then the lock gets freed up and you have to go, all right, let’s walk through 20,000 sessions and say that these are all, you know, runnable.

It just, the fact of doing that makes it ridiculous, right? You’re doing more overhead doing that than to just say, actually, sit there, I’ll pick one of you.

What, none of you will get to run if we do it the other way; at least this way, one of you gets to run. Yeah. Yeah. My, my experience has been converting, like, especially, maybe my example was a bit oversimplified, where there’s one query in a stored procedure that locks things up for whatever reason.

But like, you know, when you have incredibly long stored procedures with a lot of, you know, like if-branch logic, stuff like that, where, rather than compiling a plan for everything all in that one go, putting stuff into dynamic SQL gives you some separation of duties, or putting stuff into other stored procedures gives you some separation of duties, some deferred compilation stuff.

And a lot of the time that does solve a pretty big problem for the compilation locking stuff. So, you know, I’ve just dealt with so many weird permutations of it that the dynamic SQL thing is a very obvious answer to me.
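The branch-splitting idea mentioned here, so each code path only compiles when it actually runs, sketches out something like this (the procedure names are hypothetical):

```sql
-- A monolithic procedure compiles plans for its statements up front
-- on first execution, including branches that never run. Moving each
-- IF branch into its own procedure defers that work: a branch's
-- queries aren't compiled until that branch actually executes.
CREATE OR ALTER PROCEDURE dbo.report_driver  -- hypothetical driver proc
    @mode tinyint
AS
BEGIN
    IF @mode = 1
        EXEC dbo.report_daily;    -- compiled only when this path runs
    ELSE
        EXEC dbo.report_monthly;  -- compiled only when this path runs
END;
```

Swapping `EXEC dbo.report_daily` for an sp_executesql call gets you a similar deferral, which is the behavior they’re worried about losing.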

But like, you know, I just really hate that now dynamic SQL is going to have that same sort of compile locking behavior. Because that, that really undoes a behavior that I’ve relied upon since, I think the first time I ever actually had to do that was like early 2020.

So for me, for me, that feature is a big loss, because to me it undoes a very reliable and dependable behavior that fixed a lot of problems. Sounds like we’re all looking forward to UQP in the future, right?

Yes. New Q, New QP. U QP.

U as in intelligent. Yes. About that 15 minutes. Yep. All right. Well, I’m good on this topic. Yep.

Me too. All right. We’re going to end this one here and we’re going to talk about Standard Edition next, if anyone can still talk after all this. Yep. Yep. I’m doing it.

I’m doing it to all of you. All right. So we’re going to stop this recording, and thank you all for watching. And I hope that you’ve found a way to survive this verbal intercourse. All right.

All right.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.