Some Thoughts On The Future Of Query Optimization
Video Summary
In this video, I delve into my thoughts on query optimizers, specifically focusing on those found in database engines such as SQL Server and Oracle. I start by clarifying that this discussion is not about the human query tuners who optimize queries or my personal approach to optimization; instead, it centers around the software within a database engine that generates executable query plans based on cost models. I highlight how these optimizers make assumptions, often assuming a cold buffer cache and using generalized cost models that may not fully account for modern hardware capabilities or specific system configurations. Throughout the video, I explore various feedback mechanisms like memory grant feedback and adaptive joins, discussing their limitations and potential improvements.
Full Transcript
Erik Darling here with Darling Data. In today’s video, we’re going to talk about some of my thoughts on query optimizers. Now, I don’t want you to get this confused, so I’m going to make this very clear at the outset. I am not talking about the people who optimize queries, like query tuners, people like me and maybe like you, if that’s your thing too, who rewrite queries, fix indexes, and change things around so that they magically run faster. Nor is this my thoughts on query optimization generally. This is not going to be about my approach to optimizing queries, or what I look for, or what I do. This is about the piece of software inside of a database engine, a lot like SQL Server, that does the costing and the modeling and gives you an executable query plan, and how those things can be improved in the future. Because, I don’t know, I think what I think is interesting. So I’m going to share what I think with you. If you like this sort of content, and you want to support me recording these videos, I’ve got to pay off this camera somehow. You can sign up for a membership at the link in the video description below for as few as $4 a month. If you also have a camera to pay off, and you don’t have four extra bucks a month, you can comment, you can subscribe, and gosh, seeing those numbers go up, well, it doesn’t help me pay off a camera, but it does do my sad heart some good. If you want to ask me questions about SQL Server performance, there’s a link for that in the video description below too, and you can submit as many questions as you’d like. Just please try to stay on topic. It’s all I require of you.
Now, if you need help with SQL Server from a consulting point of view, well, gosh darn it, I am still the reigning champion of SQL Server consultants. If you need help with SQL Server: health checks, performance analysis, hands-on tuning, query optimization. If you’re having a performance emergency, or if your developers just need some training because you’re having too many performance emergencies, well, guess what I do at a reasonable rate? That stuff. You’re in luck.
If you would like some high-quality, low-cost SQL Server training content, again all about query optimization, you can get all of mine for about 150 USD. Just go to that link and enter that coupon code, and it will be applied. There is a fully formed link for you down in the video description as well, just in case. SQL Saturday New York City is taking place this year, 2025, on Saturday, May 10th. I highly encourage you to attend. It’s going to be good. All you have to do is drag your behind to Times Square on Saturday, May 10th. It’ll be good for all of us. With that out of the way, let’s talk a little bit about this stuff. Now, query optimization. You have a query optimizer. I’m not talking about a query planner like Postgres has. I’m talking about a query optimizer like SQL Server has, like Oracle has, like many other good database engines have.
They make a lot of assumptions about stuff. If you’ve spent any time thinking about or reading about query optimizers like SQL Server’s, one of the first things you learn is that it assumes you’re starting with a cold buffer cache. Like no data is in memory, and any read you do is going to be a read from disk. And it has all sorts of costing built into that. Some things are fixed costs: what a random I/O costs, what a sequential scan I/O costs, how much it costs to compute a scalar. Other things have dynamic costs based on how many rows you’re dealing with, the size of those rows, stuff like that. Right.
And the thing you run into with those sorts of cost models is that they are very, very highly generalized. SQL Server’s is no exception. SQL Server’s query optimizer doesn’t really know anything about your system. There are some things SQL Server itself is aware of: when it starts up, it looks at your CPUs to figure out what kind of SIMD support you have, and it knows how much memory is in there so it can work out memory grants, stuff like that. But as far as what kind of storage you have, SQL Server has no idea. SQL Server doesn’t interrogate storage to figure out what’s underlying there. So I/O costs are very, very highly generalized.
I/O costs are also ancient, right? SQL Server still costs I/O like you’re on a spinning disk that has a huge penalty for random I/O. That’s why things like key lookups are costed very highly: because of the random I/O. Now, Microsoft has been getting better about some aspects of the query optimization model by providing feedback to the model after a query executes, or while a query executes.
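To make the random I/O penalty concrete, here’s a minimal sketch of the key lookup situation. The table and column names are just illustrative, borrowed from the usual Stack Overflow-style demo database:

```sql
-- A narrow nonclustered index on the search column, which does not
-- cover the columns the query actually selects.
CREATE INDEX ix_users_reputation
    ON dbo.Users (Reputation);

-- For each row the index seek finds, SQL Server adds a key lookup
-- back to the clustered index to fetch DisplayName and Location.
-- Each lookup is costed as a random I/O, so past a fairly small
-- estimated row count, the optimizer tips over to scanning the
-- whole clustered index instead.
SELECT u.DisplayName, u.Location
FROM dbo.Users AS u
WHERE u.Reputation = 10000;
```

That tipping point is driven by the spinning-disk era cost ratio between random and sequential I/O, not by what your actual storage can do.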
So there are three stages, right? There’s stuff that happens before a query executes. You can think of that as stuff like table variable deferred compilation. Let’s just say you’re in a stored procedure. You declare a table variable. It’s not a table-valued parameter; it’s a table variable. You declare it, you insert some rows into it. SQL Server will pause query optimization at that point and ask: how many rows are in this table variable? Let’s come up with a query plan based on that table cardinality.
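A hedged sketch of what that looks like in practice. The dbo.Posts table is just an illustrative demo table:

```sql
-- Under compatibility level 150 and up (SQL Server 2019+), compilation
-- of statements referencing @t is deferred until @t has been populated,
-- so the optimizer sees the real row count instead of a fixed guess.
DECLARE @t TABLE (id integer PRIMARY KEY);

INSERT @t (id)
SELECT p.Id
FROM dbo.Posts AS p
WHERE p.Score > 100;

-- Table-level cardinality is known here; column-level histograms
-- on @t still are not, so predicates on its columns get guesses.
SELECT COUNT_BIG(*) AS records
FROM @t AS t
JOIN dbo.Posts AS p
    ON p.Id = t.id;
```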
Again, you still don’t get column level histogram information about what’s in the table variable, but you do know how many rows are in there. Perchance SQL Server might choose a different query plan if there are 10 rows in your table variable versus a million rows, right? That’s the hope anyway. Then there are some runtime feedback mechanisms like adaptive joins where SQL Server is just like, well, I’m going to do this loop join.
And if I get enough rows out of this loop join, I’m going to switch to a hash join, right? So that’s intra-execution. And then there’s post-execution stuff. Memory grant feedback is the one that I think has been around probably the longest, but there’s cardinality estimation feedback in the product now too, where after a query executes, SQL Server will say: wait a minute.
I was way off on this thing. I should probably figure something else out. Other examples, you know, like parameter sensitive plan optimization. That’s a pre-execution mechanism to make things a little bit less ride or die for the optimizer.
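In SQL Server 2022, persisted feedback like this lands in Query Store, and you can look at what the optimizer has learned after the fact. A minimal sketch, assuming Query Store is enabled on the database:

```sql
-- Persisted feedback (memory grant, cardinality estimation, DOP)
-- is surfaced per plan in Query Store.
SELECT qspf.plan_id,
       qspf.feature_desc,   -- e.g. memory grant feedback, CE feedback
       qspf.state_desc,     -- whether the feedback is pending/verified
       qspf.feedback_data   -- the stored adjustment, as JSON
FROM sys.query_store_plan_feedback AS qspf;
```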
And I think these are very cool things, because there’s one thing a lot of people miss about models. I don’t mean sexy runway models; I mean modeling things, right? Probably the most common models people will argue about are for the weather, climate, stuff like that.
One thing that a good model will take into account is feedback, right? Let’s say you have a query optimizer and you assign all these cost models to it. If your cost models don’t get any feedback on how well they did when a query ran, your model is just going to keep making the same mistakes over and over again.
That’s why, you know, for a long time, you know, SQL Server would choose a query plan. And if that query plan was good, you were real lucky. If that query plan was bad, you were real unlucky because SQL Server would keep using that query plan and you would keep getting the same bad plan.
And, you know, it would be up to you to figure out some scheme of rewrites, indexing, breaking stuff up into temp tables, whatever, in order to improve that. So I really like that Microsoft is providing more and more feedback mechanisms in the product. But at the same time, a lot of those feedback mechanisms are very, very conservative in how they’re applied.
So for any sort of automation of that stuff, SQL Server has to collect a lot of metrics about how things are going generally: whether it’s right about cardinality estimation, whether it’s right about memory grants, all that stuff. And then it has to say, OK, well, most of the time I’m right about this.
Sometimes I’m wrong about this. Or, boy, I was just wrong right off the bat on this. But I think one thing that is going to have to start happening is rethinking alternative plan costing.
It’s cool if you figure out that cardinality was wrong for something in a query, and that maybe the costing and plan choices after that were incorrect. But just improving the cardinality estimate may not improve the overall plan that you get. You’re going to have to start thinking about how that cardinality affects other plan choices.
And if perhaps a completely different query plan would be more suitable. Zooming in on this part, I’m going to say this off the bat: if you look at the world of query optimizers out there, I think Microsoft probably has the best one across the board.
There may be specific things that other query optimizers do better, or different ways of handling things that are better. But just as a general cost model, I do think SQL Server has the best one, regardless of how much time I spend complaining about it. That said, when you think about the quality of the assumptions you start with, Microsoft could probably do better with I/O costing.
And for that, the quality of the data you start with is mostly up to you. Right. Like how up to date your statistics are, whether they’re full scan versus the default sampling, things like that. And then there’s making sure the right runtime metrics get gathered about the query when it runs, so the optimizer can say: wait a minute. These are all the places I was wrong. This is where stuff took a long time. I need to recompile this branch of the query plan, because this was completely bonkers. I shouldn’t have done a loop join here at all.
I should have done something way different. And these are things that have been improving as server hardware improved. Right. If you think about when query optimizers first appeared, computer hardware was very, very limited.
There was no extra gas in the tank for stuff like this. Your hardware was dedicated to that workload. And, you know, I hate to say it, but the cloud does allow for a lot more flexibility with these things, because you have other hardware you could do this on.
Right. Copies of the data, things like that. And you can start considering all sorts of alternative plan stuff, like early aggregations. Maybe I should have used batch mode on rowstore here. I was wrong about this thing; maybe I should have used a hash join instead of a merge join, things like that. There are lots of alternative costing scenarios that need to start getting explored.
I’ve learned that some query optimizers will come up with a set of candidate plans and then run all of them, and whichever one comes back first, that’s just the winner. Right. That’s the one that gets used in the future.
So that’s kind of a cool thing, but I don’t expect Microsoft to do that, because who wants to run three copies of the same query on the same database? Let’s say one finishes in 30 minutes and the other two are still running. Are you just going to let that one run for 30 minutes, kill the other two off, and then start thinking about the one that ran for 30 minutes, like: OK, well, what can I fix in there?
Now, what was I going to say? Oh, man, I lost it. I was on a roll there and then I completely lost it. I talked about this already: how all optimizer model assumptions are generalized to work across lots of different hardware.
Basically, that means if your server had four cores and 12 terabytes of memory, or if it had 128 cores and four gigs of memory, you would get the same execution plan on either one, because SQL Server would have costed all that stuff equally, regardless of the hardware. There are also lots of things that are out-of-model constructs. Probably the simplest example is: let’s say you pass in a blob, and it could be XML, it could be JSON.
It doesn’t really matter which one. Like SQL Server has no idea what’s in that blob. Even when you parse that blob out, it’s not doing any statistical analysis of what actually came out of the blob.
You very much get fixed cardinality estimates and fixed costs, based on nothing more than SQL Server knowing it has to do something with XML or JSON. It doesn’t stop to look at what was in there. That’s why a lot of stuff with string splitting, or runtime XML and JSON parsing, where you’re joining to the result of it, or it’s in a WHERE clause or something, gets terrible plans and cardinality estimates: SQL Server is not pausing to do that. Is it possible to do that? Yes.
You could have a rule where any XML you parse, any JSON you parse, or any string you split gets some extra treatment. I mean, with XML and JSON, you can be pretty explicit with the types. You can be wrong about the types, right? You can parse an integer out of JSON as a varchar(max) if you’re stupid. But SQL Server could put that into a temp table, look at what’s actually in there, and then do some work with that. It just doesn’t do that yet, and may never do it. I don’t know.
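That temp table trick is something you can do yourself today. A hedged sketch, where the dbo.Posts join target is just an illustrative demo table:

```sql
-- Joining directly to OPENJSON gets fixed guesses; shredding the
-- rows into a temp table first gives the optimizer real statistics.
DECLARE @j nvarchar(max) = N'[{"id":1},{"id":2},{"id":3}]';

CREATE TABLE #ids (id integer PRIMARY KEY);

INSERT #ids (id)
SELECT j.id
FROM OPENJSON(@j)
     WITH (id integer '$.id') AS j;

-- The optimizer now has cardinality and a histogram on #ids.id,
-- instead of the blind estimate it would use against OPENJSON directly.
SELECT p.*
FROM #ids AS i
JOIN dbo.Posts AS p
    ON p.Id = i.id;
```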
There are also, I want to say, impossible-to-model constructs. Maybe the XML and JSON stuff would fall into that, because you’d have to trust whoever parsed or pulled the stuff out of there to make the right choices, which is tough to do on a good day. But that would certainly fall into difficult, maybe not impossible.
I think the truly impossible-to-model stuff is things where there are no good statistics, and there will probably never be any good statistics. Table variables are probably the easiest example of that, which I’ve talked about a billion times on this channel. There are ways to get table-level cardinality, to know whether there are 1,000, 10,000, or 50,000 rows in a table variable.
But there are no statistics on what actually makes up those 10, 50, 100,000 rows. So at some point, I think the optimizer is going to be less about improving the cost models. There are just going to be more rules that get applied, more optimization choices with feedback mechanisms behind them, and the model is going to get more and more feedback about how it did when things ran.
Because that’s the only real way to improve things. Right. Like you can only model costs for so many scenarios.
At some point, your cost model needs to get some feedback and use that to adjust costs, so that when queries run, SQL Server has a much better idea of what to expect. Right. Like SQL Server saying: OK, when I run this query, this choice always turns out bad.
I need to make a different choice here. Right. I need to do something different.
So, I don’t know. That’s about the end of that. But I think there’s a lot of room for adding in mechanisms like this. There’s probably a lot of room for adding additional rules that you could apply to this sort of feedback.
And there are probably a lot of rules that Microsoft could include, or a lot of plan exploration spaces and heuristics that Microsoft could include, to do things a bit smarter. You know, my favorite example is when you join on an OR clause.
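A sketch of the manual unroll for that situation, using illustrative Stack Overflow-style demo tables. Note the second branch has to filter out rows the first branch already matched, with some care around NULLs, so the UNION ALL doesn’t double-count:

```sql
-- The OR in the join predicate generally forces a poor plan,
-- often a nested loops join that can't seek on either column.
SELECT u.Id, p.Id
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id
    OR p.LastEditorUserId = u.Id;

-- Manual unroll: one index-friendly branch per side of the OR.
SELECT u.Id, p.Id
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id

UNION ALL

SELECT u.Id, p.Id
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.LastEditorUserId = u.Id
-- Exclude rows the first branch already returned; the IS NULL check
-- keeps rows where OwnerUserId is NULL, which <> alone would drop.
WHERE p.OwnerUserId <> u.Id
   OR p.OwnerUserId IS NULL;
```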
SQL Server should be able to unroll that into a union or union all of both sides of the OR clause, with the correct filtering of things for one side. That’s a really big thorn in a lot of people’s sides when they start writing these queries. But anyway, I’ve got to get going.
I forget. I have something to do at 2:30. We’ll figure out what it is in a minute, I guess. But thank you for watching. I hope you enjoyed yourselves.
I hope you learned something. And I will see you again in the next video. Goodbye.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.