A Little About Optimizer Timeouts In SQL Server

A Little About Optimizer Timeouts In SQL Server



Thanks for watching!

Video Summary

In this video, I delve into the intricacies of optimizer timeouts in SQL Server, dispelling common misconceptions about what they actually mean. Erik Darling from Darling Data shares insights on how these timeouts occur and discusses the different search phases that the optimizer goes through to generate a query plan. He also provides practical advice on monitoring and identifying queries that have met an optimizer timeout using Query Store and DMVs like `sys.dm_exec_query_optimizer_info`. Additionally, I touch on the importance of having a better relationship with your audience by encouraging support through subscriptions or memberships, which can be accessed via links in the video description.

Full Transcript

Erik Darling here with Darling Data. Feeling… Oh, I don’t know. Does it matter how I feel? Never seems to. The videos must flow. So here we are. In this video, we’re going to talk about… What is wrong with my face? It’s like a red… Looks like I got smacked. I have not been smacked recently. Perhaps… Perhaps to my own detriment. In this video, we’re going to talk about optimizer timeouts. Because everyone seems to think that optimizer timeouts mean… Your query took a long time to optimize. Which isn’t true. Oh no. Oh no, I’ve given it all away. But before we go in and we talk about these optimizer timeouts, we’re going to talk about how you and I can have a better, stronger relationship. If you love this content so much that you feel that… As little as $4 a month might be a good way to say thank you for all of these videos. You can subscribe or you can get a membership to my channel at the link below. In the video description. It’s a wonderful place to contain these links and whatnot. If you like this content, even the one-armed man type content, liking, commenting, subscribing, all wonderful ways to support my endeavors with SQL Server. At least, you know.

For now. If you need any consulting help with a poorly performing SQL Server, I have a variety of things that I am pretty good at. If you need something else, please let me know what it is. My rates are reasonable. For some low-cost, high-quality… It’s a good thing I didn’t mess that up and reverse those because low-quality, high-cost training… I don’t know. That sounds like what you get from everyone else. But if you want some high-quality, low-cost training, you can use this discount code and you can go to that link and you can get all of my high-quality, low-cost training for about $150 for life.

If you would like to see me live and in person, if you would like to see me strut about a stage with a microphone quite similar to this one, talking about subjects quite similar to what I talk about here, you can catch both Kendra Little and I co-hosting two days of performance tuning madness, madness, madness at Past Data Summit, November 4th and 5th in Seattle, Washington. Seattle’s lovely that time of year. You should see the sunsets over the water thing. There’s a ferris wheel and all sorts of other stuff. Quite nice.

Quite nice. So, with that out of the way, let’s talk about Optimizer Timeouts because this is something that comes up quite a bit in my consulting engagements when people pay me to talk about SQL Server. So, I guess, I don’t know. Are you stealing from me? Are you getting it for free? Oh, God. So, when your query passes all of the initial terribly boring stuff that queries have to go through before SQL Server starts optimizing them, like parsing and binding and whatnot, the Optimizer makes a first pass to assign this sort of initial starting cost to your query.

This is, I don’t know, sort of a low-level heuristic based on complexity. Things like the number of joins, sub-queries, I don’t know, things like that. Probably some other stuff.

I don’t know. Whatever makes queries more complicated to SQL Server. And it’ll assign an initial starting cost, which tells it not an initial time, like on a clock, but a number of steps that it’s willing to take in order to come up with a good enough query plan. So, some plans are trivial where there are no cost-based decisions.

I’ve talked about trivial plans on the Chan before. Some queries are relatively simple, but do have cost-based decisions. That gets you to, like, the, pretty much like the OLTP type queries, where it’s just like, oh, yeah, select a customer and their order or whatever.

So, that’s search zero. Some queries do require a bit more thought. That gets you to another search phase called Search 1, where SQL Server, you know, is willing to do a little bit, spend a little bit more time thinking about join types and ordering and whatnot.

And some require a real big think. You know, one of those, you know, grab a book and head into the bathroom type thinks about how to come up with a good plan for your query. And sometimes, some queries are real, real bad, and the optimizer will run out of memory trying to compile a plan for them, and you’ll get something that says memory limit exceeded.

That is what I wrote over here that was slightly cut off on the screen before. I do apologize for that. But the initial cost of your query sets what’s called a budget.

Most people are not aware of what a budget is. It is essentially your limitation. Something.

How much you are willing to spend on something. And that initial, that budget is dynamic and may increase. It will not decrease.

If the optimizer doesn’t explore every single possibility, if it does not exhaustively search through every possible combination of things, because it spends the whole task budget, that’s when it adds this statement opt early abort reason. And that’s when it’s timeout. So, to kind of show you where you can see some of the both initial phases, some of the stuff that I talked about up there basically.

You can look at this DMV called sys.dm.execqueryoptimizerinfo. And that will show you when you had a, how many times you had a trivial plan. And how many times like me you had no plan whatsoever.

And so just winging it. It’ll also tell you about all the search phases. So search zero, one, and two in here. It’ll tell you how much time it spent, how many tasks met it for all these things.

It will also tell you if anything went from zero to one or one to two. So that’s what I meant by it sort of being dynamic. It will also tell you if there are, if there have been any timeouts and if there have been any memory limit exceeded on here.

It might be interesting to monitor for memory limit exceeded, but you know, I don’t think most people are writing queries where that would be of any real benefit to you. So that’s where you can see that stuff. Where you can, how, one way you can find queries that have met that optimizer timeout.

Well, I’ve written a query here for query store where we can look and we can parse out the XML a little bit and we can find queries that have met the optimizer timeout. Now, what’s interesting about these is if, I don’t know, if we scroll down a bit, there’s all these queries that say stuff like msparam in them. There’s even this crazy XML namespaces query.

I don’t know who would write something that insane. But we have all these msparam queries. And if we open up the query plan for them, you can see that these are rather big and complex. The initial cost of this query though is 0.89 something query bucks.

So not a very expensive starting query. And, you know, if we zoom all the way out, it might be fun. But if we look at this one, we can see this is a rather large query plan.

So clearly SQL Server didn’t, you know, based on the initial starting cost was like, I’ll take this many stabs at coming up with a good execution plan. But if I get, if I go through all this, I’m just picking the cheapest possible one and throwing that out the door. If we look at a more realistic user query, now these are just, of course, these are msparam.

These are Microsoft internal queries timing out. If we look at an actual user query, right? And we look at this query plan, we’ll see one that’s far less complex.

There’s really only one, two, three joins in it, right? There’s an adaptive join from users to comments. There’s an adaptive joins from posts to votes.

And then there’s an adaptive join. Oh, sorry. There’s a non-adaptive hash join. Oops. I made a mistake. A non-adaptive hash join to join the results of both of those joins together. So there’s like essentially three joins in here.

But if you look at the initial starting cost, that is 13 query bucks or 13, I guess almost 13 and a half query bucks. Well, we can be kind and round up a little bit there. Make a donation to the Darling Data Home for Little Wanderers by rounding up a couple cents to the nearest 50.

And if we look at the properties of this, notice the compile time was only 46 milliseconds, but we still see this timeout down here, right? We still see SQL Server saying, I ran out of time to optimize this query. But what it really did was run out of tasks to optimize that query.

So the number of steps that it was willing to take and try to come up with a good execution plan for this query, SQL Server was just like, you know what? 13 and a half query bucks. I don’t know.

I don’t actually don’t know how many tasks you get per query buck or if it’s like gated a little bit. Because something some things in SQL, something inside of SQL Server like memory grants, SQL Server does use query cost as sort of gateways for figuring out who will get the next memory grant. So and but that’s bucketed from like less than 10, 10 to 99, 100 to 999, 1000 to 9999.

So maybe there’s like, you know, depending on not just like per query buck, but maybe for every like 100 or so query bucks, you get a different number of tasks allotted to you. I’ve never I’ve never quite had the temerity or the tenacity to go try and figure figure that part out. But of course, that also might be dynamic even within that.

And I just don’t want to give you wrong information. So I’m not going to give you any information on that because I don’t have any. I don’t want you to start thinking the wrong things.

But looking through some of the other metrics that we get back from query store, we can see compile, optimize duration or sorry duration and CPU. We can see compile memory. We can see some information about the queries like the number of executions, the average duration of the query, the average CPU time of the query.

But this this stuff over here slightly less sorry, this stuff over here highlighting. Thank you. Slightly less useful. This stuff over here for the purposes of what I’m trying to show you is that none of these spent a very, very long time compiling or parsing or binding or anything really.

Or they didn’t even use a lot of memory to do what they had to do. We see eight megs of memory for the first query, 31 megs of memory for the second query. Then if you’re like, you know, like a real memory stick or you’re eight megs, we went to the moon on that.

I don’t know what to tell you. Okay. I just I just don’t know what to tell you there. But we can see that we just did not spend a lot of time compiling plans or optimizing plans for the query.

For these queries. And now that I’m looking at this, I think that I actually forgot to convert those from from milliseconds to or from microseconds down to milliseconds. So those actually look a little bit bigger than what they were in the query plan.

But you know what? I’m 12 minutes into this thing and we’re just going to give me a slap on the wrist for maybe doing some math wrong. And we’re going to we’re going to carry on from there. But hey, it’s free.

At least for the majority of you, it’s free. So anyway, that’s about it for this. Just to sort of sum things up a little bit.

When you see a timeout in your query plan, like if you see that your optimizer timed out, that doesn’t mean that the optimizer took so long trying to come up with a good enough plan for your query that it just, you know, went just, you know, took its ball and went home. It just means that it tried through a certain number of steps or tasks or its budgeted tasks. And then it said, you know what?

I’ve spent enough. I’ve spent enough tasks on this trying to rearrange this query. I think I’m I don’t think I’m getting anywhere. I think I’m just going to give you this cheap, this good enough cheapest plan I found so far. So sometimes that’s fine.

We’re rather actually not even sometimes most of the time that’s going to be good enough. Other times you might need to, I don’t know, do other things to influence the optimizer to come up with a better plan. There have been some, well, actually, I think the use hints are still not quite functional.

It was like an optimized for analytic query or workload or something hint that I don’t think I’ve ever really gotten to work. But then there were some trace flags that would expand how long or rather how many tasks SQL Server was willing to go through to come up with the query plan. So those were, you know, actually, I’ve never actually seen those be useful for anything I was working on.

But I’m sure that they were useful for someone because they ended up as trace flags. So some someone out there probably at like Walmart or Walmart or SAP really needed them. So, you know, the rest of the rest of us little eyes, you know, we don’t get we don’t get stuff added to the product willy nilly.

Lucky to get a working cumulative update sometimes. Anyway, I hope you enjoyed yourselves. I hope you learned something.

I’m going to record some other videos now. So hopefully the key grips and gaffers don’t take their balls and go home. We got some work to do today.

Anyway, thank you for watching. My rates are reasonable.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.