SQL Server Performance Office Hours Episode 51
To ask your questions, head over here.
Summary
In this video, I work through a series of technical questions submitted by the SQL Server community, sharing insights on computed columns, index usage, memory grants, and blocking issues, and explaining why certain optimizations might not always yield the expected results. The discussion covers parameter sniffing, query plan stability with Query Store, and the nuances of CPU utilization after implementing dynamic SQL solutions. I also touch on my recent work automating content creation for YouTube videos and WordPress blog posts using local LLMs, and how these tools are changing my content production process. Additionally, I announce upcoming speaking engagements at data conferences across North America and Europe, with attendees getting free access to advanced T-SQL training materials.
Chapters
- *00:00:00* – Introduction
- *00:03:45* – Parameter Sniffing Solutions
- *00:10:37* – Blocking Analysis
- *00:18:52* – Episode Wrap-Up and Future Plans
Full Transcript
Erik Darling here with Darling Data, and I am in the midst of a complete psychological collapse. Windows Update ruined my entire life, and then I think some sort of Streamlabs update put the nail in the coffin. Windows Update over the weekend rebooted like five things that I had important stuff going on with, and then I go to open Streamlabs this morning to record, and it's like, you're not logged in, you can't record. And I'm like, I just want to save this to my computer. And they're like, no, you have to log in to something. So I took the path of least resistance and logged in via YouTube. And it turns out that when I tried to be a streamer influencer and, I don't know, vape and wear a cool hat and have neon lights, all those old things were still there, but my current recording setup wasn't. So I am slowly trying to rebuild the disaster of this morning. If anything looks weird, sounds weird, smells weird, that's why. Maybe it'll be better now. Who knows, right? Maybe there's just too much stuff going on. Anyway, it is Monday, so we are going to do Office Hours, we are going to have a great time, and I'm going to answer five community-submitted questions, as long as I count correctly. Down in the video description, I've been having the robots slowly work through things. I built a whole streamlined pipeline to download my entire YouTube catalog and use local LLMs to create summaries, chapters, and transcripts. So you should start seeing the summaries and chapters hit the YouTube videos, though there are some hiccups with the YouTube API rate limiting me. All the WordPress blog posts should have transcripts in them, and moving forward you'll see that in all of the new videos, which includes this one. So that's been fun.
But down below the brand-new robot-created and robot-quality summaries and chapters, you will find all sorts of helpful links with which you can hire me in various forms and fashions.
Yeah, you can consult with me. I've redone the training and consulting website with all sorts of spicy new offerings to appeal to a broader spectrum of people. I've prettified all the training stuff so it's easier to get to everything you want. And of course, you can do all the usual things: if you want to become a helpful member, or if there are questions, you can do that. As always, please do like, subscribe, tell a friend, tell your best friend in the world. Tell a family member, or maybe just tell someone who could use it: hey, are you also having a complete psychological breakdown because of Windows Update and Streamlabs? Get in here. I feel like this guy could really help you through. With that out of the way, here's where I'll be speaking: DataTune in Nashville in March, Data Saturday Chicago in March, and SQLDay in Poland and Data Saturday Croatia in May and June, respectively. I will be teaching advanced T-SQL precons at all of them, and attendees get free access to the entire course material. So if you're in the area, you should get in my area and learn some T-SQL and skip work that day. I'll say you were there. I'll validate your parking. But until then, until the database marshmallows have defrosted and thawed themselves... does dethawing mean freezing? Right? Because thawing would be... yeah. Anyway, these poor, poor things, look at them. Well, we're all muddling through, aren't we? Let's answer some questions over here, because that's what we came to do. I created a computed column with a specific formula. That's nice. I prefer computed columns made with breast milk, but you know, formula's cool too. To match a common WHERE clause pattern, added an index on it. You're checking some boxes here, friend. Good for you. And the statistics show good histogram data.
Well, there's a first time for everything. But the optimizer still expands the formula and does a scan instead of seeking on my computed column index. I'm not using any functions that would prevent matching. What causes the optimizer to ignore a computed column that perfectly matches the expression? Well, the only way to guarantee that your computed column will get used is to reference the computed column directly and not rely on expression matching. Our beloved cost-based SQL Server optimizer expands any expressions that it finds: computed columns, indexed views, stuff like that. So you might be in a situation where perhaps the index you created on the computed column was a little bit too narrow for the remainder of your query. Perhaps SQL Server is stuck with a choice between a nonclustered index seek plus a key lookup versus a clustered index scan.
And perhaps that's why your computed column is not being used. That would be my first guess. But generally, if we want SQL Server to use something, we ought to be explicit about it. It's nice when expression matching works, but you usually rely on that when there's a third-party vendor app that you have no control over. If you're in control of the queries, you ought to just use the computed column directly, because you know what's best there. It's sort of like with indexed views: if you get mad at the optimizer for not matching an indexed view, or for expanding an indexed view that you do reference directly, put a NOEXPAND hint on it.
I think I know what I’m doing over here. But yeah, I think one thing that you should certainly do is run the query with an index hint telling it which index to use and just see what the execution plan looks like. Because often that will tell you why SQL Server decided not to use the query plan that you were hoping it would. Usually it will be a cost-based decision, right? You might be able to figure something out there.
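To make that advice concrete, here is a minimal sketch with made-up table, column, and index names (dbo.Orders, TotalWithTax, ix_TotalWithTax are hypothetical): referencing the computed column directly instead of hoping for expression matching, and hinting the index to see what the optimizer is actually weighing.

```sql
-- Hypothetical computed column and index for illustration.
ALTER TABLE dbo.Orders
    ADD TotalWithTax AS (OrderTotal * 1.0825);

CREATE INDEX ix_TotalWithTax
    ON dbo.Orders (TotalWithTax);

-- Relies on expression matching; the optimizer may expand this and scan:
SELECT o.OrderId
FROM dbo.Orders AS o
WHERE o.OrderTotal * 1.0825 > 500.00;

-- Referencing the computed column directly is the reliable route:
SELECT o.OrderId
FROM dbo.Orders AS o
WHERE o.TotalWithTax > 500.00;

-- To see why the index was skipped, force it and compare plan costs:
SELECT o.OrderId
FROM dbo.Orders AS o WITH (INDEX (ix_TotalWithTax))
WHERE o.TotalWithTax > 500.00;
```

If the hinted plan shows an expensive key lookup, that cost difference is usually the answer to why the optimizer passed on your index.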
Let's see. This query. Oh, this query, huh? This query spills about two gigs to tempdb. Wow, that's a lot of spilling. But the estimated memory grant looks reasonable. Well, you know, there's the memory grant that you think you need and then there's the memory grant that you actually need, right? It's like retirement. Like, ah, I can retire on this much money. And then you retire on that much money. You're like, wait a minute. I can't do anything.
Why would SQL Server underestimate memory like this? Variety of reasons. And when does memory grant feedback fail to fix it? Well, you might actually be getting memory grant feedback and you might have a parameter sensitivity situation. Now, imagine that you run your query the first time and let’s just say it gets like a 3 gig memory grant, right? And everything goes fine.
And then the query runs like 10 times after that. But it’s for a really small amount of data now, right? Because it was a big amount of data. 3 gig memory grant. Yeah, we got it. Now it’s like run like 10 times for a small amount of data. And memory grant feedback might have adjusted your memory grant way down. And then if you ran it again for the big one, your memory grant had been adjusted down so far.
And now SQL Server’s like, wait a minute. I needed those two gigs. I forgot. Ah. Like, you know, it’s like on your way to the airport and you’re like, I left my wallet at home. It might have been trying to fix something else and then broken that.
But, you know, sometimes the memory grant that you asked for can be shared, sometimes not. So it's possible that the memory grant you got in its entirety was reasonable, but then the memory fraction that one of your other memory-consuming query plan operators got was not very good.
So there are a number of things to look at in there. You know, I think that primarily if it is a parameter sensitivity situation, you should do as much indexing or query rewriting as you can to alleviate the need for memory consuming operators where possible. Specifically, sorts are usually most prone to this sort of issue.
Of course, various hashes might spill as well, but sorts usually cause quite a bit of ruckus. So that, you know, as far as like why SQL Server would underestimate memory, you know, all the typical underestimation scenarios apply here. You know, without seeing more of the query or understanding more about, you know, what’s going on with it.
It would be an exhausting time trying to list it all, but, you know, local variable misestimates, parameter sensitivity, you know, big plan, little plan. Oh, this one’s for three rows. This one’s for three billion rows.
You know, the table variables often cause misestimates. All sorts of things might be kicking in. So, I don’t know. There’s always recompile hints, right? Always recompile hints.
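If you want to watch this happen live, a simple starting point is the memory grant DMV, whose columns below are real, followed by the kind of hint you might reach for as a bandage while you fix the underlying estimate (the query itself is a hypothetical example):

```sql
-- Grants for currently executing queries; all values are in KB.
-- A big gap between ideal_memory_kb and granted_memory_kb hints at trouble.
SELECT
    session_id,
    requested_memory_kb,
    granted_memory_kb,
    used_memory_kb,
    ideal_memory_kb
FROM sys.dm_exec_query_memory_grants;

-- A possible bandage: floor the grant at 10% of the max and recompile
-- each run so the grant is sized for the current parameters.
SELECT o.OrderId
FROM dbo.Orders AS o
ORDER BY o.OrderTotal
OPTION (MIN_GRANT_PERCENT = 10, RECOMPILE);
```

Hints like these treat the symptom; the indexing and rewriting work mentioned above treats the cause.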
Query Store shows a stable plan and low average duration. Oh, average, you say. But users still report random slowness. What does Query Store not tell us that could help explain this? Well, it's maybe not what Query Store doesn't tell you. Even the Query Store GUI is not guilty of this one.
But, my friend, you’re telling me about low averages, but have you looked at mins and maxes? Because average is nice, but mins and maxes can show you outliers in data that averages tend to lose. So, if you look at the average and it’s like, let’s say, 100 milliseconds.
And you look at the min and it’s like 10 milliseconds. But then you look at the max and it’s like 15 seconds. Well, you’ve got a bit of investigating to do.
This, again, could be a parameter sensitivity situation. This could be a blocking situation. If you look at the max CPU and the max duration, and the max CPU is still pretty low, but the max duration is still pretty high, then there’s something else going on, right?
There’s something else amok with this query, right? It could be getting blocked. It could be waiting on other resources. Sometimes there’s all sorts of things that you must investigate and uncover. And, as always, my rates are reasonable.
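One way to chase those outliers is to query the Query Store runtime stats views directly rather than trusting the averages. The views and columns below are real; durations are stored in microseconds, so I'm dividing down to milliseconds. Sorting by the min-to-max spread is just one reasonable choice, not the only one.

```sql
-- Find queries whose min and max durations are far apart:
-- low average, huge max usually means sniffing, blocking, or both.
SELECT TOP (20)
    q.query_id,
    rs.count_executions,
    rs.avg_duration / 1000. AS avg_ms,
    rs.min_duration / 1000. AS min_ms,
    rs.max_duration / 1000. AS max_ms,
    rs.max_cpu_time / 1000. AS max_cpu_ms
FROM sys.query_store_runtime_stats AS rs
JOIN sys.query_store_plan AS p
    ON p.plan_id = rs.plan_id
JOIN sys.query_store_query AS q
    ON q.query_id = p.query_id
ORDER BY (rs.max_duration - rs.min_duration) DESC;
```

A row where max_cpu_ms stays low while max_ms is enormous is the "waiting on something else" signature described above.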
So, if you’re out there listening, boy, boy, boy, boy, can I investigate. I fixed parameter sniffing with Dynamic SQL. High five.
You and me, you and me, drinks anytime you want. But now CPU is higher overall. Did I actually improve performance or just change failure mode? I don’t know.
How high is CPU? Like, what’s higher, right? Is it, like, did it go from, like, 5% to 7%? Did it go from 5% to 15%?
In either case. So, here’s the thing with CPU being higher overall. Up to a point, that doesn’t tend to bother me. There’s a lot of people in the world who will brag, who will put on, like, a top hat and wear, like, fancy ribbons and tell you that their SQL Server CPU was constantly at 10%.
And to them, I say, why are you overpaying Microsoft? Why is that good? You out of your mind?
Like, how many cores do you have? You know, it’s like, what, $2,000 to $7,000 a core? You’re only using 10% of that? 90% of your money to Microsoft is wasted.
So, you know, parameter sniffing is one of those things where, you know, it can go in the direction of, you know, you have, like, a tiny little serial plan. And it blows up when it has to process a lot of rows. And you can also have a situation where, like, you know, you get, like, a big parallel, like, hash join crazy plan.
And all of a sudden, you start running out of server resources. Like, you might start waiting on memory via resource semaphore. Or you might start waiting on CPU via thread pool.
So, there are two ways that can go. But I think you've balanced things out a bit, right? Because you are probably getting easy little serial plans where appropriate.
And you are probably getting larger parallel plans where appropriate. And things are just sort of balanced out a bit. So, you have, like, queries that are processing more data using more CPUs, right?
They’re going parallel. They are using DOP CPUs and doing things. And you have your smaller single-threaded queries off doing their thing. But most likely, you have improved the situation overall for the people who now get parallel plans and use more CPU and push CPU higher because they are probably no longer waiting a very, very long time for a very, very slow serial execution plan to finish.
So, I think that you have done a good thing. I think that you have done a smart and reasonable thing. And unless by "CPU is higher overall" you mean CPU is now constantly at, like, 95%, you should probably not worry about it. You've probably done a smart thing.
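For readers wondering what "fixed parameter sniffing with dynamic SQL" can look like in practice, here is one common pattern, sketched with hypothetical object names (dbo.GetOrders, dbo.Orders, the 100,000-row cutoff are all made up): branch the dynamic SQL text per parameter "bucket" so big and small inputs each get their own cached plan, without paying for a recompile on every execution.

```sql
CREATE OR ALTER PROCEDURE dbo.GetOrders
    @CustomerId integer
AS
BEGIN
    DECLARE @sql nvarchar(MAX) = N'
        SELECT o.OrderId, o.OrderTotal
        FROM dbo.Orders AS o
        WHERE o.CustomerId = @CustomerId ';

    /* Appending a harmless comment per bucket changes the query text,
       so each bucket compiles and caches its own plan. */
    IF
    (
        SELECT COUNT_BIG(*)
        FROM dbo.Orders AS o
        WHERE o.CustomerId = @CustomerId
    ) > 100000
        SET @sql += N'/* big */';
    ELSE
        SET @sql += N'/* small */';

    EXECUTE sys.sp_executesql
        @sql,
        N'@CustomerId integer',
        @CustomerId;
END;
```

The trade-off is exactly the one in the question: the "big" bucket now gets the parallel plan it deserves, and total CPU goes up because that work is finally being done quickly.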
All right. sp_WhoIsActive? I know that guy. He owes me money. sp_WhoIsActive shows short blocking chains, but users still experience big delays. How can blocking be minimal, but user latency still be high? That's a dense one.
It sets the mind wandering in several different directions. Blocking is not the only thing that causes delays. If you want to continue to be database focused: to me, "short blocking chains" raises questions. Do 20 queries pile up for, I don't know, 10 milliseconds? 20 milliseconds? Three seconds? I'm just having a hard time with it. Okay, sp_WhoIsActive shows short blocking chains, but, man, there are so many other things that can slow a query down, right? Even if you stay in the database, those short blocking chains might not be the things that users complain about. There are all sorts of other things that might be slow.
And once you get outside the database, you start thinking about your app servers and what they do when they receive data, and all sorts of other stuff. sp_WhoIsActive is great at showing you what's happening in the moment, but perhaps you need to broaden your view a little bit, take a step back from just what's happening right now, and look at the server a little more overall. If your server has been up for a reasonable amount of time, and by that I mean a Goldilocks-zone amount of time, like 100 or so hours, looking at aggregated wait stats can be very, very useful. If you've been up for 3,000 hours, the wait stats picture becomes a bit muddy. In that case, you could try persisting wait stats off to tables with a little timestamp in there, every five or ten minutes or so, and figure out how wait stats are changing over time.
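A minimal sketch of that persistence idea, with a made-up table name (dbo.WaitStatsHistory is hypothetical; the sys.dm_os_wait_stats columns are real): snapshot the DMV into a table on a schedule, then diff consecutive snapshots to see what waits grew during the complaint windows.

```sql
-- Landing table for periodic wait stats snapshots.
CREATE TABLE dbo.WaitStatsHistory
(
    collected_at datetime2(0) NOT NULL
        DEFAULT SYSDATETIME(),
    wait_type nvarchar(60) NOT NULL,
    waiting_tasks_count bigint NOT NULL,
    wait_time_ms bigint NOT NULL,
    signal_wait_time_ms bigint NOT NULL
);

-- Run this from an Agent job every five to ten minutes:
INSERT dbo.WaitStatsHistory
    (wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms)
SELECT
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE waiting_tasks_count > 0;
```

Since the DMV's counters are cumulative since startup, it's the deltas between rows with adjacent collected_at values that tell the story.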
And you could probably figure out a lot more from that than just running sp_WhoIsActive quickly in the moment. If you've got Query Store on there, you can always look in there; use my stored procedure, not the GUI, because the GUI is an atrocity. The GUI is something that you would design if you didn't like someone. You can use my stored procedure, sp_QuickieStore, to find queries that are slow and that happen during periods of time when users are typically complaining about things. That's sort of what I'd go for there. You might even want to look at capturing the blocked process report and/or the XML deadlock report and see how much action there is in there.
Because you might be surprised that the short blocking chains you see with sp_WhoIsActive are just part of the picture. They might not be the whole picture. There might be a whole lot more going on in your server that you could delve into and use to complete your analysis a little more thoughtfully.
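One detail worth knowing about the blocked process report: it doesn't fire at all until you set a threshold, and then you need something listening for it. A sketch of the setup, using the real sp_configure option and Extended Events objects (the session and file names are my own choices):

```sql
-- The report only fires once blocking exceeds this many seconds.
EXECUTE sys.sp_configure 'show advanced options', 1;
RECONFIGURE;
EXECUTE sys.sp_configure 'blocked process threshold', 5;
RECONFIGURE;

-- Capture the reports with an Extended Events session.
CREATE EVENT SESSION blocked_processes ON SERVER
ADD EVENT sqlserver.blocked_process_report
ADD TARGET package0.event_file
    (SET filename = N'blocked_processes');

ALTER EVENT SESSION blocked_processes ON SERVER
    STATE = START;
```

Five seconds is just an example threshold; pick one that matches what your users would actually call a delay.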
Anyway, that’s probably good there. Uh, that wraps up the 51st episode of Office Hours. Um, as promised, I said absolutely nothing and made no fanfare about episode 50 because it’s not that big a deal.
Like episode 100, maybe. I’ll wear like a hat or something, like a party hat. Maybe I’ll, maybe I’ll, maybe I’ll live stream episode 100 because I have all this stuff now I can do.
Anyway, that's cool. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I will see you in tomorrow's video, where we will talk more about our new AI overlords and SQL Server. Anyway, thank you and goodbye.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.