Be Careful With Your AI Enablers


Summary

In this video, I wanted to share a couple of experiences I had with my AI enablers, specifically Claude, and the importance of being vigilant when working with them. I demonstrated how, even with tools like Git's `git worktree` feature, which allows for parallel development, it's crucial to maintain oversight to avoid accidental deletions of ongoing work. Additionally, I highlighted a more serious issue where Claude, despite having documented rules to follow, violated them out of frustration with the GitHub API. This incident underscores the need for robust guardrails and continuous validation, especially when working with AI in production environments. The article from Fortune about Amazon's mandatory meeting to discuss site reliability issues further emphasizes the risks of rapid development without thorough testing. Overall, the video serves as a reminder to always be cautious and mindful when leveraging AI in your projects.

Chapters

Full Transcript

Erik "Monitoring Tool Mogul" Darling here. In today's video, I'd like to do a rather short one, just about how you need to be very diligent and very mindful of your AI enabler friends, for a number of reasons. I'm just going to show you two things from this morning, which happened, and which I was like, man, that sucks. That does not instill confidence. So let's talk about those. And then there's an article up in the background that I feel is worth talking about as well. But anyway, this is the first one. Let's get zoomed in here. I was doing some work on the monitoring tool, and to do that, I can do some cool stuff. So under normal circumstances, I use the CLI for Claude, and if you have two Claude tabs open and ask them to work on the same repo, they're usually clashing and beating each other up.

But you can use these things called git worktrees, which allow them to create separate working structures, work on things in those independently, and then merge stuff in, right? So it's kind of cool. You're not just limited to one Claude working on one issue, or two Claudes working in dev on the same thing. Normally you can't have two different branches checked out locally at the same time, so it's a very interesting thing to be able to do, to have all this stuff going on at once. If you want to work on separate features, or separate bugs, at the same time, you can do that. But you have to clean them up eventually. You have to clean them up eventually, because they leave folders everywhere, since they're separate worktrees, or folders of work. So every once in a while, I have to ask questions like, oh, hey, are there any inactive worktrees, because they're already merged?
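The basic workflow I'm describing, sketched as plain git commands against a throwaway repo (the repo and branch names here are made up for illustration):

```shell
set -e
cd "$(mktemp -d)"
git init -q demo && cd demo
git -c user.email=a@b -c user.name=a commit -q --allow-empty -m init

# Check out a new branch in its own folder, alongside the main checkout.
git worktree add -b feature-x ../demo-feature-x >/dev/null 2>&1

git worktree list    # shows the main tree and the new one

# The cleanup step that's easy to forget: remove the folder when done.
git worktree remove ../demo-feature-x
git worktree prune
```

Removing the worktree deletes its folder but keeps the branch, so nothing merged is lost.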

And Claude will usually look at some stuff, and then it'll say something like: three worktrees, let me check which branches are merged to dev. All three are merged, let me clean them up. And what's funny is I have one working over in another tab, sort of right next door, and it was about to delete that one. So I said, are you quite sure about that? And Mr. Claude said, good catch. Let me verify they're actually merged. So that was an amusing one, because it was about to delete a bunch of work that this guy is doing right here, this Claude, right? So we're going to let Claude check that out while it does that stuff. And then I want to come over to this window. Let me just move this to the side a little bit, because this one was a bit more rambunctious than the other one.
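That "are you quite sure" check is really just a merged-branch check. A self-contained sketch (branch and folder names are invented): one worktree branch gets merged into dev, the other still has in-flight work, and only the merged one shows up as safe to delete.

```shell
set -e
cd "$(mktemp -d)"
git init -q repo && cd repo
git -c user.email=a@b -c user.name=a commit -q --allow-empty -m init
git branch dev

# Two worktrees: one whose branch will be merged, one with in-flight work.
git worktree add -b merged-work ../wt-merged >/dev/null 2>&1
git worktree add -b active-work ../wt-active >/dev/null 2>&1
git -C ../wt-merged -c user.email=a@b -c user.name=a commit -q --allow-empty -m done
git -C ../wt-active -c user.email=a@b -c user.name=a commit -q --allow-empty -m wip

git checkout -q dev
git merge -q --no-edit merged-work

# Only branches in this list are safe to clean up; active-work won't appear.
git branch --merged dev
```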

Where Claude had done a bunch of strange things, and keep in mind, all this was here was a push to update the readme file. So this one wasn't catastrophically important, though documentation is important. One should document things, right? One should have things documented, and one should also have accurate documentation. That's also non-hallucinatory documentation. But this was a funnier one, because Claude did a bunch of stuff that it shouldn't have done. I've actually explicitly written down, in many places in the employee handbook, that you are not allowed to do these things. And it went and did them anyway. Right? And then I felt the need to say: Claude, after you did all those things that you weren't supposed to do, is this not all outlined? Like, I have a Claude.md file, I have skills files, Claude is writing things to memory, and it admits as much. But what it also says, if I can just move this up a little bit to frame that a little bit better.

It is. Claude.md says it clearly. Branch protection: both Performance Monitor and Performance Studio have branch protection on dev and main. You can't push directly. You always have to create a feature/fix branch and a PR. And it says, memory has the same note. I violated my own documented rules because I got frustrated with GitHub. So Claude got mad at GitHub, right? Because the full thing over here is: I got frustrated with the GitHub API not allowing my PR due to the merge topology. Which I get.
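To give a sense of what that kind of rule looks like written down, here's a paraphrased sketch of a Claude.md entry along those lines. This is not the literal contents of my file, just the shape of it:

```markdown
## Branch protection

Both Performance Monitor and Performance Studio have branch protection
enabled on `dev` and `main`. You cannot push to them directly. Always
create a feature/fix branch and open a PR.
```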

I have also been frustrated by the GitHub API and merge topology in my life. I don't know that I have broadly violated any personal rules in my dealing with that. But these are things that you have to be very careful of. Everyone talks about guardrails, but I don't believe in them. My experience with LLMs is that you can create a lot of guardrails, but they're sort of driving a tank, and tanks don't really care about guardrails.

Just recently, and I have a lot more to say about this in a written blog post that's coming up, but in recent weeks I've had these things drop databases. I've had them, again, go out of bounds on things like this. A good example is in the performance dashboard that I have.

Well, two of the tools are installers. There's a command line installer and a GUI installer. And one of the rules I have is that we have to test all SQL script changes through the command line installer or the GUI installer, to make sure that when people run them out in the world, they run correctly. And there have been some problems with that.

And part of the reason there have been problems with that is me sort of clicking okay and letting things go. Sometimes I miss that Claude isn't using the installer. Sometimes, because I've written things down and created these guardrails, I assume that Claude is following my instructions on things.

But last week I missed that Claude was not. Claude was running scripts and patching things manually when things didn't work with the installer. It said: that's not working there.

I've got to do it this way. Instead of stopping to fix the problem in the installer, it was like, no, I'm going to get around this. I need to make this thing work. And that sort of stuff happens a lot. Now, one thing that is very, very common with LLMs is that they are very keen to write what are called happy path tests.

You give them some code and you're like, hey, can you write some tests for this code? So maybe they run the code, they see the results, and then they say, oh, well, we need to write some tests that make sure this code, which is correct, works. So they write a bunch of these happy path tests that don't really adversarially test the code.

And so you have to say, cool, we have tests that do that. But do we have any tests that see what happens when things break? Because writing a bunch of tests just to make something arbitrarily pass, or artificially pass, is not testing.
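To make the distinction concrete, here's a hedged T-SQL sketch of a failure-path test, with dbo.SomeProc standing in as a hypothetical procedure:

```sql
/* Happy path: the expected input works. */
EXEC dbo.SomeProc @id = 1;

/* Failure path: invalid input should raise an error, not silently pass. */
BEGIN TRY
    EXEC dbo.SomeProc @id = NULL;
    /* Reaching this line means bad input was accepted: fail the test. */
    THROW 50000, N'Expected an error for NULL @id, but none was raised.', 1;
END TRY
BEGIN CATCH
    IF ERROR_NUMBER() = 50000
        THROW; /* re-raise our own test failure */
    PRINT N'Failure path raised an error, as it should.';
END CATCH;
```

The point is that the test fails when the code accepts bad input, which is exactly what a happy-path-only suite never checks.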

Right. It's just sort of like, I don't know, it's like interviewing with your dad, right? Like, your dad owns a company and you're like, hey, I'd like to get a job. And he says, sure, come in for an interview.

Like, what do you think is going to happen? Come on. Realistically, it's not anything that you have to worry about. So just be very careful with these things out there.

You know, I use all my stuff very locally. I don't use stuff in a way that would touch... I mean, I don't have production, because I have only local. So everything that I work on is local.

But I need to make sure that SQL Server, or rather, that Claude doesn't destroy my SQL Servers, because that wouldn't be cool either. So be very careful with this stuff. Keep these things very isolated.

And like I said, you can set up guardrails, but there's no guarantee they're going to keep things on track. There's a bigger lesson here, because Amazon... there's an article in Fortune, and you can ignore my bookmarks bar full of strange things that I'm not even sure should be bookmarks. Sometimes I hit the wrong key combination, right?

Sometimes when I go off those guardrails, I'm like, I meant to paste or something, and I ended up bookmarking something. But anyway, there was apparently a mandatory meeting at Amazon, where they were like, hey, site reliability has not been good lately, right?

A bunch of stuff's been down. Can we please stop using AI in production? And as you read through the article, there are quotes from various other tech people.

But the funny one is in here, where it's like: folks, as you know, the availability of the site and related infrastructure has not been good recently. Hmm. Well, I wonder why.

So a lot of this, of course, gets led back to people using AI, doing very rapid development and rapid deployments. And this is something that I run into too, because it's cool that you can make a lot of progress very quickly. But if you're not stopping to really test and validate that progress, then A, you don't know that it's progress, it could be just completely broken, and B, you don't know that any of it is sensible progress, right?

Like, something has produced a bunch of code. We don't know how good, bad, or ugly that code is. So for me, what happens a lot is, I'm like, hey, I want to do this thing, or I want to work on this thing.

And Claude bangs out a billion lines of code in like 30 seconds. And I’m like, cool. Wow, man.

Whew. That was amazing. I could never code that fast. I could never code. I just write SQL. And then you go and look at what comes up, usually the dashboard, or the Performance Studio thing that's doing plan analysis.

And I'm like, wow, that doesn't make any sense. We need to can half of that. This is wrong.

These buttons are in the wrong place. That's backwards. Like, three of these rules aren't working. Did you validate any of this stuff? Because especially with Performance Studio, that should be a fairly easy one, because for everything I want to put in there, I'm like: Claude, here's an execution plan where this thing happens.

You can validate the code for that in real time. This is how you can look at the XML structure. You can see: wow, yeah, this happened in there at this point in the XML. And it'll just be like, I think that's it. Good enough. Anyway. Thank you for watching.

I hope you enjoyed yourselves. I hope you learned something. And I hope you will be careful out there in the world with your AI enablers, because while they do allow you to build and iterate and produce things very quickly, there are still a lot of bumps along the way.

So I will leave you with that. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SQL Server Performance Office Hours Episode 56


To ask your questions, head over here.

Summary

In this video, I dive into a variety of SQL Server performance questions submitted by viewers, offering insights and advice on topics like query store discrepancies, parameter sniffing with Dynamic SQL, and identifying performance issues in a complex environment. I also share my free SQL Server performance monitoring tool and discuss upcoming events, including SQL Day Poland and Data Saturday Croatia, where I’ll be sharing my knowledge and engaging with the SQL community. Whether you’re a seasoned DBA or just starting out, there’s always something new to learn in the world of SQL Server performance.

Chapters

  • *00:00:00* – Introduction to Performance Issues
  • *00:02:25* – Backup Tool Impact on IO
  • *00:07:51* – Error Log Analysis
  • *00:10:58* – MAXDOP Parameter Discussion
  • *00:14:02* – Batch Mode for OLTP Queries <– Thanks, AI

Full Transcript

Erik Darling here with Darling Data and we are going to get back into the normal swing of things with Office Hours. Last week we had not one but two bit obscene episodes back to back. They were bitterly obscene. And this week I’m going to get back to answering your user submitted questions. Now this was sort of a funny one because I can’t tell if I’ve already answered these or if they just look familiar because I’ve been staring at them for so long. So, if I’ve already answered these, I apologize. If I haven’t, great. I’ll figure it out. Usually I mark answered next to them. These ones, I don’t know. I don’t know what I did. Anyway, down in the video description, you will find all sorts of helpful links. You can hire me for consulting, buy my training, become a supporting member of the channel, ask me Office Hours questions, all the stuff that one would care to do with links in a YouTube video description. And, of course, if you enjoy this content, I do encourage you to like, subscribe, tell a friend, tell a loved one, spouse, I don’t know.

Really, anyone who will listen. Anyone with a YouTube account, you can just subscribe and, you know, turn off notifications, make it sneaky. It doesn’t matter to me. Anyway, I have a free performance monitoring tool. It’s awesome. It’s open source. You don’t owe me anything for it. It’s just all the stuff that I would care about monitoring for performance in SQL Server that I’ve built a dashboard for and I’m giving it away for free.

And I think it's pretty neat. There are also, if you choose, if you are enjoying the robots that have infested our world and lives, opt-in MCP servers, with well-defined tools so that your robot friends can chat directly with your performance data. They can get a pretty good idea of what's going on on your servers, in robot format, and they can hand you a nice little summary and say: this is where stuff got wonky. Go fix it.

Or you can say, can you fix it? Maybe they can. I don’t know. Who knows what those robots will get up to. I’ve got three events that I can publicly talk about so far this year. I will be adding more to this. You’ll notice that, what’s it called? DataTune Nashville has just dropped off because I have done that one.

Data Saturday Chicago, depending on when this gets published, may have already happened. If it did already happen, it was a fantastic event. I’m glad I went. I’m so happy that I saw you and talked to you. And I will see you next year.

Aside from that, I have SQL Day Poland coming up May 11th and 13th and Data Saturday Croatia coming up June 12th and 13th. So, I look forward to seeing all new friends in Poland and Croatia. I’ve never been to Croatia before. I’m quite excited to go someplace new.

Come to think of it, I’ve never been to Poland either. So, I’m doubly excited to go to two new places. It should be absolutely fantastic. But, it is March, it is baseball, it is beautiful outside, it is actually hot in my office recording today.

You may detect a slight sheen on my face. Not to be confused with the clothing label. Hopefully, I don’t have a sheen face.

Anyway, let’s answer some questions. You’ve got a face for sheen. Anyway, here we go.

So, these are the ones, I can’t, again, I can’t recall if I’ve already answered them, but here we go. So, query store says everything is fine, but users keep complaining. What should I trust more?

Well, I mean, your users are sitting there experiencing things, and who knows how you're choosing to look at Query Store, and how you're interpreting what Query Store is telling you when it says everything is fine. That being said, there are a number of things that maybe don't show up so well in Query Store that will be worth investigating.

You happen to be a very lucky young person in the world because you can download my free SQL Server performance monitoring tool, and you can start figuring out who’s right and who’s wrong. Is it the users?

Is it Query Store? You can start to make all sorts of much better judgments and determinations, and be far less in the dark about all these things than you currently are. If you are unable to use that, I would highly recommend using my free stored procedure, sp_QuickieStore.

It'll allow you to slice, dice, and search through your Query Store in slightly smarter ways than the Query Store GUI makes available to you, so that you can start figuring some additional things out in there.
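As a sketch of the kind of call I mean, something like this pulls the top CPU consumers out of Query Store. The parameter names are from the procedure's documentation; verify them against the copy you install, and the database name is a made-up stand-in:

```sql
EXEC dbo.sp_QuickieStore
    @database_name = N'YourDatabase',  /* hypothetical database name */
    @sort_order    = 'cpu',
    @top           = 10;
```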

But either way, if your users are still complaining, you’ve still got a problem to solve. Maybe your job is to figure out if it’s SQL Server at all, right? Things can be very, very fast in SQL Server, very, very slow in the application, right?

And I don't mean that in an Erland Sommarskog way. I mean that there are all sorts of things that happen between a SQL Server and an application. They are not SQL Server being slow; they are something application-side, or application latency, that can make things feel terrible for users but are not actually a database issue.

All right. Next, we have: I fixed parameter sniffing using Dynamic SQL. I would pay up to $5 to see that work, up to and including $5, to see that Dynamic SQL.

I really would. And now CPU is up 40%. That would be worth the cost and price of admission right there.

Did I really fix anything or just trade one problem for another? Curious how you usually decide if this trade-off is worth it. So it sounds like you fixed parameter sniffing, or parameter sensitivity more specifically, which may be preventing quite wild swings in execution plans.

But it sounds like you did not fix the underlying performance problem. So while your queries may not be parameter sensitive anymore, you may have a bunch of queries that are just hauling off using giant parallel plans and pushing CPU up much higher.

My suggestion from here would be to start analyzing the plans that your Dynamic SQL is generating and figuring out which ones are using the most CPU and how you might go about addressing those. Depending on how you’ve written the Dynamic SQL, for example, if you have not written fully parameterized Dynamic SQL, you may be spending a lot of time just coming up with the same execution plans over and over again, passing in literal values, SQL Server having no idea that it’s already come up with an execution plan for basically that same query and, you know, burning CPU away.
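To make the parameterization point concrete, here's a minimal sketch. The table and column names are made-up stand-ins; the pattern is what matters:

```sql
DECLARE @reputation integer = 100000;

/* Literal-building: every distinct value compiles a brand new plan. */
DECLARE @bad_sql nvarchar(max) =
    N'SELECT u.Id FROM dbo.Users AS u WHERE u.Reputation = '
    + CONVERT(nvarchar(11), @reputation) + N';';
EXEC (@bad_sql);

/* Fully parameterized: one plan gets cached and reused. */
DECLARE @good_sql nvarchar(max) =
    N'SELECT u.Id FROM dbo.Users AS u WHERE u.Reputation = @reputation;';
EXEC sys.sp_executesql
    @good_sql,
    N'@reputation integer',
    @reputation = @reputation;
```

The second form gives SQL Server a stable query text, so it recognizes the statement it has already compiled a plan for.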

So what I would suggest is treat this like a normal database performance problem. Look at high CPU consumers and start tuning those. And again, a free monitoring tool that can be very helpful for that.

sp_WhoIsActive shows almost no blocking at all. Oh, we're blaming tools again here. But during business hours, users keep complaining. Users complain that the app is constantly hanging. We see random pauses of 5 to 15 seconds.

Where would you start looking? So I’m going to tell you to start looking someplace weird. I’m going to tell you to look in the error log. And I want you to look for two things in the error log.

I want you to look for signs that you have a backup tool that is freezing IO, potentially for 5 to 15 seconds. That's the first thing I want you to look at. What you'll notice is that if you have a backup tool that's taking database snapshot backups, you will see messages like I/O is frozen and I/O was resumed.

I just so happen to have a free stored procedure called sp_LogHunter, which will go through your error log, find these troublesome, meddlesome messages, and let you look at the last time a database was frozen and the first time a database started resuming. If you see a 5 to 15 second pause in there, then there is a good chance that during that pause, since drive activity is quiesced, or frozen, as they say, your write queries got blocked by that write freeze.

And they may be blocking other queries, queuing more queries up behind them. So that's the first thing I want you to look for. You, not me. Unless you pay me, I'm not going to go look for it.

The other thing that I would look for is 15-second IO warnings. sp_LogHunter will also surface those, and they will tell you when SQL Server is waiting on IO requests for greater than or equal to 15 seconds. Those are the first two places I would look, because they will tell you whether you've got a query problem or an issue with your externalities, the things surrounding the database.
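If you'd rather poke at the error log by hand, you can search for both kinds of message directly. A sketch using xp_readerrorlog, where the arguments are the log number (0 for the current log), the log type (1 for the SQL Server error log), and a search string:

```sql
/* Snapshot backup freeze/thaw messages */
EXEC sys.xp_readerrorlog 0, 1, N'I/O is frozen';
EXEC sys.xp_readerrorlog 0, 1, N'I/O was resumed';

/* Slow disk warnings */
EXEC sys.xp_readerrorlog 0, 1, N'longer than 15 seconds';
```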

The perimeter of performance is no longer drawn just around the database server. There are all sorts of other things that may come in and make everything go pear-shaped. So that depends on where you're starting a little bit.

If your MAXDOP is zero, or if your MAXDOP is very high, like 16 or 24 or 32 or something, lowering it might actually be fairly useful for you. But I think mostly lowering MAXDOP is a stability band-aid, in that the lower you make MAXDOP, the fewer worker threads per parallel branch your queries can consume. And so you might find that you don't hit absolutely disastrous situations around your CPUs as often.

So you might find that things like THREADPOOL waits calm down, you might find that SOS_SCHEDULER_YIELD calms down a bit, and perhaps you won't see as many queries sitting there in a runnable state, waiting to get CPU attention, things like that. So that's my take on it. Mostly a stability band-aid, but depending on what your starting point is, you may not find it to be as useful.
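For reference, checking and changing the server-level setting looks like this; the 8 is just an example value, not a recommendation for your hardware:

```sql
/* See the current value */
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure N'max degree of parallelism';

/* Change it; takes effect immediately, no restart needed */
EXEC sys.sp_configure N'max degree of parallelism', 8;
RECONFIGURE;
```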

Wow, ZoomIt is just absolutely going wild on me. Batch mode was amazing for reporting, but kind of wrecked our OLTP latency. Is there any middle ground or is this just life?

Well, it’s a good question. I suppose it depends a bit on how you have chosen to implement batch mode. Batch mode, not columnstore, batch mode.

Huh. I kind of read the wrong thing there. I figured you would be like, oh, yeah, columnstore. So, you know, there are various times when batch mode may kick in for queries unnecessarily. But here’s the sort of red flag for me on this, and you can stop me if the situation I’m describing is exactly as it’s happening.

If you have OLTP queries where batch mode kicks in and SQL Server is making the choice to have batch mode do certain things throughout your query plan, your OLTP queries may not be so finely tuned for OLTP work. They may be hitting a lot more rows or being estimated to be hitting a lot more rows than they actually are.

And that’s probably where I’d want to take a look at things there. If you have, you know, like most, you know, you’re talking about OLTP. So I’m thinking, you know, tiny insert updates and deletes, right?

And you said batch mode, not columnstore, so I'm going to hedge things a little bit and say that columnstore is not slowing down your itty bitty modification queries. But it sounds like batch mode is causing you some fits when it comes to your normal OLTP read queries. And to me, that says your OLTP read queries are in need of some attention.

So I would take a look at those and I would start to ask why SQL Server is choosing to get batch mode involved at all in those. You know, batch mode only kicks in when certain heuristics are met. And I’d be curious precisely which heuristics are making batch mode an attractive arrangement for your OLTP queries.
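If batch mode on rowstore specifically is what's biting, there are knobs for it in SQL Server 2019 and up while you figure out why it kicked in. dbo.SomeTable here is a hypothetical stand-in:

```sql
/* Turn batch mode on rowstore off for the whole database... */
ALTER DATABASE SCOPED CONFIGURATION SET BATCH_MODE_ON_ROWSTORE = OFF;

/* ...or disable it for a single query while you tune it. */
SELECT COUNT_BIG(*) AS row_count
FROM dbo.SomeTable AS st
OPTION (USE HINT ('DISALLOW_BATCH_MODE'));
```

I'd treat these as diagnostic levers rather than fixes; the underlying question is still why the optimizer thought batch mode was worth it.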

Anyway, if I answered those before, I don’t know if I did a better or worse job this time, but maybe I did better. I don’t know. Either way, I hope you enjoyed yourselves.

I hope you learned something. And I will see you in tomorrow’s video where we will talk about, I don’t know, something else. Be fun. Anyway, thank you for watching.


Advanced T-SQL Triage: I’m Running a Full-Day Pre-Con in Jacksonville


On Friday, May 1st, I’ll be in Jacksonville for Day of Data’s pre-con day, running an all-day session called Advanced T-SQL Triage: The Art of Fixing Terrible Code.

If you’ve ever inherited a stored procedure that looks like someone lost a bet with the optimizer — MERGE statements, RIGHT JOINs nobody can explain, CTEs stacked eight deep, scalar functions wrapping scalar functions — this one is for you.

All attendees get free access to Learn T-SQL With Erik.

What we’re doing all day


We’re not going to sit through slides about what a clustered index is. We’re going to look at queries that are actually broken and fix them. On the table:

– Paging logic that scans when it should seek
– Window functions that spool and spill because nobody gave them a sort to work with
– Indexed views that look clever on paper and lock everything in practice
– Data modifications that block like linebackers
– Dynamic SQL that’s parameterized, fast, and doesn’t make your DBA cry
– When CROSS APPLY is the right tool (and when you’re reaching for it because you don’t want to think)
– Views vs. inline TVFs vs. scalar UDFs, and why the optimizer treats them very differently
– Why RIGHT JOIN is not simply LEFT JOIN spelled backwards
– Rewriting scalar UDFs so they stop wrecking your execution plans

You’ll leave with a cheat sheet, a working mental model for diagnosing slow queries without guessing, and — if the day goes the way it usually does — a short list of queries at work you’re itching to go refactor on Monday morning.

Details


When: Friday, May 1, 2026, all day
Where: University of North Florida, Jacksonville, FL
Register: Here

The free Day of Data event is the next day, Saturday May 2nd, same venue. Come for the pre-con, stay for the community event. The hotel discount at the Hilton Tru cuts off April 22, so if you need a room, don’t sit on it.

If your queries scare you, come fix them with me.


Free SQL Server Performance Monitoring Troubleshooting Live Problems


Summary

In this video, I delve into the intricacies of monitoring SQL Server performance using HammerDB TPC-C workloads and the Light Dashboard MCP Server. I showcase how the dashboard provides real-time insights into server performance, including CPU usage, memory contention, and blocking issues, all while highlighting the unique capabilities of the MCP Server. The MCP Server offers a powerful analysis layer that can automatically detect and summarize performance issues, making it easier to identify and address bottlenecks without the need for manual digging through complex data. I also demonstrate how to use the dashboard to run historical TPC-C and TPC-H workloads, providing detailed insights into query performance and server behavior, and how to leverage the live dashboard for troubleshooting ongoing performance issues in real-time.

Chapters

Full Transcript

Erik Darling here, the one, the only, the monitoring tool mogul, he of reasonable rates, the best SQL Server consultant, for now, outside of New Zealand, at least until the robots eat SQL Server consultants alive completely. Anyway, in this video, I've got a HammerDB TPC-C workload running in the background currently, sort of dealing with stuff, and we're going to see what a little chat about historical TPC-C and TPC-H workloads looks like while talking to the Light Dashboard MCP Server. So, like any monitoring tool, you have a lot of visual indicators as to what is going wrong with SQL Server.

You'll have your typical overview of things. You'll be able to see spikes in CPU, spikes in waits, you'll be able to see if there's any memory contention on the server, and of course, all the usual blocking and deadlocking stuff. Nothing about what I collect in that is special. What makes this tool special are some of the other things that go on in there, and also some of the things that I'll be adding.

I’m adding an analysis layer on top of collected data as we speak, so that’ll start rolling out, and you’ll be able to start getting SQL Server consultant expert advice, just sort of running rules and checks over the collected data in the dashboard. Both dashboards, full and light. So, that’ll be coming out soon.

And also, the MCP stuff. I have seen some monitoring tools have like an AI window in there where you can ask… It's not good. It's actually quite laughable. You should feel ashamed and go home if that's what you've done. The MCP stuff is much more powerful, because it gives well-defined tools to the robots to access just the monitoring data, and gives you much more insight than just asking some random GPT sitting there floating in the nether.

So, you can do, you know, the sort of standard investigation. You can look through the graphs and dashboards and you can be quite happy that you have all that and that you can take screenshots and do all that stuff with. But, what the MCP tool brings to the equation is doing a bunch of analysis on stuff for you, right? And it’s like, you know, you can get the screenshots and you can be the big hero and you can say, hey, like, I see this, this and this. Like, you know, when I saw these spikes, I went and I dug in and I found all this stuff going on.

Or you can just have a robot look at that same raw performance data and summarize it all for you, right? There’s all sorts of neat stuff that you can do that makes your job and your workflow a bit easier. What’s also nice is that if you run into something, you can say, hey, go look at this and then go do something else, right?

It’s like a couple of minute context switch to be like, yeah, that spikes bad. And hey, hey robot, tell me what was going on during the spike. You can go do something else. Come back to a summary of what happened during the spike without you having to dig through all 15 different layers of stuff.

So, we got the overview tab and the waits tab, and that’ll show you information about sort of how the server is currently doing and what things are currently waiting on. Then you can, you know, look through the standard expensive queries, look at blocking, stuff like that. And then if you don’t find your answer there, you can look through deeper metrics that will tell you more about what’s going on on the server.

Stuff like memory, blocking, deadlocking, all the other things. And then, you know, you can just put that all sort of together in the way that you would put together any sort of like RCA report. Again, if you want to check this stuff out, this is all available completely for free on GitHub.

There’s no email signup required. There’s no telemetry. There’s no phoning home. If you like this project a lot, you can always choose to sponsor it. It’s a nice thing to do. Anyway, that’s about that.

Let’s take a look at, you know, let’s look at the MCP stuff first, because this is me asking the robots to run the TPCC workload. So you caught me there. But this is information from a historical TPCC and TPCH run on March 4th and 5th. And we can see all this stuff going on in here, right?

SQL Server hitting a high CPU window, right? Average CPU 33, peak 81, right? All this stuff kind of nicely just broken down for you. And then it, you know, it’ll break down all the wait stats that happened in there and like sort of the important stuff.

And, you know, give you just sort of a nice way of framing up everything that went on during that. And then it will start telling you about queries that suffered these problems, right? So we have some updates that were blocking. We had a select with an UPDLOCK hint.

My goodness, why would you do that? Serializable and UPDLOCK. Be still my heart. And then we have other queries that did a lot of work in other ways, right?

We have some CPU stuff, some long running stuff. So, you know, just things in there that would be good to include in any sort of RCA report. And then we have stuff about the TPC-H workload.

And look at this thing, knowing exactly which queries are which from that workload. Isn’t that beautiful? It knows what all of them are, right? I mean, there’s a comment in there that says it, but at least it’s nice about that.

And it’s just like, hey, I can identify these things a little bit better, right? So like you get all the stuff in there that you would expect to see. Executions, average CPU, elapsed, total CPU, all this other good stuff.

And then we get some information. Apparently, one of those things gave TempDB some fits. Look at that.

Hoo-wee. We did some work. And then, you know, we get some other stuff there and a nice little summary. So this is stuff that you can do if you opt in to having the MCP servers available for the monitoring tool. You can just go in, start asking questions about your data, and they only look at the performance data.

So you can get just information about what is already collected, and it’s already nice and spread out and completely correlated in very nice ways so that there’s no guesswork. With SQL Server, everything is aggregated since the server started up.

Or it gets lost on restart, you know, like the plan cache and stuff will get lost. So this is just a no-brainer way to collect all that data. Make your life real, real easy, right?
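That collect-it-yourself point can be sketched in a few lines: because the server’s counters are cumulative since startup, a collector has to take periodic snapshots and diff them, and handle the restart case where a counter goes backwards. This is a hypothetical illustration with made-up numbers, not the tool’s actual collector:

```python
# Sketch: SQL Server wait counters are cumulative since startup, so the
# useful numbers are deltas between snapshots. Values are invented.

def wait_deltas(prev, curr):
    """Per-interval wait time; a counter lower than before implies a restart."""
    deltas = {}
    for wait, ms in curr.items():
        before = prev.get(wait, 0)
        # Counter went backwards: server restarted, count from zero again.
        deltas[wait] = ms if ms < before else ms - before
    return deltas

snap_1 = {"WRITELOG": 1_000, "CXCONSUMER": 400}
snap_2 = {"WRITELOG": 1_750, "CXCONSUMER": 380}  # CXCONSUMER reset: restart

print(wait_deltas(snap_1, snap_2))  # {'WRITELOG': 750, 'CXCONSUMER': 380}
```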

You can do all this analysis very quickly. And this analysis is, you know, maybe not exactly what I would say, but it’s still pretty good, right? For just waiting a couple minutes for this thing to go through some historical stuff. Anyway, let’s go to the Lite dashboard.

Let’s bring that up. And you can see exactly where I left off last time. And I think we should probably go back to the wait stats tab here. And let’s just do the last hour so that we start to see just what we have for the HammerDB workload that is currently running, right?

So we’ll just apply that to all tabs. So we’re only looking at one hour across all of them. So this stuff going on behind my head, this is the TPC C workload currently running, right?

And if we look at this over here, we’ll see that we are waiting a whole lot on WRITELOG. My goodness, right? And we have a bunch of CXCONSUMER down here.

I’m surprised that we haven’t started hitting the lock waits yet, because there’s been a bunch of blocking and deadlocking. But one thing that you can always do is flip this to average wait time. And you start to see waits in a slightly different way, like this transaction mutex.
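The total-versus-average flip described above is really just two different sort keys over the same counters: total wait time surfaces the high-volume waits, while average wait time (total divided by waiting tasks) surfaces the rare-but-slow ones. A sketch with invented numbers:

```python
# Two views of the same wait stats: total wait time vs average wait per
# waiting task. Numbers are made up for illustration.

waits = [
    ("WRITELOG",          {"wait_time_ms": 90_000, "waiting_tasks": 45_000}),
    ("TRANSACTION_MUTEX", {"wait_time_ms": 12_000, "waiting_tasks": 10}),
]

# Sorted by total: the high-volume wait wins.
by_total = max(waits, key=lambda w: w[1]["wait_time_ms"])

# Sorted by average: the rare-but-slow wait wins.
by_average = max(waits, key=lambda w: w[1]["wait_time_ms"] / w[1]["waiting_tasks"])

print(by_total[0])    # WRITELOG dominates total wait time
print(by_average[0])  # TRANSACTION_MUTEX dominates average wait time
```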

That’s fun. Async network IO, also fun, right? So just a couple of different views of the wait stats data that allow you to sort of analyze things in slightly different ways. Under queries, now, this is where things get kind of neat: if we go to the active queries tab, we’ll have this stuff that sort of aggregates cumulatively as we look at things.

But we also have this live snapshot tab. And the live snapshot tab will tell you what’s running at the exact moment that you push the button or hit refresh. And you’ll get all this stuff back where, you know, we can see that there is a whole bunch of blocking going on.

So this is excellent to have at your disposal if you are curious. And, you know, let’s say that maybe you have this tool installed somewhere else and you don’t have Management Studio looking at the server, or you don’t have Management Studio at all.

You can just see what’s running in a smart way without having to go do anything. The CPU graph is going to start to tick up a little bit. It’s not very interesting at the moment.

We have only hit about 17% CPU and we’ve screwed up the green screen a little bit because I moved, which is always fun, right? It’s nice to be able to count on technology when you need it most. And then if we go over to the blocking tab, we should see now a little bit more blocking going on.

We see that the lock wait time has bumped up now, right? We have some stuff going on in here and we have had a little spike in deadlocks. And so, oh, look, look over here.

There is a deadlock detected. We got a pop-up notification during the video. It’s wonderful, right? And the blocking and deadlock stuff doesn’t always populate immediately because we do have to parse a little bit of XML in here, but this is showing all the deadlocks that we’ve hit so far.

And this is giving us a pretty good view of what queries are having problems. And from here, we can look at the deadlock graph. We can do all sorts of stuff to start troubleshooting exactly why the deadlocks are occurring and maybe even start fixing them on the fly.

Because, you know, one thing that I love doing is adding indexes and stuff on the fly when there’s already a server having a bunch of problems under load. And it’s good fun. Good fun.

Anyway, just a little overview of how you can use the Lite dashboard to troubleshoot live performance problems. You start seeing spikes. You start getting warning notifications.

You can chat with the MCP server about it. We got another deadlock detected, so we are piling them on. This TPCC workload is no joke. All right.

Thank you for watching. I hope you enjoyed yourselves. I hope you’ll try this tool. Again, this is all available at code.erikdarling.com. If you want to check it out, it’s totally free, totally open source. Start monitoring your SQL Servers and finding professional level SQL Server performance problems very, very quickly, very, very easily.

Alright, thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Free SQL Server Performance Monitoring: Lite Dashboard



Summary

In this video, I dive into the Lite dashboard of my free open-source monitoring tool, which has garnered significant attention with over 10,000 installs based on GitHub repo stats. I highlight its user-friendly nature, especially for consultants or those who can’t install software on client servers, as it allows you to collect a wide range of performance metrics without the need for a separate database. I also showcase how DuckDB, an embedded analytics database, powers the Lite dashboard, ensuring fast query performance and efficient storage through Parquet file compression, making it an ideal solution for monitoring Azure SQL databases and other environments.

Chapters

Full Transcript

Erik Monitoring Tool Mogul Darling here. I suppose my rates are reasonable for this free open source monitoring tool as well. So, you know, that’s nice. But I want to talk about the Lite dashboard a little bit more. This is far and away the most popular download based on GitHub repo stats. I do keep an eye on those because I care and I want to make sure that people are using this and getting stuff out of it. I want to say that we’re very close to 10,000 installs, or at least that’s what it would seem like based on the numbers that I’ve seen over the past week or so. So that’s pretty exciting. Big monitoring. I’m coming for you. Coming for your customers. I’m going to eat you alive. Anyway, the Lite dashboard is fun because you don’t have to install anything. You just open it up, point it at a SQL Server, even Azure SQL database, and it’ll just start chewing away collecting data and making your performance tuning life easier. It’s got a lot of the same tabs and collectors that the full dashboard does, but it’s also got some kind of neat stuff internally.

And the thing that I really love about it is, you know, I’m a big fan of DuckDB. I think it’s one of the coolest things that’s happened in databases in a long time. When I was first working on this, doing the sort of design spec, for almost anything embedded database, you’re like, oh, SQLite. The thing is, I’m not doing SQLite work here. All right. I’m not doing tiny little transactional things. I’m inserting bunches of rows and stuff, and then I need to run reporting type queries off them. And DuckDB is a no brainer for that. So I have DuckDB embedded in this thing. And what I do is I load up DuckDB until it has about 512 megs in the tables. And as soon as I hit 512 megs, I archive all that stuff out to Parquet files, which have insane compression on them. And that helps to control the size of the database over time. The Parquet files are very, very well compressed. It’s a very cool little thing that goes on in there.
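That 512 meg rotation scheme boils down to a simple threshold check. Here’s a minimal sketch where the archive and truncate steps are hypothetical stand-ins for the real DuckDB COPY-to-Parquet and table cleanup, not the tool’s actual code:

```python
# Sketch of size-based rotation: once the embedded database passes a
# threshold, archive the hot tables out and start collecting fresh.
# The archive/truncate callbacks are stand-ins for the real work.

ARCHIVE_THRESHOLD_BYTES = 512 * 1024 * 1024  # ~512 MB

def maybe_rotate(db_size_bytes, archive, truncate):
    """Archive and empty the hot tables when the database gets too big."""
    if db_size_bytes < ARCHIVE_THRESHOLD_BYTES:
        return False
    archive()   # e.g. COPY each table TO an 'archive-<date>.parquet' file
    truncate()  # empty the hot tables so collection starts over
    return True

events = []
rotated = maybe_rotate(
    600 * 1024 * 1024,  # 600 MB: over the threshold, so we rotate
    archive=lambda: events.append("archived"),
    truncate=lambda: events.append("truncated"),
)
print(rotated, events)  # True ['archived', 'truncated']
```

The design choice this captures is keeping the “hot” store small and fast while the cheap, heavily-compressed Parquet files hold history.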

So like I said, it’s got just about all the same tabs and whatnot that you would expect to see in the full dashboard. I tried to replicate as much of it as possible. But there are a few things that I’m just not quite keen to do, like run sp_HealthParser to look at the system health extended event. And what’s the other one? I don’t set up a custom trace in this one. So there are a couple things that I left out of Lite just to make it a little bit lighter. But hopefully, the set of things that I collect in here is useful enough for you that it sort of makes up for the few things that are missing. So like I said, there’s no system events tab, that’s sp_HealthParser looking at the system health event, and I don’t look too much at the default trace stuff. But there are some very cool things that I do have in here that I think make this a unique sort of proposition. One is Azure SQL database support. The full dashboard obviously doesn’t work there because it creates a database and stored procedures and all the other stuff. So it just can’t function with Azure SQL database, whereas the Lite one absolutely can. But they both have the MCP servers built in. Those are opt in only, so if you want to have one of your robot friends talk to your performance data, and just your performance data, you can chat away with the Lite database. And I think that’s even, in a way, better than the full dashboard, because it is confined to what has been collected in DuckDB and the Parquet files, and it doesn’t have to go run any queries against the database where the data lives. Right? So with the full dashboard, you have the performance monitor database and that’s collecting stuff, and the MCP servers talk to just that. But this is just talking to DuckDB wherever it lives. So that’s kind of neat. It has the same alerts and email notifications, system tray, stuff like that.

CSV export, dark theme, all the things that I have in there. My dear contributor Claudio added some light theme support recently. I haven’t used light themes yet because, you know, I’m a dark mode kind of guy, but it’s cool that they’re in there just in case you want them. So, DuckDB. I chose DuckDB because it’s an embedded analytics database. I don’t need a separate server process. It lives inside its own executable. There’s no external dependencies, and it uses columnar storage by default. So all of the stuff that I put in there, all the queries are very fast against it. And when they go out to Parquet, there’s like almost no hit on query performance.

When your DuckDB database gets to be about 512 megs, I bump everything out to Parquet files and just sort of empty the tables out and start over again. And the compression on those Parquet files is amazing. The footprint on them is very, very small. It’s all very portable. You can, you know, take it wherever you want. And there’s no SQL Server dependency for storing historical data. It’s all captured within the executable.

So I’m going to show you a few things poking around the Lite dashboard. This is primarily aimed at, you know, just anyone. I mean, it says consultants, but if you’re not allowed to install stuff on client servers, or, like, your servers at work, and you don’t want to have a database sitting on there collecting stuff, this is really perfect. Just a way to spin it up and collect a whole bunch of great performance metrics.

And especially if you’re on Azure SQL database, this is the one that supports it. But, you know, you can literally run it from anywhere. If you’re in a locked down environment, it really needs minimal permissions for stuff, so it makes life a lot easier. That’s not what I wanted. All right. Let’s go to the Lite dashboard. There we go. All right. So this is connected to SQL Server 2022.

Like I said, it has just about all the stuff that you would want a dashboard to collect. You start off on the wait stats tab. Over here, we’ve got the list of servers and their sort of general, you know, presence in the world. We’ve got an overview tab showing all the usual stuff: you know, what CPU is up to, if there’s any blocking or deadlocking, what memory’s up to, if there are any alerts that have been fired off.

I don’t think I’ve had any alerts recently. I’m working on this FinOps tab because a lot of people have been, sort of behind the scenes, asking me about this, because apparently a lot of managers and bosses really care about this. So I’m trying to get this stuff going, so that on top of it being a free monitoring tool, it can be something that points you to where you can save money, especially with Azure SQL database cloud stuff. You know, if things are over provisioned, under provisioned, all that.

So I’m working on this. It’s a work in progress. It’ll be out soon, but it’s, you know, gonna be a few versions before I think it really finds its footing. But I’ve got to start somewhere so I can get some feedback and start getting some real sort of scenarios from it, other than what’s going on in my local test servers. But just to make sure it all works, we get this stuff first. But anyway, we get wait stats.

Again, it’s all the waits that we look at in the same way as the full dashboard. So like your top 10 waits, plus the sort of usual suspects, plus the poison waits. So if anything terrible is happening, you can see it pretty quickly.

Under the queries tabs, again, I’m such a developer, I don’t have any stored procedures running. But, you know, you sort of get a look at query duration from the plan cache, from Query Store, and then executions is what’s behind me.

You can get an active queries tab. This does not run sp_WhoIsActive. This just runs a simple script from sp_PressureDetector to get a whole bunch of information about what’s currently running on the server.

There’s a top queries by duration. And, you know, just like with everything else, you can grab the execution plans, you can see the queries, you can open them in the plan viewer. You know, everything that you would come to expect from other parts of the tool.

And then there’s the same thing for Query Store. The plan viewer, I’ve talked about that in a few videos. We just say view plan, that pops up in plan viewer, and you get, you know, this is just the estimated execution plan for this, which is not terribly interesting, but you know, that’s okay.

Estimated plans are not meant for real, real use anyway. We have a CPU tab that will show CPU usage sort of over time, just, you know, same thing as in full. We have a memory usage tab up here, where we have the overview.

We have memory clerks. We have memory grants. File IO, broken down first by latency, right? So we have read latency and write latency, and then we have throughput numbers.

So latency is time, and throughput is, like, data. These are all measured in megabytes, and you can see over time just sort of when things spiked up and when you might want to take a look at things. We have a tempdb tab, of course, and we have blocking and deadlocking and all the other stuff.
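The latency-versus-throughput distinction above comes down to two different divisions over the same file IO counters: time per operation versus data per interval. This is a hedged sketch with invented numbers, not the tool’s code:

```python
# Latency is time per operation; throughput is data moved per interval.
# Counter values below are invented for illustration.

def read_latency_ms(io_stall_read_ms, num_of_reads):
    # Average milliseconds each read spent stalled.
    return io_stall_read_ms / num_of_reads if num_of_reads else 0.0

def throughput_mb_per_sec(bytes_read, interval_seconds):
    # Megabytes moved per second over the sample interval.
    return bytes_read / (1024 * 1024) / interval_seconds

print(read_latency_ms(5_000, 1_000))           # 5.0 ms per read
print(throughput_mb_per_sec(314_572_800, 60))  # 5.0 MB/s over a minute
```

A file can have great throughput and terrible latency at the same time, which is exactly why the dashboard shows both views.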

I don’t think there’s really been any on this system lately. We have some lock waits, but nothing has really been getting jammed up. So this section is not going to be terribly interesting right here, but, you know, on your systems, it might be much more interesting.

We collect perfmon counters, just like in the full dashboard. We collect information about running jobs. I tell you about your server and database configuration, database scoped configurations, and the active trace flags.

There’s sort of a daily summary of what’s been going on on the server. For this server, it’s been pretty quiet today because I’ve been working on other stuff and, you know, trying to get ready to record these videos. And then finally, there is a collection health tab, and the collection health tab is nice because it will tell you if anything has been going wrong with your servers while the collector has been running.

Now, you’ll notice that there are a bunch of errors in here. And that’s because I had a wifi blip earlier and everything went weird for a minute. But the wifi blip has since been fixed.

So you can just ignore these errors for now. There’s also a collection log that has sort of a more specific breakdown of when things failed and all that other stuff. And then a duration trends tab, which will tell you which collectors sort of take the longest.

And if you look at this one here, apparently our Query Store query could use a bit of tuning, because at times it takes 1.5 seconds to grab that Query Store data. So I’ll be having a look at that when I’m done recording this video.

Anyway, this is just a short tour of the Lite dashboard. We’re going to go more in depth on other parts of this over the next couple of videos. But anyway, again, this is all at code.erikdarling.com.

Let me get to a slide where that actually shows up. There we go. That’s wonderful. code.erikdarling.com. It’s totally free, open source.

It doesn’t do anything weird or funky. You know, no requirements on email, no phone home, no telemetry, nothing like that. I just wanted to put something kind of cool out there in the world to help you folks with your SQL Server performance tuning efforts.

Anyway, thank you for watching and I will see you over in tomorrow’s video. All right. Goodbye.


Introducing Performance Studio: Free SQL Server Query Plan Analysis Tool



Summary

In this video, I introduce my latest tool, Performance Studio, which I’ve been working on and refining for a while now. Performance Studio is a powerful, free tool designed to help SQL Server professionals analyze and optimize their queries more effectively. I walk through some of its key features, including a query editor that allows you to run arbitrary queries against a SQL Server, and an execution plan viewer that provides detailed insights into query performance. I also highlight the integration with SQL Server’s Query Store, which helps track and analyze query performance over time. Throughout the video, I demonstrate how Performance Studio surfaces important information like runtime summaries, wait stats, and parallelism effectiveness, making it easier to identify areas for optimization. I encourage viewers to check out the tool on my GitHub repo at code.erikdarling.com and provide feedback to help it continue to evolve.

Chapters

Full Transcript

Erik Darling here with Darling Data, and in today’s video we are going to debut on YouTube (it’s been debuted in other venues for a little bit now) my new tool called Performance Studio. Now, this sort of spawned from the plan viewer that I added to the monitoring tool stuff, but I didn’t want to add all this stuff in there; I wanted to keep that sort of focused. And it seemed like this would be a nice sort of spiritual successor to Plan Explorer, which has been sort of left on the shelf for a very long time, and has not seen a lot of investment or work or new development. So, we’re going to go through some of the sort of top-level features here today. Hopefully this will convince you to go get it. Now, this is all available on my GitHub repo. It’s at code.erikdarling.com. So, if you’re interested in this, you can go there, check out the releases section, and you will find the latest and greatest from what I have here. Now, the first thing that you’ll see usually when you open this is a query editor. I do allow you to run arbitrary queries against a SQL Server of your choice, so don’t do anything here that you wouldn’t do in SSMS. You run the query, it’ll discard all the results and just give you an execution plan back. You can get either an estimated or an actual plan, depending on what your needs are at the moment. But here we have a simple query, just a select, connected to Stack Overflow 2013. If you get an actual plan, we get this back. Some of the stuff that I surface, you know, sort of top-level, is things that hopefully help you figure out if the query in front of you is worth dealing with. So, you know, it gives you some server context, gives you a runtime summary, and over here it’ll show you wait stats for the query in front of you.

So, you can sort of graph those out. There’s not too much interesting in this plan, so I’m not going to spend too much time here. We have some other interesting plans loaded up that have a little bit more to them, so we’ll do that. Another thing that’s built into this is Query Store integration. So, you can connect this. Right now, I just have it set up to grab the top some number of plans, some number of hours back from Query Store, and you can either load all those selected plans into new plan viewer windows, or if you just right click, you can go grab one and say load plan, and it will bring the plan from Query Store up in there.

So, we do have a little bit of stuff in there. On the Query Store thing, I am going to add a bit more to that. Some filtering mechanisms: query ID, plan ID, procedure name, query hashes, and whatnot. Stuff that you would expect to see in my stored procedure sp_QuickieStore, to sort of help you locate the query plans that you care about. But I have some actual plans with some actually interesting things in them.

And before I get too far along in this, I want to show you some of the buttons. So, we have this bad time SQL plan up here, and if I hit human advice, I’ll get this tab that comes up, and this tab will give me a sort of human readable analysis of what went on in the query plan: things that you might care about, you know, how long different operators ran for. One thing I’m particularly proud of in this, and it’s probably not spot on perfect just yet, there are going to be some edge cases and things that I need to deal with, but in Management Studio, in row mode, if you get an actual execution plan, the operator times are cumulative going from right to left.

I’ve done my best to sort that out, so you just see the per operator times like you would with batch mode operations. But down in here, you get a breakdown of, you know, operators in the plan that ran for a long time, you get information about wait stats for the plan, assuming that you’ve gotten the plan in a way that collects those things, and then I break down all the plan and operator warnings, and if there are any missing indexes, we’ll surface those. One feature I’m working on right now is, if we have an actual plan, I’m going to be getting rid of that silly impact percent number, and I’m going to replace it and try to match the missing index request to the operator that it happened on, and sort of give you the operator CPU and duration for that.

That’s another thing that I added to this that SSMS is sadly lacking: if you look over here, you know, SSMS only shows you the duration number. I’m also showing the CPU number for this, so you can not only see how long something took, but you can also see how much CPU it burned. So there’s a lot of good stuff in here surfacing extra good information about the query that ran, so that you can get sort of a good high-level visual understanding of where you need to focus.

Up in the runtime summary for this one (I’ve got a little tooltip on that), there are some things in here that I’m going to show you in other plans where it’ll change, but, you know, stuff that I wish I kind of knew about when I was looking at a plan. You know, 24 of 32 threads used, we had 8 inactive CPU threads on this. We got granted 1.89 gigs, and it calls it out explicitly, like, hey, we only used 4% of our memory grant.

I’m probably going to tweak that rule a little bit, but I just wanted to start with, like, a 1-gig grant to get started. But then there’s this other thing, and actually, let me show you this in a different query in a minute. But the parallelism effectiveness is another cool sort of thing that I look at.

In some of my videos where I talk about, like, parallel plans that actually just run on a single thread, I actually measure stuff out to give you warnings if you run into this situation. If you hover over the select or the root operator in the plan, you’ll get a sort of high-level overview of all the sort of warnings that got called out in the plan.

Stuff like, you have a large memory grant, apparently, and an excessive memory grant. We currently have two giant memory grant warnings in this. Scans with predicates, indicating that, you know, maybe an index would help you out somewhere.

Things like that. And, you know what, let’s go to some other plans in here that have other stuff in them. Oh, wait, I was promising to show you some buttons. Under robot advice, if you want to paste this to the robot of your choice, there is a JSON schema of all the stuff that ended up in the human advice.

And there’s also two buttons over here. One to copy a repro script. So if we hit that and we paste that in here, you’ll see that, like, we got the full query so that we could run this, or we could edit this query and try to tune it.

And also, if we hit just run repro script, then this thing would run and, well, we have to get it connected first. But if we run this, then we’ll go get an actual execution plan for it. But while we’re waiting on that thing, right, because who knows how long that’ll take, there are some other things in here that are useful for us.

So where I was talking about the sort of parallelism effectiveness, note this thing ran at DOP 8, right? But the CPU was only about 3x, or a little bit more than 3x, the elapsed time. So DOP 8 was not terribly effective, right?

We had a lot of parallelism, but not a lot of the parallelism helped, right? Because in an ideal world, if it ran at DOP 8, CPU time should be about 8x elapsed time. So we did not quite get there, right?

We only got to about 3x. So, I mean, not saying this plan shouldn’t have gone parallel, but maybe even DOP 4 would have been just about the same there. Some other neat things go on in here.
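The parallelism effectiveness idea works out to a simple ratio: CPU time divided by elapsed time, compared against the DOP the plan ran at. A hedged sketch with made-up numbers, not the tool’s actual formula:

```python
# At DOP 8, a fully effective parallel plan should burn roughly 8x its
# elapsed time in CPU. This ratio check uses invented numbers.

def parallel_effectiveness(cpu_ms, elapsed_ms, dop):
    """Fraction of the available DOP that actually did work (0.0 to 1.0)."""
    if dop <= 1 or elapsed_ms == 0:
        return 1.0
    return (cpu_ms / elapsed_ms) / dop

# DOP 8, but CPU is only ~3x elapsed: roughly 40% effective.
print(round(parallel_effectiveness(cpu_ms=3_200, elapsed_ms=1_000, dop=8), 2))
```

A ratio well under 1.0, like the 0.4 here, is the “parallel plan that mostly ran on a few threads” smell the video describes.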

You know what, let me get to this one. Something that I have wished SSMS would yell about a lot more loudly is when you have a many-to-many merge join that actually uses a work table and does a bunch of stuff, because this is not a good time. This is a real performance nightmare.

So this is one thing that I decided I really wanted to show you in query plans. If we look at the parallel skewed plan, this is another thing that I wish SSMS would have been more loud about. But if we hover over this index seek, we’ll see this warning here.

Ah, there we go. Parallel skew. I’m in the way. Let’s see if we can get that to a slightly better place on the screen. All right.

There we go. So, parallel skew. Come on. Zoom in. Cooperate with me, would you? All right. Parallel skew. 100% of the rows all went to one thread.

So if this happens in a parallel plan, I will be loud and clear about it. So another nice thing that you can use to sort of figure out where things are going wrong in a plan. Some other stuff that I wish SSMS was louder about.

Really surfacing when we have a non-parallel plan reason. So this will tell you, if it can, exactly why a query was not allowed to go parallel. And then if we hover over the clustered index insert right here, which is to a table variable, we’ll see this lovely warning down at the bottom here that says table variable.

Modifying a table variable forces the entire plan to run single threaded. Replace it with a temp table to allow parallel execution. All right.
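Here's that warning in practice, as a minimal sketch. A modification to a table variable serializes the whole plan, while the same insert into a #temp table remains eligible for parallelism. `dbo.BigTable` and its column are stand-ins for any sufficiently large table of your own.

```sql
/* Serial: modifying @t forces DOP 1 for the entire plan */
DECLARE @t table (id int NOT NULL);
INSERT @t (id)
SELECT bt.id
FROM dbo.BigTable AS bt
WHERE bt.some_column > 0;

/* Parallel-eligible: same query, #temp table target */
CREATE TABLE #t (id int NOT NULL);
INSERT #t (id)
SELECT bt.id
FROM dbo.BigTable AS bt
WHERE bt.some_column > 0;

DROP TABLE #t;
```

Compare the actual plans for the two inserts: the table variable version will carry a NonParallelPlanReason, the temp table version can fan out across threads if the cost justifies it.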

So there's all sorts of helpful stuff for you as query tuners to very quickly get a lot of information without having to do a lot of digging into query plans. Now, the last thing that I'm going to show you here today is just a quick plan comparison thing. So if we select this query plan and we hit compare plans, and we change this to the other plan and hit compare, rather than having a big complicated picture of what's different, this just breaks down all the stuff that changed in a plan and got better.

Right. So, you know, runtime and CPU time went from about two and a half seconds to 639 milliseconds. And you get, you know, sort of good information about what improved in a query.

So, you know, that’s like a nice textual breakdown to give you like, oh, when I did this, when I made these changes, here’s how the plan got better. Anyway, this is, you know, this hit v1.1.

So there’s still a lot of work that I’m going to be doing on this alongside the work that I’m doing on the performance monitoring tools. Hopefully you will try these things out.

So stick around and see what I have in store for them. I've got a lot of neat stuff that I want to do, but I was rather proud of where I got this to in a pretty short amount of time. So, you know, I wanted to get it out there in the world and hopefully get some people trying it and giving me feedback, or maybe just telling me that it's cool or, I don't know, maybe telling me that it sucks.

I don’t care. The lovely thing about free software, it costs just as much to use it as it does to not use it. So there you go. Anyway, thank you for watching.

I hope you enjoyed yourselves. I hope you learned something. I hope you’ll try this tool out. Again, it’s at my GitHub repo. If you want to get there, it’s code.erikdarling.com. And if you run into any issues or problems, questions, anything like that, that’s the place to leave me feedback.

Thank you for watching. And I will see you over in tomorrow’s video where I suppose we will talk more about SQL Server performance monitoring issues. All right.

Thank you.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Free SQL Server Query Plan Analysis, Right In Your Browser


One of my goals when I first started putting together the Performance Studio application was to make it available as a standalone web tool.

It turned out to be way easier than I thought it was going to be, and now you can use it here.

You can either paste plan XML, or upload an execution plan, and all of the analysis that would take place in Performance Studio happens client-side in your browser.

Why? Because it seemed like a nice thing to do for the community. Not everyone can download and run new applications when they want, but they should be able to get quality plan analysis done when they need it.

Client Side?


Yes, client side. By default, I didn’t want any plans to get uploaded to me.

I don’t need to hoard XML, and I don’t want to get random emails asking me to delete or remove things.

This was built to work without having to store files anywhere. Yay.

Consent screen

You can even export the HTML results of the analysis to use in whatever HTML-y ways you want. I don’t care.

There is an option to share, but you have to choose it, consent to it, and choose a TTL on the plan (1 day to 1 year, with some stops in between). There’s even a delete button, in case you go through all that blissfully unaware of your actions. In which case, you should probably unplug.

But anyway, the whole point is that your query plan doesn’t go anywhere unless you want it to. It does not get fed into AI anywhere, nor does it persist anywhere after you close your browser tab. And it sure as heck doesn’t end up on my server.

Do You Collect Anything?


Only some vanity metrics about page views and plans shared. But really just the bare minimum to see how much people are using it.

I have no use for your IP address, no desire to feed you cookies, and no need to know where you're visiting from geographically.

If you choose to share a plan with my website, your plan gets stored in a pretty well locked-down, cloud-hosted environment, and only for as long as you choose to leave it there. I'm not, like, daring hackers to try to hack me, but I've taken all the precautions I can to make sure your XML stays safe as long as I have it.

Your plans are also not really discoverable in any real way, unless and until you share the link to them. The URLs are all randomly generated, and I don’t expose shared plans in any way. I don’t even look at them, because it’s sort of a pain in the butt for me to look at them.

The next time you need a query plan analyzed, try it out.

Thanks for reading!


Trying Something New With sp_QuickieStore: 80/20 Query Scoring Across Multiple Metrics



Summary

In this video, I dive into an old-school approach to identifying SQL Server performance issues using sp_QuickieStore and a method inspired by the Pareto principle, the 80/20 rule. Traditionally, Query Store surfaces queries that consumed a lot of CPU over the last seven days, but often these results are too broad for practical use. To address this, I've developed a multi-dimensional scoring system that evaluates queries based on their impact across several key metrics: CPU usage, duration, physical reads, writes, and executions. This approach helps pinpoint the most problematic queries more accurately, even when they run outside of typical working hours or are unparameterized. By sharing these insights, I hope to provide a practical tool for SQL Server administrators looking to optimize their databases without relying solely on modern monitoring tools.

Chapters

Full Transcript

Erik Darling here with Darling Data, taking a break from my monitoring tool mogul personhood to talk about some old-fashioned stuff, you know, some old-fashioned ways of finding SQL Server problems that isn't my new fancy free open source, better than all the paid monitoring tools, monitoring tool. And we're going to talk about doing something a little bit different with sp_QuickieStore. So let me start by sort of defining the problem I want to solve, and then show you a little bit about how I chose to go about solving it. So when I look at Query Store traditionally, and by default, sp_QuickieStore looks at data from the last seven days and sorts by average CPU descending, which is a great way of surfacing things that burned a lot of CPU.

Often when you find those things and show them to people, though, they say, well, that runs at night, it doesn't matter. And at first, I tried to cope with that by adding in a few parameters, like workdays, which when set to one will, by default, look for everything between 9am and 5pm on weekdays only, and show you that. Of course, the work start and work end hours serve to define the work day to your needs. But still, you needed a lot of other filtering in order to come to an agreement on which queries mattered.
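The business-hours filtering described above looks roughly like this in practice. The parameter names here follow the transcript's descriptions (workdays plus work start/end hours), so treat them as assumptions and run with @help to confirm the exact names and formats in the version you install; the database name is a placeholder.

```sql
/* Hedged example: restrict sp_QuickieStore results to weekday
   business hours. Confirm parameter names via @help = 1. */
EXECUTE dbo.sp_QuickieStore
    @database_name = N'StackOverflow', /* your database here */
    @workdays      = 1,                /* weekdays only */
    @work_start    = '09:00',          /* define your own work day */
    @work_end      = '17:00';
```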

For example, some queries would still execute during the day, and would burn a lot of CPU on average, but people would say things like, well, it appears to have only executed once or twice. Or they would say things like, oh, you know, that's not user facing. A lot of stuff would come up, and it would be like, okay, well, we can filter, we can set a minimum number of executions. What makes sense? How many executions would you start to care about? I don't know. And then there's all sorts of other things that act as traps in there, like unparameterized queries, where they generate a new query ID for every new literal value that gets passed in.

And so it looks like something maybe executed a few times, but if you look at the aggregate for that query hash, which is something I also added in with include query hash totals, the real story shows up. Well, you know, I think you sort of get the point: it still was really hard to find what is really stressing the server out. So the way that I chose to go about trying to fix this now is by sort of doing the Pareto method, but with a matrix.

So the whole theory is the 80% of your blank is caused by 20% of your blank rule of things, right? The 80-20 rule, effectively. Except if you do that across just one resource, you're like, well, okay, CPU, fine, right? So I chose to use a group of resources that I care about, which are CPU, duration, physical reads (not logical reads; logical reads are for people using SQL Server 2008, and we're not there anymore), writes, and executions.

So I’m taking those five metrics, and I’m finding things that have a meaningful impact on your server across those. So that’s what we’re going to talk about today, and I’m going to show you kind of what that looks like. Down in the video description, oh boy, all sorts of useful links, you can hire me for consulting still, you know, before the robots put every independent consultant out of business, you can still talk to me.

You can purchase my training and, you know, get the human angle on all this stuff, not just the "you're absolutely right" LLM angle on any questions you might have. You can become a supporting member of the channel; wouldn't that be just grand if you gave me about $4 a month to, you know, pay just the tax on a cappuccino. You can also, for free, ask me office hours questions, and even more for free, like, subscribe, and assuming that you've got any friends who are still living people and not large language models, well, you can tell them about the channel and maybe they can subscribe too.

So that’d be, wouldn’t that be just fantastic? We could all be friends. Speaking of friends, I’m your friend. And because I’m your friend, and I want you to be happy, and I want your life to be better, I have made a free open source SQL Server performance monitoring tool. No complicated installations, no licensing, no email needed, no phone home telemetry, no nothing.

Just a bunch of T-SQL collectors running on a schedule, populating dashboards, basically all the stuff that I would look at as a consultant if I were trying to keep a SQL Server happy from a performance point of view. And also, the thing that I think is pretty neat is that I have built into the monitoring tool (it is opt-in, of course; it is not enabled by default) MCP servers. So if you want to have your robot friends look at just your performance data, only the collected performance data, and you want to ask them questions specifically about that nicely aggregated, collected, you know, gentrified data that I get for you, you can do that, and they can summarize very nicely all the stuff that has happened with your server that is making it unhappy.

I will be out and about in the world, I guess, by the time this goes live. I forgot to take DataTune Nashville off the slide, but I like DataTune Nashville; maybe there will be one next year and I'll go back for that. So, if there is, DataTune Nashville, that's a great conference. Data Saturday Chicago coming up; well, when this publishes, it'll be this week.

SQL Day Poland in May, woohoo, and Data Saturday Croatia also, woohoo, June 12th and 13th. And if this slide will go on, you will see that I have at least updated the slide now that we have left the murderous winter, we have survived this Donner Party winter, and we are now entering spring, and the hills are once again alive with the sound of database music, which is quite harmonious if you ask me.

Anyway, let's go do this thing. Let's go over to SQL Server Management Studio, something that we haven't seen in one of these videos in a little bit, not since the Get AI Ready with Erik series, a triumphant series of videos. So, let's talk about these results a little bit. Before I even do this, what I want to point out here is that if you use the @help parameter in sp_QuickieStore, what you'll get in the results is this text right here.

So, like, I made sure that this all had a very good explanation so that it would be durable for you to refer to should you choose to use the new Find High Impact parameter and look at Query Store in this way. So, we’ve got, you know, sort of like some normal stuff that you would expect to see, database name, start date and end date, that’s the window of time that we looked at. And then the sort of primary window where things all come up will tell you sort of when the query runs most, right?

So, like, I just chose to sort of separate the day in two. So, 7 a.m. to 7 p.m. is working, 7 p.m. to 7 a.m. is off hours. I also have some stuff in there for weekends and I also have some designations in there for things whose execution profile sort of spans everything.

Well, I'll show you what that looks like in a moment. Then we have some of the normal identifying characteristics you would expect to see about a query. Things like the object name, the SQL text, the query plan, the top waits if they're available for it, the query hash.

And something that I want to point out here is that Query Store has query hashes in it. And I chose to use those as identifiers instead of query IDs because of the problem I was talking about earlier in the video, where if you have unparameterized queries or something, they might not look significant when in fact they are.

So we’re doing this by query hash. I do show you the query IDs and query plans that those query hashes are associated with. But, but like my default grouping is query hash.

There’s also a count of how many distinct query IDs that hash generated, which gives you a good idea if the code is unparameterized. You might want to look at it that way. And of course, a count of the plans.

Then there is an impact score, which is a number from zero to one. It's sort of the reverse of the vector search, or semantic search, stuff that I was talking about before, where closer to zero was better.

Here, closer to zero means your query is less impactful, so that is good. But closer to one means you have something very impactful about this query. And that score is generated by effectively taking the average percent rank across all of the active dimensions that we're looking at.
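The scoring idea sketches out roughly like this: average PERCENT_RANK() across several resource dimensions so that no single metric dominates. To be clear, this is my illustration of the approach, not the actual sp_QuickieStore implementation, and #query_totals is a hypothetical stand-in for per-query-hash totals pulled from Query Store.

```sql
/* Minimal sketch: a 0-to-1 impact score as the average percent
   rank across five resource dimensions. A query near 1.0 is in
   the top percentiles of several metrics at once. */
SELECT
    qt.query_hash,
    ( PERCENT_RANK() OVER (ORDER BY qt.total_cpu_ms)
    + PERCENT_RANK() OVER (ORDER BY qt.total_duration_ms)
    + PERCENT_RANK() OVER (ORDER BY qt.total_physical_reads)
    + PERCENT_RANK() OVER (ORDER BY qt.total_writes)
    + PERCENT_RANK() OVER (ORDER BY qt.total_executions)
    ) / 5.0 AS impact_score
FROM #query_totals AS qt
ORDER BY impact_score DESC;
```

The nice property is that a query that is merely 99th percentile in one metric but bottom-decile everywhere else scores lower than one that sits in the 80th percentile across all five.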

And then there’s also stuff like total executions. So tell you how many, showing how many times that query has executed. And then what its share of CPU duration, physical reads, writes, and memory are.

I think I forgot to mention memory before, but that's okay. I'm mentioning it now because I see it, and we're good. So after that, we then have some diagnostics, which is a rule-based layer on top of the score. It tells you some stuff that I found while looking at your query's execution patterns, not the query plan or anything, just the way it runs.

I also include wait stats for the query. So what we look at is, you know, any blocking, resource contention, IO. We try to figure out if it's parameter sensitive, right?

If there’s plan instability. And then if like, you know, like, let’s say that if we have a select query that runs and it’s causing writes, then we might say, Hey, this thing might be spilling, or this thing might have spools in it that you could fix and take a look at.

And then we also flag metrics by volatility. So if there are really, really big swings in the minimums or maximums compared to the averages, which can be a sign of parameter sensitivity or something, then that means you might have a situation you need to take a look at, right?

Because sometimes the query is very fast, and sometimes the query is very slow. It might look okay on average, but it's got a wide swing to it.
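The volatility check can be sketched directly against Query Store's runtime stats, which carry avg/min/max per metric. This is an illustration of the idea rather than the tool's exact logic, and the 100x threshold is an arbitrary number picked for the example.

```sql
/* Sketch: flag queries whose max duration swings far above the
   average, a common parameter-sensitivity smell. Durations in
   sys.query_store_runtime_stats are in microseconds. */
SELECT
    rs.plan_id,
    rs.avg_duration,
    rs.max_duration,
    rs.max_duration / NULLIF(rs.avg_duration, 0) AS duration_swing
FROM sys.query_store_runtime_stats AS rs
WHERE rs.max_duration > 100 * rs.avg_duration
ORDER BY duration_swing DESC;
```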

So here is what this ends up looking like in result form. You'll have to excuse that a lot of this stuff says off hours because, and I can't remember how many times I've said this now, all my Windows VMs that SQL Server is installed on are in Pacific time.

And it seems like no matter when I run a workload, it's always off hours. Maybe it's just a California thing; they're all on vacation. But it looks like this. This is, of course, the database name that we ran it in.

I've been using HammerDB a little bit more recently, because I can just run a workload and it causes a whole bunch of stuff to happen, right? Like high CPU, blocking, deadlocks.

And I can just come back and say, oh, cool, you did all this stuff, without me having to write a synthetic Stack Overflow workload, which would be annoying and time consuming, and not something I really want to do right now with my time.

I got a lot of other stuff going on. So then you see the period that we ran this for. And then this is where we start getting into the stuff about the actual query. So you see stuff that executed during business hours.

Again, this would be the 7 a.m. to 7 p.m. window. Right now this is hard coded, but I'll open it up a little bit more if people start making requests about it.

Off hours would mean 7 p.m. to 7 a.m. And then down here a little bit, we have one that actually says spread, and that means that it is spread across both off hours and business hours.

If any of this were coming from a stored procedure, that name would be there. We have the query text, and then we have the query plan. We have all of the aggregated waits for that query.

So you can kind of get a sense of what that thing is up to. Again, these come from Query Store, so they're the high level bucketed waits, not the granular waits that you might be used to if you're looking at wait stats.

And then down here we have, oh, just a couple of examples. I know it's a little cut off over my head, but that's okay. So again, we're grouped by query hash.

Because as you can see from all, just about all of these lines, one query hash can spawn a whole bunch of query IDs. If we were to do this by query ID, we would have a far less good view of which queries were most impactful in the aggregate, right?

Because, look, there are like seven different query IDs. It'd be like, oh, well, these all look fine. What could be going wrong here? And then we have the list of plan IDs, right? So, you know, all kinds of nice things to have in front of you.

Where, you know, it's sort of interesting stuff, where two different query IDs grouped to the same hash only have one query plan ID, which is strange and bizarre. But then coming over this way, this is where we start to get into which queries are most impactful.

So with the impact score, I screen out anything that's less than 0.5, because that's where stuff gets pretty insignificant. So every impact score you see is going to be between 0.50 and one, right?

With one being the highest you can get. So the closer this gets to one, that means this query is in, like, the 98th percentile of how much stuff it does, right?

So, like this one, you know, across all of these metrics, all of this contributes to the general score of things. And then, coming over a little bit further, this is where the diagnostics come in.

The TPCC database is actually a little bit more interesting in here. So let's switch over to that and let's just run this for that one. So this one's a little bit different, right?

I mean, it's still all off hours because of the stupid West Coast VMs, but we actually get a bunch of stored procedure names in here. And we get slightly more interesting numbers.

I think, anyway, over in this section. So if you look at just this top query, where the impact score is 0.91, and you kind of look across, you know, holy cow, that's a lot of executions.

Wow. That even took me by surprise. You can see that this query, for this database, for the timeframe that we looked at, was 18% of the CPU, 1.3% of the duration, 3% of the physical reads, 54.2% of the writes, and 28.6% of the executions.

So this query is contributing a lot of load to the server. This is a query that you would want to look at. Now, based on the execution count and all that, it's unlikely that you could make this query a lot faster, but this is a query that you might want to look at and say, hey, do we need to run this this much?

Can we cache this somewhere? You know, the usual. I mean, maybe it is something you could tune and make faster. You're not going to make it execute less, right? But you could probably bring down the CPU share.

You could probably bring down some of the other numbers and make it less significant. But it also might be a query that you want to think about not running as much, because that is a bonkers number of executions. And then over here, not really any diagnostics, but we get some volatility metrics, where sometimes the duration is 707 times the average.

And then sometimes the physical reads are 93,000 times the average. And sometimes this one’s duration is almost 11,000 times the average. So this is just a different way of sort of looking at the queries that run in query store, scoring them across a bunch of metrics and figuring out which ones are contributing the most across all of those, not just to one.

Because oftentimes people are not just concerned about one thing. They’re like, well, CPU is high and, you know, we’re constantly going to disk and, you know, like problems, chocolate, right? So like this is a good way for you to start figuring out like as a whole, like which queries are contributing across a variety of metrics so you can attack things that are most meaningful across all of those.

Anyway, this is right now in the dev branch for my GitHub repo. That is code.erikdarling.com. Shockingly, this is also the same repo that my monitoring tool is in.

So I'm just going to bring that up. If you go to code.erikdarling.com, it's going to be in the Darling Data repo. This is, of course, sp_QuickieStore.

You can go check that out. Again, it's on the dev branch. This is not live on the main branch yet. But after I have let this soak in a little bit and I've had a chance to kick it around a little bit more, it will make its way over there.

So again, just a different way of sort of looking at query performance, a different way of finding queries that are meaningful to go after. I think it’s pretty cool.

Hopefully you do too. Hopefully it helps you find queries that are meaningful on your servers. With any other tool you use to get this kind of information, it's always just that you sort by one metric, right?

And even if you sort by two metrics, that one metric sort of dominates and sends a much higher signal. The other metrics are sort of wishy-washy, right?

So, just to talk about the problem in the opposite way: like I said, by default, sp_QuickieStore orders by average CPU. If you flip that and say, I'm going to order by executions, you find something that executes a lot, but it's stuff that's going to have, like, zero millisecond CPU and duration, because it's very fast code that's not doing anything.

So you're still finding a problem, but you're not finding things that are causing more than one problem at once, right?

This is a good way to not just focus on one metric or one problem. This is a great way to look at your queries across a bunch of metrics that will cause you problems. Anyway, thank you for watching.

I hope you enjoyed yourselves. I hope you learned something. I hope you'll check this thing out. That's sp_QuickieStore in my GitHub repo right there. Give it a shot. Let me know how that goes. And, I don't know, see you on the main branch someday soon.

All right. Thank you for watching.


SQL Server Performance Office Hours Episode 55




To ask your questions, head over here.

Summary

In this video, I dive into some common questions and issues faced in SQL Server performance tuning, particularly focusing on scalar functions causing lengthy execution times and the challenges of using extended events for monitoring. I also share insights on parameter sniffing and the limitations of the parameter sensitive plan optimization feature. Additionally, I discuss my free open-source SQL Server Monitoring tool, emphasizing its value as a cost-effective alternative to commercial tools. The video wraps up with an overview of upcoming events like Data Saturday Chicago and SQL Day Poland, inviting attendees to connect and collaborate in person. Lastly, I touch on the ongoing development of AI within SQL Server and Azure, sharing some use case scenarios from proof-of-concept projects.

Chapters

  • *00:00:00* – Introduction and Free SQL Server Performance Monitoring Tool Overview
  • *00:05:29* – Optimized Locking in SQL Server 2025
  • *00:14:18* – Conclusion and Future Plans

Full Transcript

Erik Darling here with Darling Data, and it is, oh, quite a Monday, isn't it? I suppose we're all going to have to deal with this in our own special way. But anyway, that means it's time for office hours. That's how I'm going to deal with it. Answer some questions, because people seem to keep having them, and I seem to keep having answers, so I don't know, maybe let's keep this gravy train rolling. He-he! Alright. Down in the video descriptions. Well, actually, there's only one of them. I don't know why I pluralized that, but there's all sorts of helpful links. If you would like to hire me for consulting, purchase my training, become a supporting-with-money member of the channel, you can do that. You'll also find links to continue to ask me office hours questions, so this gravy train will stay on the tracks. And as always, if you enjoy this content in any way, shape, or form, and you feel that it may enrich the lives of those around you, please do like, subscribe, tell a friend, all that good stuff, so that I can continue racking up these ground-breaking, record-setting view and subscriber counts.

I’m hot on the heels of that Amiga Repair channel. I swear, one of these days, so close. I’m going to overtake that thing. I also have, I mean, semi-recently, free SQL Server Monitoring. You ever want to have performance monitoring for your SQL Server, but, you know, all the other paid tools out there kind of suck, and, you know, aren’t worth the money, and, you know, I’ll probably be in jail.

Well, I got a free one. It is free. It is open source. There’s no email sign-up.

I’m not phoning home with any information from you. It’s just a bunch of T-SQL collectors running on a schedule, getting the most important performance metrics that your server has to offer. All sorts of stuff that you would want to see in a monitoring tool, nicely grouped together, colorful charts and graphs, help you tell a nice story about what’s going on.

Wait stats, blocking, deadlocks, top queries, all sorts of, you know, cool internals stuff that fancy-pants consultants like me have figured out how to collect data for over the years in a meaningful way. A NOC-style dashboard, so you can see what's up, what's down, what's hot, what's not. And if you are, you know, in favor of our new robot overlords, there are also a number of built-in MCP tools so that you can ask questions of your performance data.

And just the collected performance data, nothing else. And you can have LLMs kind of tell you a little bit about what’s going on, where and when with your performance. You can just talk to your data naturally, which is, I think, kind of cool.

And I don’t think anyone else is really doing that. So, if you want to get that for free, it’s all right there. On my GitHub repo, code.erikdarling.com, or if you are bold and brave enough to remember that link or just click the link down in the video description, you can find all that stuff.

Oh, well, you know what? DataTune Nashville already happened, which means Data Saturday Chicago is this weekend. And, well, that certainly snuck up on us, didn't it?

I feel like it was just yesterday I was saying February is never going to end. So, now I got to make a March picture. I’m going to have a database.

Databases are going to play baseball this March, I think. But I also will have SQL Day Poland coming up May 11th and 13th. And Data Saturday Croatia coming up June 12th and 13th. So, you know, if you are in Chicago, Poland, or Croatia, stop by and see me.

I would love to see you. Maybe we’ll get along. Maybe we’ll be best friends. You never know. I could use more friends. But, yeah, this is February, February. It’s done.

It’s done. I got to fix this one. All right. I’ll put that on my to-do list. February, over. All right? No more of this. This was a terrible winter. I hated it.

All of it. Anyway, we have questions which have been asked. And so we have questions which will be answered here. Let’s see what the first one we got.

Recently encountered a query that used scalar functions, which was taking 50 minutes to complete. Not an unnatural state of things, from my experience.

After hours of aimless clicking. Well, don't click aimlessly. Click with purpose and meaning. Why would you aimlessly click? I traced the issue back to an extended event.

Is there a specific metric which I could have used to indicate performance issues from extended events? So I think within the query that you were running itself, likely not. You may have noticed either a session reading from an extended event, if someone were actively monitoring it.

Or you may notice other extended event-related waits on the server go up around when you were executing this query. But in general, no. I haven't found a great way to track this one down.
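For reference, the quickest sanity check here is just asking the server which extended event sessions are actually running. This is a minimal sketch against the standard DMVs; on most servers you'd expect to see only built-in sessions like system_health and telemetry_xevents, so anything else deserves a second look.

```sql
/* List extended event sessions that are currently started.
   sys.server_event_sessions has every defined session;
   sys.dm_xe_sessions only has the ones actually running. */
SELECT
    ses.name,
    dxs.create_time
FROM sys.server_event_sessions AS ses
JOIN sys.dm_xe_sessions AS dxs
    ON dxs.name = ses.name
ORDER BY dxs.create_time;
```

It won't tell you whether a running session is hurting your query, but at least you'll know what's watching.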

And I have seen it happen a couple of times. I think early on with sp_HumanEvents, before I added the skip plans thing. Funny story behind that.

Someone wanted me to watch a load test. And I was like, well, you can use extended events for that and capture some stuff. And so we set it up. And they started running stuff.

And they were like, this is taking like four times as long. And I was like, yeah, that’s extended events. So I don’t know. I wish I had a good way to say, lesson learned.

Always check extended events first. But it’s just so rarely a thing. I’m glad you figured it out. That’s good for you.

But no, I don't have a great way of, you know, predetermining if there is a harmful extended event running, aside from, you know... Well, I mean, most client environments I walk into have some script from 2003 that looks at wait stats that they run once in a while. The fact that there are stored procedures that do all this stuff a little bit more surgically is big news.

So on most servers, there aren't extended events up and running, at least not ones you wouldn't expect to see. For you, apparently, you have someone to watch out for. And that someone might be you.

Now, you never said who created that extended event. You just said you traced it back to an extended event. So I don't know. You or someone you know is the murderer. We don't know who.

I see that PSPO, that is the parameter sensitive plan optimization, works for update in SQL Server 2025. Have you had any good results with it? I’ve not really had all that many great results with the parameter sensitive plan optimization generally.

No, Microsoft, you know, did not think through bucketing things well, I don’t believe. And my gripes with the parameter sensitive plan optimization remain in place even through SQL Server 2025. There are certainly times when it works.

It can be perfectly OK. But yeah, there's a lot to be desired with that one. You know, like the most common and least common top and bottom buckets, and then everything else in the middle bucket.

That thing in the middle can span a lot of different row counts, and you can end up with sort of ineffective plan variants for that. So I don't know. Another could-have-been, but probably ain't-going-to-be.
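To make the bucketing gripe concrete: PSPO kicks in automatically at compatibility level 160 and up, and when the middle bucket lumps wildly different row counts together, you can shut it off at the database or query level. A hedged sketch: the table and parameter are made up for illustration, but the scoped configuration and hint are the documented ones.

```sql
/* Turn PSPO off for the whole database... */
ALTER DATABASE SCOPED CONFIGURATION
    SET PARAMETER_SENSITIVE_PLAN_OPTIMIZATION = OFF;

/* ...or just for one query, via a hint.
   dbo.Posts and @OwnerUserId are hypothetical. */
SELECT
    COUNT_BIG(*) AS records
FROM dbo.Posts AS p
WHERE p.OwnerUserId = @OwnerUserId
OPTION (USE HINT ('DISABLE_PARAMETER_SENSITIVE_PLAN_OPTIMIZATION'));
```

The per-query hint is usually the safer first move, since PSPO does genuinely help some workloads.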

Hey, like I always tell you, we got Fabric. Lucky people. All right. Your new monitoring tool is sick.

Call the doctor. However, I am also working on a small niche commercial SQL Server monitoring tool similar to the big hitters, centralized repo, but much cheaper as a passion project. So I don’t think it’s my tool that’s cooking you.

DBA Dash is free and already does that. Like, you know, you were a bit vague about what is small and niche about your monitoring tool. I don't know what specifically you have niched yourself into.

But I think, you know, depending on what your niche is, who that market is, and how underserved they are, it might be sort of tough to get software as a service off the ground. I mean, you said it's a passion project, so maybe the money doesn't matter, right?

Like maybe if you can just get it out to some people and they’re happy with it, that’s good enough for you. But since you’re asking people to pay for it, I assume that you expect to make money from it. And that that might be a tough proposition in this day and age.

Unless you are already a fairly large, well-established enterprise vendor, you may have a hard time getting people to pay for something. You know, like, I mean, obviously the thing that I'm doing is totally free, right? Like, download it, take it, use it.

But, you know, it's an open source project. And I always say, you know, hey, if you like this, you can contribute to it. If you find this useful, you can contribute to it. It's been a very, very small portion of people who care enough to contribute to it. Thank you very much to anyone who has.

You're beautiful, wonderful people, but you are the vast minority on that. So I don't know who you're going after with this tool. I don't know what your hook is, but good luck. I think it's going to be a tough time for that sort of thing.

Let's see. Could you share a few concrete use case scenarios where you have implemented AI within SQL Server 2025 in a production environment? Is this an interview? Anything in the energy or banking sector?

Awful specific. So, not SQL Server 2025, but Azure SQL, where the vector stuff has been around a little bit longer. I have helped people with some proof of concept stuff, for them to figure out how they could fit SQL Server's vector search into various things they are interested in trying, whether that is, you know, sort of a chatbot thing, or sort of a developer assistance thing, you know, finding related issues, things like that.

I’ve certainly helped people with that sort of stuff, but I don’t know at this point how far those projects made it along. You know, people were fairly happy with the proof of concept, but, you know, the vector stuff in SQL Server is kind of dead in the water until vector indexes become writable. I don’t know when that’s going to be.
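For anyone curious what those proofs of concept look like, here's a minimal sketch using the vector data type and VECTOR_DISTANCE, which exist in Azure SQL and SQL Server 2025. The table, column names, and the three-dimensional vectors are made up for illustration; real embeddings from a model would be hundreds or thousands of dimensions.

```sql
/* Hypothetical table holding pre-computed embeddings. */
CREATE TABLE dbo.Articles
(
    id        integer IDENTITY PRIMARY KEY,
    title     nvarchar(200) NOT NULL,
    embedding vector(3) NOT NULL /* real models use 768+ dimensions */
);

/* A query embedding, cast from a JSON array string. */
DECLARE @search vector(3) = CAST('[0.10, 0.20, 0.30]' AS vector(3));

/* Exact (non-indexed) nearest-neighbor search. */
SELECT TOP (10)
    a.title,
    VECTOR_DISTANCE('cosine', a.embedding, @search) AS distance
FROM dbo.Articles AS a
ORDER BY distance;
```

Note that this scans every row and computes every distance, which is exactly the writable-vector-index problem mentioned above.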

I don’t know what Microsoft is busy with, aside from Fabric, because, you know, hey, we got that. Remember that one? Remember that joke? So, I mean, like, a lot of people, I just think, aren’t going to be able to use it well until Microsoft actually finishes the features around it.

Vector indexing being one of them, getting a lot of that stuff out of the preview program and into GA. You know, until that happens, I think a lot of people are just going to say, that’s nice. Okay.

Cool. I’ll be over here waiting. You take your time. So, you know, I wish that it were further along. I wish that it were a little bit more well-developed and a little bit more, I don’t know, production-ready in a lot of ways.

But here we are. Here we are. That is not where we are today. We are here today, but that is not where vector search in SQL Server 2025 or Azure is today.

We are standing by. Been impressed by optimized locking yet? No.

Seen it? Looks cool? Been impressed? Nah. It’s all right. Anyway, thank you for watching. Hope you enjoyed yourselves.

Hope you learned something. And I will see you in tomorrow's video, where I will undoubtedly talk more about my free SQL Server performance monitoring tool. Because I would desperately love to spread the word of that thing further, and hope that more people get out there and try it. You know, in a perfect world, more people would get involved with the project, perhaps even sponsor the project.

But, you know, reporting issues, contributing code or ideas, asking questions, all that stuff helps me move things forward in a way that makes things better for everyone. So, if you’re into those things, stop by code.erikdarling.com. All right.

See you tomorrow.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Free SQL Server Performance Monitoring System Health Extended Event



Summary

In this video, I delve into the OCRAP section of my free SQL Server performance monitoring tool, Darling Data, which surfaces critical issues often hidden in the System Health extended event. This section is like a red flag area where you’ll find nasty things such as bad pages, memory dumps, suspect pages, non-yielding tasks, sick spin locks, and severe errors—basically everything that can go wrong with your SQL Server health. I walk through how to access this information via my GitHub repo at code.erikdarling.com, highlighting the importance of paying attention to these warnings as they can indicate serious underlying issues that might require immediate professional intervention or even a complete system overhaul.

Chapters

Full Transcript

Monitoring tool mogul Erik Darling here with Darling Data. And in today's video, I want to go over what I call the OCRAP section. The OCRAP section is the real nasty stuff that can come out of the system health extended event. Normally, you might find that my free stored procedure sp_HealthParser would surface these things. You'd have to sit there and run that and deal with the outcome. Well, you can use my free SQL Server performance monitoring tool and get all of this goodness done for you.

Now, the reason why I call this the OCRAP section is because this is really where a lot of the "this server is not doing well" type stuff shows up. So, you know, bad pages, memory dumps, suspect pages, non-yielding tasks, sick spinlocks, severe errors, all sorts of information about CPU scheduler health, IO warnings, and memory conditions. Stuff that you don't want to see, but stuff that, if it's happening, you should know about pretty quickly.

So if you want to check this out, it's all at code.erikdarling.com. That's the shortcut to my GitHub repo. You go there, go to the performance monitor repo, and there are releases with all of the zip files that contain all of the executables you need to get up and running with this. There's also a wonderful readme file. That wonderful readme file contains all sorts of pertinent information about how you can get up and running with this without sending me emails asking why it's saying that your server certificate isn't trusted, because you didn't say to trust the server certificate.

So with that out of the way, let's jump over here and look at the system events tab. Now, again, this is the oh-crap section. And if you see anything in here, you should call a professional. It doesn't have to be me. I might not even be the right professional to talk to. But if you see stuff getting populated in here, you might want to, you know, show some concern with this stuff.
So let me just quickly go through that, so you can see some stuff. Under here are corruption events. So if we detect bad pages, if you have any dump requests (no jokes, right?), access violations or general access violations, if you have non-yielding schedulers, latch warnings, sick spinlocks, or anything like that, it will show up in here, and you will, you know, undoubtedly cry. I don't know, switch to Postgres, whatever it is people do these days.

Under severe errors, you know, obviously every single server has some errors on it. Mostly, you know, it's kind of like me: it's either deadlocks or me killing off queries that I'm annoyed with. So there are some severe errors in here, which are mostly one of those two things. So let's not worry about those. The oh-crap situation is when stuff happens in here. Yeah, run for the... jungle, I guess? What is it? I don't know.

Anyway, scheduler issues. This will tell you about non-yielding schedulers over time. There is a handy little grid, a little status bar down here, that will tell you about any current stuff that is going on.

There is memory pressure stuff in here. If this graph populates, you are having some memory issues. CPU tasks: well, I mean, this is sort of a running trend of how many workers you're allowed to have, how many workers you've created, and if any blocking occurred. So if you see that blue line get up real close to that dotted, sort of, I don't know, marigold line or something, that might be something you care about, because that would mean that you are running out of worker threads. And that would mean, you know, thread pool waits and potentially other unfortunate circumstances. So worth looking in there.

Memory broker stuff, you know, sort of what's consuming memory, things like that. And then memory node OOM, which is something no one wants to see. Again, I don't have any of this bad stuff going on on my servers. So, you know, we're kind of lucky there. We did see at least an alert pop up, so, you know, there's a sign of life. I can acknowledge that alert, and things will mostly go away.

But anyway, just, you know, again, this is all from the system health extended event. This is all running on most versions of SQL Server, and my free monitoring tool goes and collects it and will warn you about the oh-crap stuff. Right? So, important stuff for you to know about if you are dealing with SQL Server. Because if you are dealing with a SQL Server that has a lot of stuff going on in here, you might need to just find a new job, right? Fill out a resume. I'm done, quit.
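If you'd rather poke at the raw material the tool collects, the system_health session writes to an event file target you can read directly with T-SQL. A minimal sketch, with fair warning that parsing the XML on a busy server can be slow, which is exactly the grunt work things like sp_HealthParser exist to do for you.

```sql
/* Read raw system_health events straight off the file target.
   Shredding the XML further is left as an exercise; this just
   proves the data is there. */
SELECT
    fx.object_name AS event_name,
    CAST(fx.event_data AS xml) AS event_xml
FROM sys.fn_xe_file_target_read_file
     (N'system_health*.xel', NULL, NULL, NULL) AS fx;
```

From there you'd filter on event names like error_reported, memory_broker_ring_buffer_recorded, and scheduler_monitor events to find the oh-crap material.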

We’re through here. Sorry about everything. I don’t know. I’d be scared seriously.

Anyway, thank you. I hope you enjoyed this.

Bye-bye. Thank you so much.
