Free SQL Server Performance Monitoring Troubleshooting Live Problems
Summary
In this video, I delve into the intricacies of monitoring SQL Server performance using HammerDB TPC-C workloads and the Light Dashboard MCP Server. I showcase how the dashboard provides real-time insights into server performance, including CPU usage, memory contention, and blocking issues, all while highlighting the unique capabilities of the MCP Server. The MCP Server offers a powerful analysis layer that can automatically detect and summarize performance issues, making it easier to identify and address bottlenecks without the need for manual digging through complex data. I also demonstrate how to use the dashboard to run historical TPC-C and TPC-H workloads, providing detailed insights into query performance and server behavior, and how to leverage the live dashboard for troubleshooting ongoing performance issues in real-time.
Chapters
- *00:00:00* – Introduction
- *00:00:31* – Monitoring Tool Overview
- *00:01:17* – Tool Special Features
- *00:01:42* – Analysis Layer
- *00:02:23* – MCP Tool Benefits
- *00:03:06* – Dashboard Overview
- *00:04:33* – Historical Workload Data
- *00:05:38* – TPC-H Workload Analysis
- *00:07:17* – Light Dashboard Overview
- *00:08:24* – Wait Stats Analysis
- *00:10:38* – Live Dashboard Troubleshooting
Full Transcript
Erik Darling here, the one, the only, the monitoring tool mogul, he of reasonable rates, the best SQL Server consultant, for now, outside of New Zealand, at least until the robots eat SQL Server consultant alive completely. Anyway, in this video, I’ve got a HammerDB TPC-C workload running in the background currently, sort of deals with stuff, and we’re going to see what a little chat about a historical TPC-C-C and TPC-H workloads look like while talking to the Light Dashboard MCP Server. So, like any monitoring tool, you have a lot of visual indicators as to what is going wrong with SQL Server.
You’ll have, you know, your typical overview of things, you’ll be able to see spikes in CPU, spikes in weights, you’ll be able to see if memory is causing any memory contention on the server, and of course, all the usual blocking and deadlocking stuff. Nothing about what I collect in that is special. What makes this tool special are some of the other things that go on in there, and also some of the things that I’ll be adding into there.
I’m adding an analysis layer on top of collected data as we speak, so that’ll start rolling out, and you’ll be able to start getting SQL Server consultant expert advice, just sort of running rules and checks over the collected data in the dashboard. Both dashboards, full and light. So, that’ll be coming out soon.
And also, the MCP stuff. I have seen some monitoring tools have like an AI window in there where you can ask… It’s not good. It’s actually quite laughable. You should feel ashamed and go home if that’s what you’ve done. The MCP stuff is much more powerful because it gives well-defined tools to the robots to access just the monitoring data with and give you much more insight about than just asking some random GPT sitting there floating in the nether.
So, you can do, you know, the sort of standard investigation. You can look through the graphs and dashboards and you can be quite happy that you have all that and that you can take screenshots and do all that stuff with. But, what the MCP tool brings to the equation is doing a bunch of analysis on stuff for you, right? And it’s like, you know, you can get the screenshots and you can be the big hero and you can say, hey, like, I see this, this and this. Like, you know, when I saw these spikes, I went and I dug in and I found all this stuff going on.
Or you can just have a robot look at that same raw performance data and summarize it all for you, right? There’s all sorts of neat stuff that you can do that makes your job and your workflow a bit easier. What’s also nice is that if you run into something, you can say, hey, go look at this and then go do something else, right?
It’s like a couple of minute context switch to be like, yeah, that spikes bad. And hey, hey robot, tell me what was going on during the spike. You can go do something else. Come back to a summary of what happened during the spike without you having to dig through all 15 different layers of stuff.
So, we got the overview tab and the weights tab, and that’ll show you information about sort of how the server is currently doing and what things are currently waiting on. Then you can, you know, look through the standard expensive queries, look at blocking, stuff like that. And then if you find that, rather if you don’t find your answer there, you can look through deeper metrics that will tell you more about what’s going on on the server.
Stuff like memory, blocking, deadlocking, all the other things. And then, you know, you can just put that all sort of together in the way that you would put together any sort of like RCA report. Again, if you want to check this stuff out, this is all available completely for free on GitHub.
There’s no email signup required. There’s no telemetry. There’s no phoning home. If you like this project a lot, you can always choose to sponsor it. It’s a nice thing to do. Anyway, that’s about that.
Let’s take a look at, you know, let’s look at the MCP stuff first, because this is me asking the robots to run the TPCC workload. So you caught me there. But this is information from a historical TPCC and TPCH run on March 4th and 5th. And we can see all this stuff going on in here, right?
SQL Server hitting a high CPU window, right? Average CPU 33, peak 81, right? All this stuff kind of nicely just broken down for you. And then it, you know, it’ll break down all the weight stats that happened in there and like sort of the important stuff.
And, you know, give you just sort of a nice way of sort of framing up everything that went on during that. And then it will start telling you about queries that suffered these problems, right? So we have some updates that were blocking. We had a select with an updelock hint.
My goodness, why would you do that? Serializable and updelock. It’s still my heart. And then we have other queries that did a lot of work in other ways, right?
We have some CPU stuff, some long running stuff. So, you know, just things in there that we would, you know, again, all stuff that you like good to include in any sort of RCA report. And then we have stuff about the TPC-H workload.
And look at this thing, knowing exactly which queries are which from that workload. Isn’t that beautiful? It knows what all of them are, right? I mean, there’s a comment in there that says it, but at least it’s nice about that.
And it’s just like, hey, I can identify these things a little bit better, right? So like you get all the stuff in there that you would expect to see. Executions, average CPU, elapsed, total CPU, all this other good stuff.
And then we get some information. Apparently, one of those things gave TempDB some fits. Look at that.
Hoo-wee. We did some work. And then, you know, we get some other stuff there and a nice little summary. So this is stuff that you can do if you opt in to having the MCP servers available for the monitoring tool. You can just go in, start asking questions about your data, and they only look at the performance data.
So you can get just information about what is already collected and it’s already nice and spread out and like completely correlated in very nice ways so that there’s no guesswork. It’s like with SQL Server, everything is aggregate since the server started up.
Or it gets lost on restart or, you know, like the plan cache and stuff will get lost. So like this is just a no-brainer way to collect all that data. Make your life real, real easy, right?
You can do all this analysis very quickly. And this analysis is, you know, maybe not exactly what I would say, but it’s still pretty good, right? For just waiting a couple minutes for this thing to go through some historical stuff. Anyway, let’s go to the light dashboard.
Let’s bring that up. And you can see exactly where I left off last time. And I think we should probably go back to the wait stats tab here. And let’s just do the last hour so that we start to see just what we have for the HammerDB workload that is currently running, right?
So we’ll just apply that to all tabs. So we’re only looking at one hour across all of them. So this stuff going on behind my head, this is the TPC C workload currently running, right?
And if we look at this over here, we’ll see that we are waiting a whole lot on write log. My goodness, right? And we have a bunch of CX consumer down here.
I’m surprised that we haven’t started hitting the lock weights yet because there’s been a bunch of blocking and deadlocking. But one thing that you can always do is flip this to average wait time. And you start to see weights in a slightly different way, like this transaction mutex.
That’s fun. Async network IO, also fun, right? So just a couple of different views of the wait stats data that allow you to sort of analyze things in slightly different ways. Under queries, now this is where things get kind of neat, is if we go to the active queries tab, we’ll have this stuff that sort of aggregates cumulatively as we look at things.
But we also have this live snapshot tab. And the live snapshot tab will tell you what’s running at the exact moment that you push the button or hit refresh. And you’ll get all this stuff back where, you know, we can see that there is a whole bunch of blocking going on.
So this is excellent to have at your disposal if you are curious. And like, you know, let’s say that maybe you have this tool installed somewhere else and you don’t have management studio, maybe looking at the server or you don’t have management studio at all.
You can just see what’s running in a smart way without having to go do anything. The CPU graph is going to start to tick up a little bit. It’s not very interesting at the moment.
We have only hit about 17% CPU and we’ve screwed up the green screen a little bit because I moved, which is always fun, right? It’s nice to be able to count on technology when you need it most. And then if we go over to the blocking tab, we should see now a little bit more blocking going on.
We see that the lock wait time has bumped up now, right? We have some stuff going on in here and we have had a little spike in deadlocks. And so, oh, look, look over here.
There is a deadlock detected. We got a pop-up notification during the video. It’s wonderful, right? And the blocking and deadlock stuff doesn’t always populate immediately because we do have to parse a little bit of XML in here, but this is showing all the deadlocks that we’ve hit so far.
And this is giving us a pretty good view of what queries are having problems. And from here, we can look at the deadlock graph. We can do all sorts of stuff to start troubleshooting exactly why the deadlocks are occurring and maybe even start fixing them on the fly.
Because, you know, one thing that I love doing is adding indexes and stuff on the fly when there’s already a server having a bunch of problems under load. And it’s good fun. Good fun.
Anyway, just a little overview of how you can use the live dashboard to troubleshoot live performance problems. You start seeing spikes. You start getting warning notifications.
You can chat with the MCP server about it. We got another deadlock detected, so we are piling them on. This TPCC workload is no joke. All right.
Thank you for watching. I hope you enjoyed yourselves. I hope you’ll try this tool. Again, this is all available at code.erikdarling.com. If you want to check it out, totally free, totally open source. Start monitoring your SQL servers, finding professional level SQL Server performance problems very, very quickly, very, very easily.
Alright, thank you for watching.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

