Advanced T-SQL Triage: I’m Running a Full-Day Pre-Con in Jacksonville

On Friday, May 1st, I’ll be in Jacksonville for Day of Data’s pre-con day, running an all-day session called Advanced T-SQL Triage: The Art of Fixing Terrible Code.

If you’ve ever inherited a stored procedure that looks like someone lost a bet with the optimizer — MERGE statements, RIGHT JOINs nobody can explain, CTEs stacked eight deep, scalar functions wrapping scalar functions — this one is for you.

All attendees get free access to Learn T-SQL With Erik.

What we’re doing all day


We’re not going to sit through slides about what a clustered index is. We’re going to look at queries that are actually broken and fix them. On the table:

– Paging logic that scans when it should seek
– Window functions that spool and spill because nobody gave them a sort to work with
– Indexed views that look clever on paper and lock everything in practice
– Data modifications that block like linebackers
– Dynamic SQL that’s parameterized, fast, and doesn’t make your DBA cry
– When CROSS APPLY is the right tool (and when you’re reaching for it because you don’t want to think)
– Views vs. inline TVFs vs. scalar UDFs, and why the optimizer treats them very differently
– Why RIGHT JOIN is not simply LEFT JOIN spelled backwards
– Rewriting scalar UDFs so they stop wrecking your execution plans
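To make that last point concrete, here's a hedged sketch of the kind of rewrite we'll practice, using the Stack Overflow database's Users and Posts tables; the function names are made up for illustration, and this is a simplified example, not the exact demo from the session:

```sql
/* A scalar UDF like this runs once per row, hides its cost in plans,
   and (before SQL Server 2019's inlining) blocks parallelism. */
CREATE OR ALTER FUNCTION dbo.TotalScore (@UserId integer)
RETURNS integer
AS
BEGIN
    RETURN
    (
        SELECT SUM(p.Score)
        FROM dbo.Posts AS p
        WHERE p.OwnerUserId = @UserId
    );
END;
GO

/* The same logic as an inline table-valued function can be
   expanded into the calling query, just like a view. */
CREATE OR ALTER FUNCTION dbo.TotalScoreInline (@UserId integer)
RETURNS table
AS
RETURN
    SELECT TotalScore = SUM(p.Score)
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = @UserId;
GO

/* Called with CROSS APPLY rather than in the select list. */
SELECT u.DisplayName, ts.TotalScore
FROM dbo.Users AS u
CROSS APPLY dbo.TotalScoreInline(u.Id) AS ts;
```

The scalar version executes per row and can't be costed properly; the inline version gets folded into the calling query, so the optimizer can parallelize it and you can index for it.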

You’ll leave with a cheat sheet, a working mental model for diagnosing slow queries without guessing, and — if the day goes the way it usually does — a short list of queries at work you’re itching to go refactor on Monday morning.

Details


When: Friday, May 1, 2026, all day
Where: University of North Florida, Jacksonville, FL
Register: Here

The free Day of Data event is the next day, Saturday May 2nd, same venue. Come for the pre-con, stay for the community event. The hotel discount at the Hilton Tru cuts off April 22, so if you need a room, don’t sit on it.

If your queries scare you, come fix them with me.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Free SQL Server Performance Monitoring: Troubleshooting Live Problems


Summary

In this video, I delve into the intricacies of monitoring SQL Server performance using HammerDB TPC-C workloads and the Light Dashboard MCP Server. I showcase how the dashboard provides real-time insights into server performance, including CPU usage, memory contention, and blocking issues, all while highlighting the unique capabilities of the MCP Server. The MCP Server offers a powerful analysis layer that can automatically detect and summarize performance issues, making it easier to identify and address bottlenecks without the need for manual digging through complex data. I also demonstrate how to use the dashboard to run historical TPC-C and TPC-H workloads, providing detailed insights into query performance and server behavior, and how to leverage the live dashboard for troubleshooting ongoing performance issues in real-time.

Full Transcript

Erik Darling here, the one, the only, the monitoring tool mogul, he of reasonable rates, the best SQL Server consultant, for now, outside of New Zealand, at least until the robots eat SQL Server consultants alive completely. Anyway, in this video, I've got a HammerDB TPC-C workload running in the background currently, and we're going to see what a little chat about historical TPC-C and TPC-H workloads looks like while talking to the Light Dashboard MCP Server. So, like any monitoring tool, you have a lot of visual indicators as to what is going wrong with SQL Server.

You'll have, you know, your typical overview of things, you'll be able to see spikes in CPU, spikes in waits, you'll be able to see if there's any memory contention on the server, and of course, all the usual blocking and deadlocking stuff. Nothing about what I collect in that is special. What makes this tool special are some of the other things that go on in there, and also some of the things that I'll be adding into there.

I’m adding an analysis layer on top of collected data as we speak, so that’ll start rolling out, and you’ll be able to start getting SQL Server consultant expert advice, just sort of running rules and checks over the collected data in the dashboard. Both dashboards, full and light. So, that’ll be coming out soon.

And also, the MCP stuff. I have seen some monitoring tools have like an AI window in there where you can ask… It's not good. It's actually quite laughable. You should feel ashamed and go home if that's what you've done. The MCP stuff is much more powerful, because it gives the robots well-defined tools to access just the monitoring data, and gives you much more insight than just asking some random GPT sitting there floating in the nether.

So, you can do, you know, the sort of standard investigation. You can look through the graphs and dashboards, and you can be quite happy that you have all that, and that you can take screenshots and do all that stuff. But what the MCP tool brings to the equation is doing a bunch of analysis on stuff for you, right? And it's like, you know, you can get the screenshots and you can be the big hero and you can say, hey, I see this, this and this. Like, you know, when I saw these spikes, I went and I dug in and I found all this stuff going on.

Or you can just have a robot look at that same raw performance data and summarize it all for you, right? There’s all sorts of neat stuff that you can do that makes your job and your workflow a bit easier. What’s also nice is that if you run into something, you can say, hey, go look at this and then go do something else, right?

It's like a couple-of-minute context switch to be like, yeah, that spike's bad. And hey, robot, tell me what was going on during the spike. You can go do something else. Come back to a summary of what happened during the spike without you having to dig through all 15 different layers of stuff.

So, we've got the overview tab and the waits tab, and those will show you information about how the server is currently doing and what things are currently waiting on. Then you can, you know, look through the standard expensive queries, look at blocking, stuff like that. And then if you don't find your answer there, you can look through deeper metrics that will tell you more about what's going on on the server.

Stuff like memory, blocking, deadlocking, all the other things. And then, you know, you can just put that all sort of together in the way that you would put together any sort of like RCA report. Again, if you want to check this stuff out, this is all available completely for free on GitHub.

There’s no email signup required. There’s no telemetry. There’s no phoning home. If you like this project a lot, you can always choose to sponsor it. It’s a nice thing to do. Anyway, that’s about that.

Let's take a look at, you know, let's look at the MCP stuff first, because this is me asking the robots to run the TPC-C workload. So you caught me there. But this is information from a historical TPC-C and TPC-H run on March 4th and 5th. And we can see all this stuff going on in here, right?

SQL Server hitting a high CPU window, right? Average CPU 33, peak 81, right? All this stuff kind of nicely just broken down for you. And then it, you know, it'll break down all the wait stats that happened in there and sort of the important stuff.

And, you know, give you just sort of a nice way of framing up everything that went on during that. And then it will start telling you about queries that suffered these problems, right? So we have some updates that were blocking. We had a select with an UPDLOCK hint.

My goodness, why would you do that? SERIALIZABLE and UPDLOCK. Be still my heart. And then we have other queries that did a lot of work in other ways, right?
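For context, the pattern being described looks roughly like this sketch. The table and column names are invented for illustration; the point is what the hints do:

```sql
/* Hypothetical names throughout. The hints mean this SELECT takes
   an update lock on the key, plus range locks, and holds them for
   the life of the transaction. */
DECLARE @DistrictId integer = 1;

BEGIN TRANSACTION;

SELECT d.NextOrderId
FROM dbo.District AS d WITH (UPDLOCK, SERIALIZABLE)
WHERE d.DistrictId = @DistrictId;

/* Any other session that touches this row, or the range around it,
   now waits here until we commit or roll back. */

UPDATE d
SET d.NextOrderId += 1
FROM dbo.District AS d
WHERE d.DistrictId = @DistrictId;

COMMIT TRANSACTION;
```

It's a classic read-then-update serialization pattern, and under concurrency it turns into a blocking chain very quickly.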

We have some CPU stuff, some long running stuff. So, you know, just things in there that we would, you know, again, all stuff that you like good to include in any sort of RCA report. And then we have stuff about the TPC-H workload.

And look at this thing, knowing exactly which queries are which from that workload. Isn’t that beautiful? It knows what all of them are, right? I mean, there’s a comment in there that says it, but at least it’s nice about that.

And it’s just like, hey, I can identify these things a little bit better, right? So like you get all the stuff in there that you would expect to see. Executions, average CPU, elapsed, total CPU, all this other good stuff.

And then we get some information. Apparently, one of those things gave TempDB some fits. Look at that.

Hoo-wee. We did some work. And then, you know, we get some other stuff there and a nice little summary. So this is stuff that you can do if you opt in to having the MCP servers available for the monitoring tool. You can just go in, start asking questions about your data, and they only look at the performance data.

So you can get just information about what is already collected, and it's already nice and spread out and completely correlated in very nice ways, so that there's no guesswork. It's like, with SQL Server, everything is aggregated since the server started up.

Or it gets lost on restart or, you know, like the plan cache and stuff will get lost. So like this is just a no-brainer way to collect all that data. Make your life real, real easy, right?
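That cumulative-since-startup behavior is easy to see for yourself. The waits DMV is one example, which is exactly why a tool has to snapshot it on a schedule to get deltas; here's a rough query (my formatting, not the tool's internals):

```sql
/* Everything here is a running total since the last restart (or a
   manual clear), so point-in-time analysis needs snapshots and deltas. */
SELECT
    dows.wait_type,
    dows.waiting_tasks_count,
    dows.wait_time_ms,
    avg_wait_ms =
        dows.wait_time_ms * 1. / NULLIF(dows.waiting_tasks_count, 0)
FROM sys.dm_os_wait_stats AS dows
WHERE dows.waiting_tasks_count > 0
ORDER BY dows.wait_time_ms DESC;
```

Sorting by total wait time and by average wait time tells two different stories, which is the same idea behind flipping the dashboard's wait stats view later on.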

You can do all this analysis very quickly. And this analysis is, you know, maybe not exactly what I would say, but it’s still pretty good, right? For just waiting a couple minutes for this thing to go through some historical stuff. Anyway, let’s go to the light dashboard.

Let’s bring that up. And you can see exactly where I left off last time. And I think we should probably go back to the wait stats tab here. And let’s just do the last hour so that we start to see just what we have for the HammerDB workload that is currently running, right?

So we'll just apply that to all tabs. So we're only looking at one hour across all of them. So this stuff going on behind my head, this is the TPC-C workload currently running, right?

And if we look at this over here, we'll see that we are waiting a whole lot on WRITELOG. My goodness, right? And we have a bunch of CXCONSUMER down here.

I'm surprised that we haven't started hitting the lock waits yet, because there's been a bunch of blocking and deadlocking. But one thing that you can always do is flip this to average wait time. And you start to see waits in a slightly different way, like this transaction mutex.

That's fun. ASYNC_NETWORK_IO, also fun, right? So just a couple of different views of the wait stats data that allow you to analyze things in slightly different ways. Under queries, now, this is where things get kind of neat: if we go to the active queries tab, we'll have this stuff that sort of aggregates cumulatively as we look at things.

But we also have this live snapshot tab. And the live snapshot tab will tell you what’s running at the exact moment that you push the button or hit refresh. And you’ll get all this stuff back where, you know, we can see that there is a whole bunch of blocking going on.

So this is excellent to have at your disposal if you are curious. And like, you know, let's say that maybe you have this tool installed somewhere else, and you don't have Management Studio looking at the server, or you don't have Management Studio at all.

You can just see what’s running in a smart way without having to go do anything. The CPU graph is going to start to tick up a little bit. It’s not very interesting at the moment.

We have only hit about 17% CPU and we’ve screwed up the green screen a little bit because I moved, which is always fun, right? It’s nice to be able to count on technology when you need it most. And then if we go over to the blocking tab, we should see now a little bit more blocking going on.

We see that the lock wait time has bumped up now, right? We have some stuff going on in here and we have had a little spike in deadlocks. And so, oh, look, look over here.

There is a deadlock detected. We got a pop-up notification during the video. It’s wonderful, right? And the blocking and deadlock stuff doesn’t always populate immediately because we do have to parse a little bit of XML in here, but this is showing all the deadlocks that we’ve hit so far.

And this is giving us a pretty good view of what queries are having problems. And from here, we can look at the deadlock graph. We can do all sorts of stuff to start troubleshooting exactly why the deadlocks are occurring and maybe even start fixing them on the fly.
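If you're wondering where deadlock data like this can come from, one common source is the system_health extended event session. Here's a rough sketch of pulling the deadlock XML out yourself; this is not necessarily how the dashboard does it internally:

```sql
/* Read deadlock reports out of the system_health session's files.
   The XML parsing is the part that takes a moment. */
SELECT
    deadlock_xml =
        CONVERT(xml, fxe.event_data)
            .query('/event/data/value/deadlock')
FROM sys.fn_xe_file_target_read_file
         (N'system_health*.xel', NULL, NULL, NULL) AS fxe
WHERE fxe.object_name = N'xml_deadlock_report';
```

That parsing overhead is why the blocking and deadlock tabs don't always populate instantly.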

Because, you know, one thing that I love doing is adding indexes and stuff on the fly when there’s already a server having a bunch of problems under load. And it’s good fun. Good fun.

Anyway, just a little overview of how you can use the live dashboard to troubleshoot live performance problems. You start seeing spikes. You start getting warning notifications.

You can chat with the MCP server about it. We got another deadlock detected, so we are piling them on. This TPC-C workload is no joke. All right.

Thank you for watching. I hope you enjoyed yourselves. I hope you'll try this tool. Again, this is all available at code.erikdarling.com. If you want to check it out, totally free, totally open source. Start monitoring your SQL Servers and finding professional-level SQL Server performance problems very, very quickly, very, very easily.

Alright, thank you for watching.


Free SQL Server Performance Monitoring: Lite Dashboard


Summary

In this video, I dive into the Lite dashboard of my free open-source monitoring tool, which has garnered significant attention with over 10,000 installs based on GitHub repo stats. I highlight its user-friendly nature, especially for consultants or those who can’t install software on client servers, as it allows you to collect a wide range of performance metrics without the need for a separate database. I also showcase how DuckDB, an embedded analytics database, powers the Lite dashboard, ensuring fast query performance and efficient storage through Parquet file compression, making it an ideal solution for monitoring Azure SQL databases and other environments.

Full Transcript

ErikMonitoringToolMogulDarling here. I suppose my rates are reasonable for this free open source monitoring tool as well. So, you know, that's nice. But I want to talk about the Lite dashboard a little bit more. This is far and away the most popular download based on GitHub repo stats. I do keep an eye on those because I care and I want to make sure that people are using this and getting something out of it. I want to say that we're very close to 10,000 installs, or at least that's what it would seem like based on the numbers that I've seen over the past week or so. So that's pretty exciting. Big monitoring. I'm coming for you. Coming for your customers. I'm going to eat you alive. Anyway, the Lite dashboard is fun because you don't have to install anything. You just open it up, point it at a SQL Server, even Azure SQL database, and it'll just start chewing away, collecting data and making your performance tuning life easier. It's got a lot of the same tabs and collectors that the full dashboard does, but it's also got some kind of neat stuff internally.

And the thing that I really love about it is, you know, I'm a big fan of DuckDB. I think it's one of the coolest things that's happened in databases in a long time. When I was first working on this, doing the sort of design spec, you know, almost any time you say embedded database, people go, oh, SQLite. The thing is, I'm not doing SQLite work here. All right. I'm not doing tiny little transactional things. I'm inserting bunches of rows and stuff, and then I need to run reporting-type queries off them. And DuckDB is a no-brainer for that. So I have DuckDB embedded in this thing. And what I do is I load up DuckDB until it has about 512 megs in the tables. And as soon as I hit 512 megs, I archive all that stuff out to Parquet files, which have insane compression on them. And that helps control the size of the database over time. The Parquet files are very, very well compressed. It's a very cool little thing that goes on in there.
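The archive step described above can be sketched in DuckDB SQL like this; the table and file names are illustrative, not the tool's actual schema:

```sql
-- Once a table hits the size threshold, copy it out to
-- compressed Parquet and empty it to start over.
COPY wait_stats
TO 'archive/wait_stats_2026_03_04.parquet'
(FORMAT PARQUET, COMPRESSION ZSTD);

DELETE FROM wait_stats;

-- Historical queries can read straight from the archived files:
SELECT wait_type, SUM(wait_time_ms) AS total_wait_ms
FROM read_parquet('archive/wait_stats_*.parquet')
GROUP BY wait_type
ORDER BY total_wait_ms DESC;
```

Because DuckDB's storage and Parquet are both columnar, reporting queries over the archived files stay fast without keeping everything in the live database.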

So like I said, it's got just about all the same tabs and whatnot that you would expect to see in the full dashboard. I tried to replicate as much of it as possible. But there are a few things that I'm just not quite keen to do, like run sp_HealthParser to look at the system_health extended event session. And the other one: I don't set up a custom trace in this one. So there are a couple things that I left out of Lite just to make it a little bit lighter. But hopefully the set of things that I collect in here is useful enough for you that it sort of makes up for the few things that are missing. So like I said, there's no system events tab (that's sp_HealthParser looking at the system_health session), and I don't look too much at the default trace stuff. But there are some very cool things that I do have in here that I think make this a unique sort of proposition. One is Azure SQL database support. The full dashboard obviously doesn't work there, because it creates a database and stored procedures and all the other stuff, so it just can't function with Azure SQL database. Whereas the Lite one absolutely can. They both have the MCP servers built in, and those are opt-in only, so if you want to have one of your robot friends talk to your performance data, and just your performance data, you can chat away with the Lite database. And I think that's even, in a way, better than the full dashboard, because it is confined to what has been collected in DuckDB and the Parquet files, and it doesn't have to go run any queries against the database where the data lives. Right? So with the full dashboard, you have the performance monitor database, and that's collecting stuff, and the MCP servers talk to just that. But this is just talking to DuckDB wherever it lives. So that's kind of neat. It has the same alerts and email notifications, system tray, stuff like that.

CSV export, dark theme, all the things that I have in there. My dear contributor Claudio added some light theme support recently. I haven't used light themes yet, because, you know, I'm a dark mode kind of guy, but it's cool that they're in there just in case you want them. So, DuckDB. I chose DuckDB because it's an embedded analytics database. I don't need a separate server process. It lives inside its own executable. There are no external dependencies, and it uses columnar storage by default. So all of the stuff that I put in there, all the queries are very fast against it. And when the data goes out to Parquet, there's almost no hit on query performance.

When your DuckDB database gets to be about 512 megs, I bump everything out to Parquet files and just sort of empty the tables out and start over again. And the compression on those Parquet files is amazing. The footprint on them is very, very small. It's all very portable. You can, you know, take it wherever you want. And there's no SQL Server dependency for storing historical data. It's all captured within the executable.

So, I'm going to show you a few things poking around the Lite dashboard. This is primarily aimed at, you know, just anyone. I mean, it says consultants, but if you're not allowed to install stuff on client servers, or, like, on your servers at work, and you don't want to have a database sitting on there collecting stuff, this is really perfect. Just a way to spin it up and collect a whole bunch of great performance metrics.

And especially if you're on Azure SQL database, this is the one that supports it. But, you know, you can literally run it from anywhere. If you're in a locked-down environment, it really needs minimal permissions for stuff, so it makes life a lot easier. That's not what I wanted. All right. Let's go to the Lite dashboard. There we go. All right. So this is connected to SQL Server 2022.

Like I said, it has just about all the stuff that you would want a dashboard to collect. You start off on the wait stats tab. Over here, we've got the list of servers and their sort of general, you know, presence in the world. We've got an overview tab showing all the usual stuff: what CPU is doing, if there's any blocking or deadlocking, what memory's up to, if there are any alerts that have been fired off.

I don't think I've had any alerts recently. I'm working on this FinOps tab, because a lot of people have been sort of behind the scenes asking me about it, because apparently a lot of managers and bosses really care about this. So I'm trying to get this stuff going, so that, on top of it being a free monitoring tool, it can be something that points you to where you can save money, especially with Azure SQL database and cloud stuff. You know, if things are over-provisioned, under-provisioned, all that.

So, I'm working on this. It's a work in progress. It'll be out soon, but it's, you know, gonna be a few versions before I think it really finds its footing. But I've got to start somewhere, so I can get some feedback and start getting some real scenarios from it, other than what's going on in my local test servers. But just to make sure it all works, we get this stuff first. Anyway, we get wait stats.

Again, it's all the waits that we look at in the same way as the full dashboard. So, like, your top 10 waits, plus the sort of usual suspects, plus the poison waits. So if anything terrible is happening, you can see it pretty quickly.

Under the queries tabs, again, I'm such a developer. I don't have any stored procedures running, but, you know, you sort of get a look at query duration from the plan cache, from query store, and then executions is what's behind me.

You can get an active queries tab. This does not run sp_WhoIsActive. This just runs a simple script from sp_PressureDetector to get a whole bunch of information about what's currently running on the server.

There's a top queries by duration tab. And, you know, just like with everything else, you can grab the execution plans, you can see the queries, you can open them in the plan viewer. Everything that you would come to expect from other parts of the tool.

And then there's the same thing for query store. The plan viewer, I've talked about that in a few videos. We just say view plan, that pops up in the plan viewer, and you get, you know, this is just the estimated execution plan for this, which is not terribly interesting, but that's okay.

Estimated plans are not meant for real use anyway. We have a CPU tab that will show CPU usage over time, you know, same thing as in full. We have a memory usage tab up here, where we have the overview.

We have memory clerks. We have memory grants, file IO, broken down first by latency, right? So we have read latency and write latency, and then we have throughput numbers.

So, latency is time, and throughput is data. These are all measured in megabytes, and you can see, over time, when things spiked up and when you might want to take a look at things. We have a tempdb tab, of course, and we have blocking and deadlocking and all the other stuff.
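The latency and throughput numbers here map to what SQL Server exposes in sys.dm_io_virtual_file_stats. A rough version of the math, not necessarily the dashboard's exact query:

```sql
/* Latency: stall time divided by IO count. Throughput: bytes moved.
   Both cumulative since startup, per database file. */
SELECT
    database_name = DB_NAME(vfs.database_id),
    vfs.file_id,
    avg_read_latency_ms =
        vfs.io_stall_read_ms * 1. / NULLIF(vfs.num_of_reads, 0),
    avg_write_latency_ms =
        vfs.io_stall_write_ms * 1. / NULLIF(vfs.num_of_writes, 0),
    mb_read = vfs.num_of_bytes_read / 1048576,
    mb_written = vfs.num_of_bytes_written / 1048576
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
ORDER BY avg_read_latency_ms DESC;
```

High latency with low throughput and high latency with high throughput point to very different problems, which is why the dashboard breaks them out separately.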

I don't think there's really been any on this system lately. We have some lock waits, but nothing has really been getting jammed up. So this section is not going to be terribly interesting right here, but, you know, on your systems, it might be much more interesting.

We collect perfmon counters, just like in the full dashboard. We collect information about running jobs. I tell you about your server and database configuration, database scoped configurations, and the active trace flags.

There's sort of a daily summary of what's been going on on the server. For this server, it's been pretty quiet today, because I've been working on other stuff and, you know, trying to get ready to record these videos. And then finally, there is a collection health tab, and the collection health tab is nice because it will tell you if anything has been going wrong with your servers while the collector has been running.

Now, you'll notice that there are a bunch of errors in here. And that's because I had a wifi blip earlier and everything went weird for a minute. But the wifi blip has since been fixed.

So you can just ignore these errors for now. There's also a collection log that has a more specific breakdown of when things failed and all that other stuff. And then a duration trends tab, which will tell you which collectors take the longest.

And if you look at this one here, apparently our query store query could use a bit of tuning, because at times it takes 1.5 seconds to grab that query store data. So I'll be having a look at that when I'm done recording this video.

Anyway, this is just a short tour of the Lite dashboard. We're going to go more in depth on other parts of this over the next couple of videos. But, anyway, again, this is all at code.erikdarling.com.

Let me get to a slide where that actually shows up. There we go. That's wonderful. code.erikdarling.com. It's totally free and open source.

It doesn't do anything weird or funky. You know, no requirements on email, no phoning home, no telemetry, nothing like that. I just wanted to put something kind of cool out there in the world to help you folks with your SQL Server performance tuning efforts.

Anyway, thank you for watching and I will see you over in tomorrow’s video. All right. Goodbye.


Introducing Performance Studio: Free SQL Server Query Plan Analysis Tool


Summary

In this video, I introduce my latest tool, Performance Studio, which I’ve been working on and refining for a while now. Performance Studio is a powerful, free tool designed to help SQL Server professionals analyze and optimize their queries more effectively. I walk through some of its key features, including a query editor that allows you to run arbitrary queries against a SQL Server, and an execution plan viewer that provides detailed insights into query performance. I also highlight the integration with SQL Server’s Query Store, which helps track and analyze query performance over time. Throughout the video, I demonstrate how Performance Studio surfaces important information like runtime summaries, wait stats, and parallelism effectiveness, making it easier to identify areas for optimization. I encourage viewers to check out the tool on my GitHub repo at code.erikdarling.com and provide feedback to help it continue to evolve.

Full Transcript

Erik Darling here with Darling Data, and in today's video we are going to debut on YouTube, it's been debuted in other venues for a little bit now, my new tool called Performance Studio. Now, this sort of spawned from the plan viewer that I added to the monitoring tool stuff, but I didn't want to add all this stuff in there. I wanted to keep that focused, and it seemed like this would be a nice sort of spiritual successor to Plan Explorer, which has been left on the shelf for a very long time, and has not seen a lot of investment or work or new development. So, we're going to go through some of the top-level features here today. Hopefully this will convince you to go get it. Now, this is all available on my GitHub repo. It's at code.erikdarling.com. So, if you're interested in this, you can go there, check out the releases section, and you will find the latest and greatest from what I have here. Now, the first thing that you'll see usually when you open this is a query editor. I do allow you to run arbitrary queries against a SQL Server of your choice, so don't do anything here that you wouldn't do in SSMS. You run the query, it'll discard all the results and just give you an execution plan back. You can get either an estimated or an actual plan, depending on what your needs are at the moment. But here we have a simple query, just a select, connected to Stack Overflow 2013. If you get an actual plan, we get this back. Some of the stuff that I surface, you know, sort of top-level, is things that hopefully help you figure out if the query in front of you is worth dealing with. So, you know, it gives you some server context, gives you a runtime summary, and over here it'll show you wait stats for the query in front of you.

So, you can sort of graph those out. There's not too much interesting in this plan, so I'm not going to spend too much time here. We have some other interesting plans loaded up that have a little bit more to them, so we'll do that. Another thing that's built into this is query store integration. So, you can connect this. Right now, I just have it set up to grab the top, you know, some number of plans, some number of hours back from query store, and you can either load all those selected plans into new plan viewer windows, or if you just right-click, you can grab one and say load plan, and it will bring the plan from query store up in there.

So, we do have a little bit of stuff in there. On the query store thing, I am going to add a bit more to that. Some filtering mechanisms: query ID, plan ID, procedure name, query hashes, and whatnot. Stuff that you would expect to see in my stored procedure sp_QuickieStore, to help you locate the query plans that you care about. But I have some actual plans with some actually interesting things in them.
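For a sense of what the query store integration is reading from, here's a rough sketch of pulling recent plans by duration straight from the Query Store catalog views. sp_QuickieStore does this with far more care, and this is not Performance Studio's actual query:

```sql
/* Top plans by average duration over the last day, joined back to
   their query text. avg_duration is stored in microseconds. */
SELECT TOP (10)
    qsq.query_id,
    qsp.plan_id,
    qsqt.query_sql_text,
    avg_duration_ms = AVG(qsrs.avg_duration) / 1000.
FROM sys.query_store_runtime_stats AS qsrs
JOIN sys.query_store_plan AS qsp
    ON qsp.plan_id = qsrs.plan_id
JOIN sys.query_store_query AS qsq
    ON qsq.query_id = qsp.query_id
JOIN sys.query_store_query_text AS qsqt
    ON qsqt.query_text_id = qsq.query_text_id
WHERE qsrs.last_execution_time >= DATEADD(HOUR, -24, SYSDATETIMEOFFSET())
GROUP BY qsq.query_id, qsp.plan_id, qsqt.query_sql_text
ORDER BY avg_duration_ms DESC;
```

The query_id and plan_id columns are the same identifiers the filtering mechanisms mentioned above would key off of.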

And before I get too far along, I want to show you some of the buttons. We have this bad-time SQL plan up here, and if I hit Human Advice, I get this tab that gives me a human-readable analysis of what went on in the query plan: things you might care about, like how long different operators ran for. One thing I’m particularly proud of (and it’s probably not spot-on perfect just yet; there will be edge cases I need to deal with) is this: in Management Studio, if you’re in row mode and you get an actual execution plan, the operator times are cumulative going from right to left.

I’ve done my best to sort that out, so you just see the per-operator times, like you would with batch mode operations. Down in here, you get a breakdown of operators in the plan that ran for a long time, and information about wait stats for the plan, assuming you got the plan in a way that collects those things. Then I break down all the plan and operator warnings, and if there are any missing indexes, I surface those. One feature I’m working on right now: for actual plans, I’m going to get rid of that silly impact percent number and instead try to match the missing index request to the operator it happened on, and give you that operator’s CPU and duration.
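To make that row-mode adjustment concrete, here is a minimal sketch of the idea in Python. This is illustrative only, not Performance Studio’s actual code: in a row-mode actual plan, each operator’s reported time includes all of its children, so a per-operator time can be recovered by subtracting the children’s cumulative times.

```python
def per_operator_ms(node):
    """node: dict with 'name', 'cumulative_ms', and a list of child nodes.

    Row-mode actual plans report cumulative elapsed time per operator,
    so an operator's own time is its cumulative time minus the sum of
    its children's cumulative times.
    """
    own = node["cumulative_ms"] - sum(c["cumulative_ms"] for c in node["children"])
    return max(own, 0)  # clamp: timer rounding can push this slightly negative

# Hypothetical plan fragment: a join whose two inputs took 700 ms and 100 ms.
plan = {
    "name": "Hash Match",
    "cumulative_ms": 900,
    "children": [
        {"name": "Index Scan", "cumulative_ms": 700, "children": []},
        {"name": "Index Seek", "cumulative_ms": 100, "children": []},
    ],
}

print(per_operator_ms(plan))  # the join itself accounts for 100 ms
```

The real adjustment has to walk the whole plan tree and deal with edge cases (branches, rebinds, rewinds), which is why it may not be spot-on perfect yet.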

That’s another thing I added that SSMS is sadly lacking: if you look over here, SSMS only shows you the duration number. I’m also showing the CPU number, so you can see not only how long something took, but how much CPU it burned. There’s a lot of good stuff in here surfacing extra information about the query that ran, so you can get a good, high-level visual understanding of where you need to focus.

Up in the runtime summary for this one (I’ve got a little tooltip on that), there are some things that will change in other plans I’ll show you, but it’s stuff I wish I’d known about when I was looking at a plan: 24 of 32 threads used, 8 inactive CPU threads on this. We got granted 1.89 gigs, and it calls it out explicitly: hey, we only used 4% of our memory grant.

I’m going to probably tweak that rule a little bit, but I just wanted to start with a 1-gig grant floor to get going. But then there’s this other thing, and actually, let me show you this in a different query in a minute: the parallelism effectiveness is another cool thing that I look at.
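A rule like that low-grant-usage warning might look something like this sketch; the 1 GB floor and 10% threshold are my assumptions for illustration, not the tool’s exact numbers.

```python
def grant_warning(granted_kb, max_used_kb, floor_kb=1024 * 1024, pct_threshold=10.0):
    """Flag queries that asked for a big memory grant and barely used it."""
    used_pct = 100.0 * max_used_kb / granted_kb
    if granted_kb >= floor_kb and used_pct < pct_threshold:
        return f"only used {used_pct:.0f}% of a {granted_kb // 1024} MB grant"
    return None  # small grants and well-used grants are left alone

# Roughly the plan from the video: ~1.89 GB granted, ~4% of it used.
print(grant_warning(granted_kb=1_981_808, max_used_kb=79_272))
```

The floor exists so you don’t get nagged about tiny grants where the waste is irrelevant.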

Some of my videos talk about parallel plans that actually just run on a single thread; I measure that out and give you warnings if you run into that situation. If you hover over the select, the root operator in the plan, you’ll get a high-level overview of all the warnings that got called out in the plan.

Stuff like: you have a large memory grant and, apparently, an excessive memory grant; we currently have two different giant memory grant warnings in this one. Scans with predicates, indicating that maybe an index would help you out somewhere.

Things like that. And, you know what, let’s go to some other plans in here that have other stuff in them. Oh, wait, I was promising to show you some buttons. Under Robot Advice, if you want to paste this to the robot of your choice, there is a JSON schema of all the stuff that ended up in the Human Advice.

And there are also two buttons over here. One copies a repro script: if we hit that and paste it in here, you’ll see we got the full query, so we could run it, or edit it and try to tune it.

And if we hit Run Repro Script, this thing will run, well, once we get it connected first, and go get an actual execution plan for it. While we’re waiting on that, because who knows how long it’ll take, there are some other useful things in here.

So, where I was talking about parallelism effectiveness: note this thing ran at DOP 8, right? But the CPU was only a bit more than 3x the elapsed time, so DOP 8 was not terribly effective.

We had a lot of parallelism, but not a lot of the parallelism helped. In an ideal world, if the query ran at DOP 8, CPU time should be about 8x elapsed time. We didn’t quite get there.

We only got to about 3x. I’m not saying this plan shouldn’t have gone parallel, but even DOP 4 would probably have been just about the same. Some other neat things go on in here.

You know what, let me get to this one. Something I have wished SSMS would yell about a lot more loudly is when you have a many-to-many merge join that actually uses a worktable and does a bunch of stuff, because that is not a good time. It’s a real performance nightmare.

So that’s one thing I decided I really wanted to show you in query plans. If we look at the parallel skew plan, another thing I wish SSMS were more loud about, and hover over this index seek, we’ll see this warning here.

Ah, there we go: parallel skew. Let me get that to a slightly better place on the screen and zoom in. Parallel skew: 100% of the rows went to one thread.

So if this happens in a parallel plan, I will be loud and clear about it. It’s another nice way to figure out where things are going wrong in a plan. Some other stuff I wish SSMS were louder about:

Really surfacing when we have a non-parallel plan reason. This will tell you, if it can, exactly why a query was not allowed to go parallel. And then if we hover over the clustered index insert right here, which is into a table variable, we’ll see this lovely warning down at the bottom that says: table variable.

Modifying a table variable forces the entire plan to run single-threaded; replace it with a temp table to allow parallel execution.

So there’s all sorts of helpful stuff for you as query tuners, to very quickly get a lot of information without doing a lot of digging into query plans. Now, the last thing I’m going to show you today is a quick plan comparison. If we select this query plan, hit Compare Plans, change this over, and hit Compare, then rather than a big, complicated picture of what’s different, it just breaks down all the stuff that changed in the plan and got better.

Right. So, CPU time went from about two and a half seconds to 639 milliseconds, and you get good information about what improved in the query.

It’s a nice textual breakdown: oh, when I made these changes, here’s how the plan got better. Anyway, this just hit v1.1.

So there’s still a lot of work that I’m going to be doing on this alongside the work that I’m doing on the performance monitoring tools. Hopefully you will try these things out.

So stick around and see what I have in store. I’ve got a lot of neat stuff I want to do, but I was rather proud of where I got this to in a pretty short amount of time. So I wanted to get it out into the world and hopefully get some people trying it and giving me feedback, or maybe just telling me it’s cool, or, I don’t know, maybe telling me it sucks.

I don’t care. The lovely thing about free software, it costs just as much to use it as it does to not use it. So there you go. Anyway, thank you for watching.

I hope you enjoyed yourselves. I hope you learned something. I hope you’ll try this tool out. Again, it’s at my GitHub repo. If you want to get there, it’s code.erikdarling.com. And if you run into any issues or problems, questions, anything like that, that’s the place to leave me feedback.

Thank you for watching. And I will see you over in tomorrow’s video where I suppose we will talk more about SQL Server performance monitoring issues. All right.

Thank you.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Free SQL Server Query Plan Analysis, Right In Your Browser


Free SQL Server Query Plan Analysis

One of my goals when I first started putting together the Performance Studio application was to make it available as a standalone web tool.

It turned out to be way easier than I thought it was going to be, and now you can use it here.

You can either paste plan XML, or upload an execution plan, and all of the analysis that would take place in Performance Studio happens client-side, in your browser.

Why? Because it seemed like a nice thing to do for the community. Not everyone can download and run new applications when they want, but they should be able to get quality plan analysis done when they need it.

Client Side?


Yes, client side. By default, I didn’t want any plans to get uploaded to me.

I don’t need to hoard XML, and I don’t want to get random emails asking me to delete or remove things.

This was built to work without having to store files anywhere. Yay.

Consent screen

You can even export the HTML results of the analysis to use in whatever HTML-y ways you want. I don’t care.

There is an option to share, but you have to choose it, consent to it, and choose a TTL on the plan (1 day to 1 year, with some stops in between). There’s even a delete button, in case you go through all that blissfully unaware of your actions. In which case, you should probably unplug.

But anyway, the whole point is that your query plan doesn’t go anywhere unless you want it to. It does not get fed into AI anywhere, nor does it persist anywhere after you close your browser tab. And it sure as heck doesn’t end up on my server.

Do You Collect Anything?


Only some vanity metrics about page views and plans shared. But really just the bare minimum to see how much people are using it.

I have no use for your IP address, no desire to feed you cookies, and no need to know where you’re visiting from geographically.

If you choose to share a plan with my website, your plan gets stored in a pretty well-locked-down, cloud-hosted environment, and only for as long as you choose to leave it there. I’m not, like, daring hackers to try to hack me, but I’ve taken all the precautions I can to make sure your XML stays safe for as long as I have it.

Your plans are also not really discoverable in any real way, unless and until you share the link to them. The URLs are all randomly generated, and I don’t expose shared plans in any way. I don’t even look at them, because it’s sort of a pain in the butt for me to look at them.

The next time you need a query plan analyzed, try it out.

Thanks for reading!


Trying Something New With sp_QuickieStore: 80/20 Query Scoring Across Multiple Metrics



Summary

In this video, I dive into an old-school approach to identifying SQL Server performance issues using sp_QuickieStore and a novel method inspired by the Pareto principle (the 80/20 rule). Traditionally, Query Store surfaces queries that consumed a lot of CPU over the last seven days, but often these results are too broad for practical use. To address this, I’ve developed a multi-dimensional scoring system that evaluates queries based on their impact across several key metrics: CPU usage, duration, physical reads, writes, and executions. This approach helps pinpoint the most problematic queries more accurately, even when they run outside of typical working hours or are unparameterized. By sharing these insights, I hope to provide a practical tool for SQL Server administrators looking to optimize their databases without relying solely on modern monitoring tools.

Chapters

Full Transcript

Erik Darling here with Darling Data, taking a break from my monitoring tool mogul personhood to talk about some old-fashioned stuff: some old-fashioned ways of finding SQL Server problems that aren’t my new, fancy, free, open source, better-than-all-the-paid-monitoring-tools monitoring tool. We’re going to do something a little bit different with sp_QuickieStore. Let me start by defining the problem I want to solve, and then show you a little bit about how I chose to go about solving it. By default, sp_QuickieStore looks at Query Store data from the last seven days and sorts by average CPU descending, which is a great way of surfacing things that burned a lot of CPU.

Often when you find those things and show them to people, though, they say: well, that runs at night, it doesn’t matter. At first, I tried to cope with that by adding a few parameters, like workdays, which, when set to one, will by default look for everything between 9am and 5pm on weekdays only, and show you that. And of course, the work start and work end hours serve to define your work day to your needs. But you still needed a lot of other filtering to come to an agreement on which queries mattered.

For example, some queries would still execute during the day and burn a lot of CPU on average, but people would say things like: well, it appears to have only executed once or twice. Or: oh, that’s not user facing. A lot of stuff would come up, and it would be: okay, we can filter, we can set a minimum number of executions. But what makes sense? How many executions would you start to care about? I don’t know. And then there are all sorts of other things that act as traps, like unparameterized queries, which generate a new query ID for every new literal value that gets passed in.

So it looks like something maybe only executed a few times, but if you look at the aggregate for that query hash (which is something I also added, with include query hash totals) the real story is different. You sort of get the point: it was still really hard to find what was actually stressing the server out. So the way I chose to go about fixing this is by applying the Pareto principle, but with a matrix.

You know the whole 80/20 rule: 80% of your blank is caused by 20% of your blank. Except if you apply that across just one resource, you’re like: well, okay, CPU, fine. So I chose a group of resources I care about: CPU, duration, physical reads (not logical reads; logical reads are for people using SQL Server 2008, and we’re not there anymore), writes, and executions.

So I’m taking those five metrics, and I’m finding things that have a meaningful impact on your server across them. That’s what we’re going to talk about today, and I’m going to show you what that looks like. Down in the video description, oh boy, all sorts of useful links. You can still hire me for consulting, you know, before the robots put every independent consultant out of business; you can still talk to me.

You can purchase my training and get the human angle on all this stuff, not just the “you’re absolutely right!” LLM angle on any questions you might have. You can become a supporting member of the channel; wouldn’t it be just grand if you gave me about $4 a month, just the tax on a cappuccino? You can also, for free, ask me office hours questions, and even more for free, like and subscribe, and, assuming you’ve got any friends who are still living people and not large language models, tell them about the channel and maybe they can subscribe too.

So wouldn’t that be just fantastic? We could all be friends. Speaking of friends: I’m your friend. And because I’m your friend, and I want you to be happy, and I want your life to be better, I have made a free, open source SQL Server performance monitoring tool. No complicated installations, no licensing, no email needed, no phone-home telemetry, no nothing.

Just a bunch of T-SQL collectors running on a schedule, populating dashboards: basically all the stuff I would look at as a consultant if I were trying to keep a SQL Server happy from a performance point of view. And the thing I think is pretty neat is that I have built MCP servers into the monitoring tool (opt-in, of course; they are not enabled by default), so if you want your robot friends to look at just your performance data, only the collected performance data, and ask them questions specifically about that nicely aggregated, collected, gentrified data, you can do that, and they can summarize very nicely all the stuff that has happened with your server that is making it unhappy.

I will be out and about in the world. By the time this goes live, well, I forgot to take DataTune Nashville off the slide, but I like DataTune Nashville; maybe there will be one next year and I’ll go back for that. If there is, DataTune Nashville, that’s a great conference. Data Saturday Chicago is coming up; when this publishes, it’ll be this week.

SQL Day Poland in May, woohoo, and Data Saturday Croatia also, woohoo, June 12th and 13th. And if this slide will move along, you will see that I have at least updated it now that we have left the murderous winter, survived this Donner Party winter, and are entering spring, and the hills are once again alive with the sound of database music, which is quite harmonious if you ask me.

Anyway, let’s go do this thing. Let’s go over to SQL Server Management Studio, something we haven’t seen in one of these videos in a little bit, not since the Get AI Ready with Erik series, a triumphant series of videos. So, let’s talk about these results a little. Before I even do that, I want to point out that if you use the @help parameter in sp_QuickieStore, you’ll get this text right here in the results.

I made sure this all had a very good explanation, so it’s a durable reference should you choose to use the new find-high-impact parameter and look at Query Store in this way. We’ve got some normal stuff you would expect to see: database name, start date, and end date, which define the window of time we looked at. And then the primary window column tells you when the query runs most, right?

So, I just chose to separate the day in two: 7 a.m. to 7 p.m. is working hours, 7 p.m. to 7 a.m. is off hours. I also have some designations in there for weekends, and for things whose execution profile spans everything.

I’ll show you what that looks like in a moment. Then we have some of the normal identifying characteristics you would expect for a query: the object name, the SQL text, the query plan, the top waits if they’re available, the query hash.

And something I want to point out here: Query Store has query hashes in it, and I chose to use those as identifiers instead of query IDs, because of the problem I was talking about earlier in the video, where with unparameterized queries and such, things might not look significant when in fact they are.

So we’re doing this by query hash. I do show you the query IDs and query plans those hashes are associated with, but my default grouping is query hash.

There’s also a count of how many distinct query IDs that hash generated, which gives you a good idea of whether the code is unparameterized; you might want to look at it that way. And, of course, a count of the plans.
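The grouping idea can be sketched like this (hypothetical data shape, not sp_QuickieStore’s actual result set): unparameterized queries spawn a new query_id per literal, so per-id numbers look harmless while the hash-level totals tell the real story.

```python
from collections import defaultdict

# Hypothetical Query Store rows: one hash fanned out over three query_ids.
rows = [
    {"query_hash": "0xABC", "query_id": 1, "cpu_ms": 40},
    {"query_hash": "0xABC", "query_id": 2, "cpu_ms": 55},
    {"query_hash": "0xABC", "query_id": 3, "cpu_ms": 60},
    {"query_hash": "0xDEF", "query_id": 4, "cpu_ms": 90},
]

by_hash = defaultdict(lambda: {"cpu_ms": 0, "query_ids": set()})
for r in rows:
    agg = by_hash[r["query_hash"]]
    agg["cpu_ms"] += r["cpu_ms"]
    agg["query_ids"].add(r["query_id"])

for h, agg in by_hash.items():
    # Many distinct query_ids per hash is a strong hint of unparameterized code.
    print(h, agg["cpu_ms"], "ms across", len(agg["query_ids"]), "query_ids")
```

Sorted by query_id, no single row for 0xABC beats 0xDEF; grouped by hash, 0xABC is the bigger consumer.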

Then there is an impact score, which is a number from zero to one. It’s sort of the reverse of the vector search or semantic search scoring I was talking about before, where closer to zero was better.

Here, closer to zero means your query is less impactful, which is good, and closer to one means there is something very impactful about this query. That score is generated by, effectively, taking the average percent rank across all of the active dimensions we’re looking at.
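As a sketch of that scoring idea (illustrative Python, not the actual T-SQL): take each query’s percent rank within each metric, average those ranks, and screen out anything under 0.5. The data and threshold here are assumptions for illustration.

```python
def percent_rank(values, v):
    """PERCENT_RANK-style: fraction of the other values strictly below v."""
    if len(values) == 1:
        return 0.0
    return sum(1 for x in values if x < v) / (len(values) - 1)

def impact_scores(queries, metrics):
    """Average each query's percent rank across all active metric dimensions."""
    cols = {m: [q[m] for q in queries] for m in metrics}
    return {
        q["name"]: sum(percent_rank(cols[m], q[m]) for m in metrics) / len(metrics)
        for q in queries
    }

metrics = ["cpu", "duration", "reads", "writes", "execs"]
queries = [  # hypothetical per-hash totals
    {"name": "q1", "cpu": 900, "duration": 100, "reads": 30, "writes": 500, "execs": 9000},
    {"name": "q2", "cpu": 100, "duration": 900, "reads": 10, "writes": 5,   "execs": 10},
    {"name": "q3", "cpu": 50,  "duration": 50,  "reads": 5,  "writes": 1,   "execs": 5},
]

scores = impact_scores(queries, metrics)
# q1 leads on four of five metrics, q2 only on duration, q3 on none.
print({k: round(v, 2) for k, v in scores.items() if v >= 0.5})
```

A query that is merely top-heavy on one metric (like q2 on duration) scores lower than one that is heavy across the board, which is the whole point of scoring across dimensions instead of sorting by one.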

And then there’s also stuff like total executions, showing how many times that query has executed, and then its share of CPU, duration, physical reads, writes, and memory.

I think I forgot to mention memory before, but that’s okay; I’m mentioning it now because I see it, and we’re good. After that, we have some diagnostics, a rule-based layer on top of the score that tells you things I found while looking at your query’s execution patterns: not the query plan or anything, just the way it runs.

I also include wait stats for the query. So we look at things like blocking, resource contention, and IO. We try to figure out if it’s parameter sensitive, right?

If there’s plan instability. And if, say, a select query runs and causes writes, we might say: hey, this thing might be spilling, or it might have spools in it that you could fix and take a look at.

And then we also flag metrics by volatility. If there are really, really big swings between the minimums or maximums and the averages, which can be a sign of parameter sensitivity, that means you might have a situation you need to take a look at, right?

Because sometimes this query is very fast and sometimes it’s very slow; it might look okay on average, but it’s got a wide swing to it.
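A volatility flag like that can be sketched as a simple max-to-average ratio check; the 100x threshold here is an assumption for illustration, not the actual rule.

```python
def volatility_flags(stats, threshold=100.0):
    """stats: {metric: (avg, max)}. Flag metrics whose max dwarfs the average."""
    flags = []
    for metric, (avg, mx) in stats.items():
        if avg > 0 and mx / avg >= threshold:
            flags.append(f"{metric}: max is {mx / avg:,.0f}x the average")
    return flags

# Hypothetical per-query stats in (average, maximum) form.
stats = {
    "duration_ms": (12.0, 8_484.0),      # 707x swing: flagged
    "physical_reads": (2.0, 186_000.0),  # 93,000x swing: flagged
    "cpu_ms": (10.0, 50.0),              # only 5x: left alone
}
print(volatility_flags(stats))
```

A query that averages 12 ms but has an 8.5-second worst case looks fine in an averages-only view; the ratio is what surfaces the parameter-sensitivity suspicion.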

So, what this ends up looking like in result form, and you’ll have to excuse that a lot of this stuff says off hours, because (I can’t remember how many times I’ve said this now) all the Windows VMs my SQL Servers are installed on are in Pacific time.

It seems like no matter when I run a workload, it’s always off hours. Maybe it’s just a California thing; they’re all on vacation. But it looks like this. This is, of course, the database name we ran it in.

I’ve been using HammerDB a little bit more recently, because I can just run a workload and it causes a whole bunch of stuff to happen: high CPU, blocking, deadlocks.

And I can just come back and say: oh, cool, you did all this stuff, without me having to write a synthetic Stack Overflow workload, which would be annoying and time-consuming, and not something I really want to do with my time right now.

I’ve got a lot of other stuff going on. So then you see the period we ran this for, and then we start getting into the stuff about the actual query. You see stuff that executed during business hours.

Again, that’s the 7 a.m. to 7 p.m. window. Right now this is hard-coded, but I’ll open it up a little more as people start making requests about it.

Off hours means 7 p.m. to 7 a.m. And then down here a little, we have one that actually says spread, which means it’s spread across both off hours and business hours.

If any of this were coming from a stored procedure, that name would be there. We have the query text, and then we have the query plan, and then all of the aggregated waits for that query.

So you can kind of get a sense of what that thing is up to. Again, these come from Query Store, so they’re the high-level bucketed waits, not the granular waits you might be used to if you’re looking at wait stats elsewhere.

And then down here we have a couple of examples. I know it’s a little cut off behind my head, but that’s okay. So again, we’re grouped by query hash.

Because, as you can see from just about all of these rows, one query hash can spawn a whole bunch of query IDs. If we did this by query ID, we would have a far worse view of which queries were most impactful in the aggregate, right?

Because, look, there are like seven different query IDs; it would be: oh, well, these all look fine, what could be going wrong here? And then we have the list of plan IDs, all kinds of nice things to have in front of you.

There’s some interesting stuff here too, like two different query IDs grouped to the same hash having only one query plan ID, which is strange and bizarre. But coming over this way, this is where we start to get into which queries are most impactful.

So, the impact score: the closer to one, the worse. I screen out anything less than 0.5, because that’s where stuff gets pretty insignificant, so every impact score you see here is going to be between 0.50 and 1.00, with 1.00 being the highest you can get. The closer this gets to one, the more the query is in, like, the 98th percentile of how much stuff it does.

So for this one, all of these metrics contribute to the general score of things. And then, coming over a little bit further, this is where the diagnostics come in.

The TPCC database is actually a little bit more interesting here, so let’s switch over to that and run this for it. This one’s a little different, right?

I mean, it’s still all off hours, because of those stupid West Coast VMs, but we actually get a bunch of stored procedure names in here, and slightly more interesting numbers, I think anyway, over in this section. If you look at just this top query, where the impact score is 0.91, and you look across, holy cow, that’s a lot of executions.

Wow, that even took me by surprise. You can see that this query, for this database and the timeframe we looked at, was 18% of the CPU, 1.3% of the duration, 3% of the physical reads, 54.2% of the writes, and 28.6% of the executions.

So this query is contributing a lot of load to the server; this is a query you would want to look at. Now, based on the execution count and all that, it’s unlikely you could make this query a lot faster, but it’s one you might want to look at and say: hey, do we need to run this this much?

Can we cache this somewhere? You know, the usual. I mean, maybe it is something you could tune and make faster. You’re not going to make it execute less, but you could probably bring down the CPU share.

You could probably bring down some of the other numbers and make it less significant. But it also might be a query you want to think about not running as much, because that is a bonkers number of executions. And then over here, not really any diagnostics, but we do get some volatility metrics, where sometimes the duration is 707 times the average.

And sometimes the physical reads are 93,000 times the average, and this one’s duration is almost 11,000 times the average. So this is just a different way of looking at the queries that run in Query Store: scoring them across a bunch of metrics and figuring out which ones are contributing the most across all of them, not just one.

Because oftentimes people aren’t concerned about just one thing. They’re like: well, CPU is high, and we’re constantly going to disk, and, you know, all sorts of problems, right? So this is a good way for you to start figuring out, as a whole, which queries are contributing across a variety of metrics, so you can attack the things that are most meaningful across all of them.

Anyway, this is right now in the dev branch of my GitHub repo, which is code.erikdarling.com. Shockingly, this is also the same repo my monitoring tool is in.

So I’m just going to bring that up. If you go to code.erikdarling.com, it’s going to be in the DarlingData repo. This is, of course, sp_QuickieStore.

Go check that out. Again, it’s on the dev branch; this is not live on the main branch yet. But after I’ve let it soak in a little and had a chance to kick it around a little more, it will make its way over there.

So again, just a different way of looking at query performance, a different way of finding queries that are meaningful to go after. I think it’s pretty cool.

Hopefully you do too, and hopefully it helps you find meaningful queries on your servers. With any other tool you use to get this kind of information, you always just sort by one metric, right?

And even if you sort by two metrics, one of them dominates and sends a much higher signal, while the others come out wishy-washy.

To talk about the problem from the opposite direction: like I said, by default sp_QuickieStore orders by average CPU. If you flip that and order by executions instead, you’ll find something that executes a lot, but it’s stuff with basically zero-millisecond CPU and duration, because it’s very fast code that isn’t doing anything.

So you’re still finding a problem, but you’re not finding the things that are causing more than one problem at once.

This is a good way to not just focus on one metric or one problem. This is a great way to look at your queries across a bunch of metrics that will cause you problems. Anyway, thank you for watching.

I hope you enjoyed yourselves. I hope you learned something. I hope you’ll check this thing out. That’s sp_QuickieStore in my GitHub repo right there. Give it a shot. Let me know how that goes. And I don’t know. See you on the main branch someday soon.

All right. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SQL Server Performance Office Hours Episode 55




To ask your questions, head over here.

Summary

In this video, I dive into some common questions and issues faced in SQL Server performance tuning, particularly focusing on scalar functions causing lengthy execution times and the challenges of using extended events for monitoring. I also share insights on parameter sniffing and the limitations of the parameter sensitive plan optimization feature. Additionally, I discuss my free open-source SQL Server Monitoring tool, emphasizing its value as a cost-effective alternative to commercial tools. The video wraps up with an overview of upcoming events like Data Saturday Chicago and SQL Day Poland, inviting attendees to connect and collaborate in person. Lastly, I touch on the ongoing development of AI within SQL Server and Azure, sharing some use case scenarios from proof-of-concept projects.

Chapters

  • *00:00:00* – Introduction and Free SQL Server Performance Monitoring Tool Overview
  • *00:05:29* – Optimized Locking in SQL Server 2025
  • *00:14:18* – Conclusion and Future Plans

Full Transcript

Erik Darling here with Darling Data, and it is, oh, quite a Monday, isn’t it? I suppose we’re all going to have to deal with this in our own special way. But anyway, that means it’s time for Office Hours. That’s how I’m going to deal with it. Answer some questions, because people seem to keep having them, and I seem to keep having answers, so I don’t know, maybe let’s keep this gravy train rolling. He-he! Alright. Down in the video descriptions. Well, actually, there’s only one of them. I don’t know why I pluralized that, but there’s all sorts of helpful links. If you would like to hire me for consulting, purchase my training, become a supporting-with-money member of the channel, you can do that. You’ll also find links to continue to ask me office hours questions, so this gravy train will stay on the tracks. And as always, if you enjoy this content in any way, shape, or form, and you feel that it may enrich the lives of those around you, please do like, subscribe, tell a friend, all that good stuff, so that I can continue racking up these ground-breaking, record-setting view and subscriber counts.

I’m hot on the heels of that Amiga Repair channel. I swear, one of these days, so close. I’m going to overtake that thing. I also have, I mean, semi-recently, free SQL Server Monitoring. You ever want to have performance monitoring for your SQL Server, but, you know, all the other paid tools out there kind of suck, and, you know, aren’t worth the money, and, you know, I’ll probably be in jail.

Well, I got a free one. It is free. It is open source. There’s no email sign-up.

I’m not phoning home with any information from you. It’s just a bunch of T-SQL collectors running on a schedule, getting the most important performance metrics that your server has to offer. All sorts of stuff that you would want to see in a monitoring tool, nicely grouped together, colorful charts and graphs, help you tell a nice story about what’s going on.

Wait stats, blocking, deadlocks, top queries, all sorts of, you know, cool internals stuff that fancy-pants consultants like me have figured out how to collect data for over the years in a meaningful way. NOC-style dashboard, so you can see what’s up, what’s down, what’s hot, what’s not. And if you are, you know, in favor of our new robot overlords, there are also a number of built-in MCP tools so that you can ask questions of your performance data.

And just the collected performance data, nothing else. And you can have LLMs kind of tell you a little bit about what’s going on, where and when with your performance. You can just talk to your data naturally, which is, I think, kind of cool.

And I don’t think anyone else is really doing that. So, if you want to get that for free, it’s all right there. On my GitHub repo, code.erikdarling.com, or if you are bold and brave enough to remember that link or just click the link down in the video description, you can find all that stuff.

Oh, well, you know what? DataTune Nashville already happened, which means Data Saturday Chicago is this weekend. And, well, that certainly snuck up on us, didn’t it?

I feel like it was just yesterday I was saying February is never going to end. So, now I got to make a March picture. I’m going to have a database.

Databases are going to play baseball this March, I think. But I also will have SQL Day Poland coming up May 11th and 13th. And Data Saturday Croatia coming up June 12th and 13th. So, you know, if you are in Chicago, Poland, or Croatia, stop by and see me.

I would love to see you. Maybe we’ll get along. Maybe we’ll be best friends. You never know. I could use more friends. But, yeah, this is February, February. It’s done.

It’s done. I got to fix this one. All right. I’ll put that on my to-do list. February, over. All right? No more of this. This was a terrible winter. I hated it.

All of it. Anyway, we have questions which have been asked. And so we have questions which will be answered here. Let’s see what the first one we got.

Recently encountered a query that used scalar functions, which was taking 50 minutes to complete. Not an unnatural state of things, from my experience.

After hours of aimless clicking. Well, don’t click aimlessly. Click with purpose and meaning. Why would you aimlessly click? I traced the issue back to an extended event.

Is there a specific metric which I could have used to indicate performance issues from extended events? So I think within the query that you were running itself, likely not. You may have noticed either a session reading from an extended event, if someone were actively monitoring it.

Or you may notice other extended event-related waits on the server go up around when you were executing this query. But in general, no. I haven’t found a great way to track this one down.

And I have seen it happen a couple of times. I think early on with sp_HumanEvents, before I added the skip-plans thing. Funny story behind that.

Someone wanted me to watch a load test. And I was like, well, you can use extended events for that and capture some stuff. And so we set it up. And they started running stuff.

And they were like, this is taking like four times as long. And I was like, yeah, that’s extended events. So I don’t know. I wish I had a good way to say, lesson learned.

Always check extended events first. But it’s just so rarely a thing. I’m glad you figured it out. That’s good for you.

But no, I don’t have a great way of, you know, predetermining if there is a harmful extended event running. I mean, in most client environments I walk into, they have some script from 2003 that looks at wait stats that they run once in a while. The fact that there are stored procedures that do all this stuff a little bit more surgically is big news.

So most environments don’t have extended events up and running, at least not ones that you wouldn’t expect to see. For you, apparently, you have someone to watch out for. And that someone might be you.

Now, you never said who created that extended event. You just said you traced it back to an extended event. So I don’t know. You or someone you know is the murderer. We don’t know who.

I see that PSPO, that is, the parameter sensitive plan optimization, works for UPDATE in SQL Server 2025. Have you had any good results with it? I’ve not really had all that many great results with the parameter sensitive plan optimization generally.

No, Microsoft, you know, did not think through bucketing things well, I don’t believe. And my gripes with the parameter sensitive plan optimization remain in place even through SQL Server 2025. There are certainly times when it works.

It can be perfectly OK. But yeah, there’s a lot to be desired with that one. You know, the most and least common values get the top and bottom buckets, and then everything else lands in the middle bucket.

That thing in the middle can span a lot of different row counts, and you can end up with sort of ineffective plan variants for it. So I don’t know. Another could-have-been, but probably ain’t-gonna-be.
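To picture the gripe, here’s a toy Python sketch of the three-bucket idea. The boundary values below are invented for illustration; the real feature buckets on cardinality estimates internally, and I’m not quoting its actual thresholds.

```python
# Toy illustration of three-bucket plan variants; the thresholds below
# are made up for the example, not PSPO's real boundaries.
def bucket(estimated_rows, low=100, high=1_000_000):
    if estimated_rows <= low:
        return "small"
    if estimated_rows >= high:
        return "large"
    return "medium"

# 101 rows and 999,999 rows land in the same middle bucket and share a
# plan variant, even though they'd likely benefit from different plans.
print(bucket(101), bucket(999_999))
```

That wide middle bucket is where the ineffective plan variants come from: wildly different row counts, one shared plan.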

Hey, I always tell you: we got Fabric. Lucky people. All right. Your new monitoring tool is sick.

Call the doctor. However, I am also working on a small niche commercial SQL Server monitoring tool similar to the big hitters, centralized repo, but much cheaper as a passion project. So I don’t think it’s my tool that’s cooking you.

DBADash is free and already does that. And, you know, you’re a bit vague about what is small and niche about your monitoring tool. I don’t know what specifically you’ve niched yourself into.

But I think, depending on what your niche is, who that market is, and how underserved they are, it might be tough to get software as a service off the ground. I mean, you said it’s a passion project, so maybe the money doesn’t matter, right?

Like maybe if you can just get it out to some people and they’re happy with it, that’s good enough for you. But since you’re asking people to pay for it, I assume that you expect to make money from it. And that might be a tough proposition in this day and age.

Unless you are already a fairly large, well-established enterprise vendor, you may have a hard time getting people to pay for something. I mean, obviously the thing that I’m doing is totally free, right? Download it, take it, use it.

But, you know, it’s an open source project. And I always say, hey, if you like this, you can contribute to it. If you find this useful, you can contribute to it. A very, very small portion of people have cared enough to contribute to it. Thank you very much to anyone who has.

You’re beautiful, wonderful people, but you are the vast minority on that. So I don’t know who you’re going after with this tool. I don’t know what your hook is, but good luck. I think it’s going to be a tough time for that sort of thing.

Let’s see. Could you share a few concrete use case scenarios where you have implemented AI within SQL Server 2025 in a production environment? Is this an interview? And in the energy or banking sector, no less?

Awful specific. So, not SQL Server 2025, but Azure SQL, where the vector stuff has been around a little bit longer. I have helped people with some proof-of-concept stuff, for them to figure out how they could fit SQL Server’s vector search into various things that they are interested in trying, whether that is, you know, sort of a chatbot thing, or sort of a developer-assistance thing, you know, getting into finding related issues, things like that.

I’ve certainly helped people with that sort of stuff, but I don’t know at this point how far those projects made it along. You know, people were fairly happy with the proof of concept, but, you know, the vector stuff in SQL Server is kind of dead in the water until vector indexes become writable. I don’t know when that’s going to be.

I don’t know what Microsoft is busy with, aside from Fabric, because, you know, hey, we got that. Remember that one? Remember that joke? So, I mean, like, a lot of people, I just think, aren’t going to be able to use it well until Microsoft actually finishes the features around it.

Vector indexing being one of them, getting a lot of that stuff out of the preview program and into GA. You know, until that happens, I think a lot of people are just going to say, that’s nice. Okay.

Cool. I’ll be over here waiting. You take your time. So, you know, I wish that it were further along. I wish that it were a little bit more well-developed and a little bit more, I don’t know, production-ready in a lot of ways.

But here we are. Here we are. That is not where we are today. We are here today, but that is not where vector search in SQL Server 2025 or Azure is today.

We are standing by. Been impressed by optimized locking yet? No.

Seen it? Looks cool? Been impressed? Nah. It’s all right. Anyway, thank you for watching. Hope you enjoyed yourselves.

Hope you learned something. And I will see you in tomorrow’s video, where I will undoubtedly talk about my free SQL Server performance monitoring tool a little bit more. Because I would desperately love to spread the word of that thing further, and hope that more people get out there and try it. You know, in a perfect world, more people would get involved with the project, perhaps even sponsor the project.

But, you know, reporting issues, contributing code or ideas, asking questions, all that stuff helps me move things forward in a way that makes things better for everyone. So, if you’re into those things, stop by code.erikdarling.com. All right.

See you tomorrow.


Free SQL Server Performance Monitoring System Health Extended Event



Summary

In this video, I delve into the “Oh Crap” section of my free SQL Server performance monitoring tool, which surfaces critical issues often hidden in the system health extended event. This section is like a red flag area where you’ll find nasty things such as bad pages, memory dumps, suspect pages, non-yielding tasks, sick spinlocks, and severe errors: basically everything that can go wrong with your SQL Server’s health. I walk through how to access this information via my GitHub repo at code.erikdarling.com, highlighting the importance of paying attention to these warnings, as they can indicate serious underlying issues that might require immediate professional intervention or even a complete system overhaul.

Chapters

Full Transcript

Monitoring tool mogul Erik Darling here with Darling Data. And in today’s video, I want to go over what I call the “Oh Crap” section. The Oh Crap section covers the real nasty things that can come out of the system health extended event. Normally, you might find that my free stored procedure sp_HealthParser would surface these things, but you’d have to sit there and run that and deal with the outcome. Well, you can use my free SQL Server performance monitoring tool and get all of this goodness done for you. Now, the reason why I call this the Oh Crap section is because this really is where a lot of the “this server is not doing well” type stuff shows up. So, you know, bad pages, memory dumps, suspect pages, non-yielding tasks, sick spinlocks, severe errors, all sorts of information about CPU scheduler health, IO warnings, and memory conditions. Stuff that you don’t want to see, but stuff that, if it’s happening, you should know about pretty quickly. So if you want to check this out, it’s all at code.erikdarling.com. That’s the shortcut to my GitHub repo. You go there, go to the performance monitor repo, and there are releases with all of the zip files that contain all of the executables you need to get up and running with this. There’s also a wonderful readme file. That wonderful readme file contains all sorts of pertinent information about how you can get up and running with this without sending me emails asking why it’s saying that your server certificate isn’t trusted, because you didn’t say to trust the server certificate. So with that out of the way, let’s jump over here and look at the system events tab. Now, again, this is the Oh Crap section. And if you see anything in here, you should call a professional. Doesn’t have to be me. I might not even be the right professional to talk to. But if you see stuff getting populated in here, you might want to, you know, show some concern.
So let me just quickly go through that. Under here are corruption events: if we detect bad pages, if you have any dump requests (no jokes, right?), access violations or general access violations. If you have non-yielding schedulers, latch warnings, sick spinlocks, or anything like that, that will show up in here, and you will, you know, undoubtedly cry. I don’t know, switch to Postgres, whatever it is people do these days. Under severe errors, you know, obviously every single server has some errors on it. Mostly it’s kind of like me: it’s either deadlocks or me killing off queries that I’m annoyed with. So there are some severe errors in here, which are mostly one of those two things. So let’s not worry about those. The Oh Crap section is when stuff happens in here. Yeah, you run for the... jungle, I guess? What is it? I don’t know. Anyway, scheduler issues: this will tell you about non-yielding schedulers over time. There is a handy little status bar down here that will tell you about any current stuff that is going on.

There is also memory pressure stuff in here. If this graph populates, you are having some memory issues. CPU tasks: well, this is sort of a running trend of how many workers you’re allowed to have and how many workers you’ve created. So if you see that blue line get up real close to that dotted, sort of, I don’t know, marigold line or something, that might be something you care about, because that would mean you are running out of worker threads. And that would mean THREADPOOL waits and potentially other unfortunate circumstances. So worth looking in there. Memory broker stuff, you know, sort of what’s consuming memory, things like that. And then memory node OOM, which is something no one wants to see. Again, I don’t have any of this bad stuff going on on my servers, so, you know, we’re kind of lucky there. We did see at least an alert pop up, so there’s a sign of life. I can acknowledge that alert, and things will mostly go away. But anyway, this is all from the system health extended event. This is all running on most versions of SQL Server, and my free monitoring tool goes and collects it and will warn you about the Oh Crap stuff. Right. So, important stuff for you to know about if you are dealing with SQL Server. Because if you are dealing with a SQL Server that has a lot of stuff going on in here, you might need to just find a new job, right? Fill out a resume. I’m done, quit.

We’re through here. Sorry about everything. I don’t know. I’d be scared, seriously.

Anyway, thank you. I hope you enjoyed this. Bye-bye. Thank you so much.


Free SQL Server Performance Monitoring Blocking and Deadlocking



Summary

In this video, I delve into the blocking and deadlock monitoring capabilities of my free SQL Server performance monitoring tool, available on GitHub. With a focus on practicality and ease of use, I explain how we leverage extended events for both blocking and deadlock scenarios, ensuring that you can identify and address performance issues efficiently. Whether you’re using system health or prefer to rely on the blocked process report, my monitoring tool streamlines the process by automatically setting up necessary configurations and providing easy-to-understand visualizations of your SQL Server’s performance trends over time.

Chapters

Full Transcript

Erik Darling here with Darling Data, and in this video I want to go over how and what we collect in my free SQL Server performance monitoring tool, available on GitHub for free, way better than all the crap you pay for, around blocking and deadlocks, and how this can help you identify when you had terrible problems. So for blocking and deadlocks, we use extended events for both. The blocked process report will send data to an extended event, as well as the deadlock stuff in there. Now for the deadlocks, if you don’t have a dedicated event session set up to capture deadlocks, the monitoring tool will set one up for you. But if you would prefer system health, then we will fall back to system health. There are also some charts that trend deadlocking and blocking activity over time. You know, it’ll tell you all the normal stuff that you would expect to see from looking at blocked process reports or deadlock reports if you were using a free community script like my sp_HumanEventsBlockViewer or sp_BlitzLock. It’s a very commensurate experience, because a lot of the stuff is powered by community tools. That is part of the open source ethos that I am hoping to instill, and also hoping to call broader attention to. There’s a lot of great stuff out there, but some people are afraid to run it or don’t know how to use it. And this sort of just encapsulates it and makes it a lot easier. So blocked process reports go into the extended event. They are all XML, which is a damn shame, because XML is an awful pain in the butt to deal with. But I try to make that as easy as possible for you by doing all the shredding and picking apart that you would need done. I set that up with a five-second blocked process threshold via sp_configure where I can. The deadlock monitor stuff just runs automatically.
You know, you don’t really need to set a threshold there. If SQL Server hits a deadlock and it can log it, it will. That’s pretty much it. There are some platform differences. In AWS RDS, sp_configure is not directly available the way that it is in many other SQL Server platforms. It is available via an RDS parameter group.

So if you are using this to monitor RDS, you will need to set things there. In Azure SQL DB, it is fixed at 20 seconds. I don’t know what Microsoft’s scared of, but they set it pretty high. So the blocked process report will sweep through every 20 seconds and look for blocking, and if it happens to catch any, it’ll show it in there. With both of these, though, wherever the reports are collected, you can download them and do whatever you want with them. You can send them to someone. You can, you know, put them in another tool that parses them out, whatever you want to do. If you want to check out this awesome free SQL Server monitoring tool, again, go to code.erikdarling.com. It is in the performance monitor repo.
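If you do want to pick apart the downloaded XML yourself, a minimal sketch with Python’s standard library looks something like this. The sample report here is trimmed to a typical shape for illustration; real blocked process reports carry many more attributes than this.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample of a blocked process report, for illustration only.
sample = """
<blocked-process-report monitorLoop="5">
  <blocked-process>
    <process spid="52" waittime="21042">
      <inputbuf>UPDATE dbo.t SET c = 1;</inputbuf>
    </process>
  </blocked-process>
  <blocking-process>
    <process spid="60">
      <inputbuf>BEGIN TRAN; UPDATE dbo.t SET c = 2;</inputbuf>
    </process>
  </blocking-process>
</blocked-process-report>
"""

root = ET.fromstring(sample)
# Pull the blocked and blocking sides of the report apart.
blocked = root.find("./blocked-process/process")
blocker = root.find("./blocking-process/process")

print("blocked spid:", blocked.get("spid"), "wait ms:", blocked.get("waittime"))
print("blocking spid:", blocker.get("spid"))
print("blocking query:", blocker.findtext("inputbuf").strip())
```

The monitoring tool does far more shredding than this, but the shape of the job is the same: walk the report, pull out spids, wait times, and input buffers.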

And if you go to the releases section, you’ll find nightly releases and the latest stable build release. So whichever one you’re feeling cool with testing out, you can grab and start monitoring your SQL Server’s performance for free. So let’s look at these tabs, just so you get sort of a sense of what you’re dealing with here.

So I set this back 30 days, because apparently I haven’t done anything interesting on my server in a little bit. But if you look in the locking trends tab, this will show you just lock waits as they occurred. This will give you the count of blocking and deadlocking events.

And this will, and these two graphs down here will sort of start to show you the durations of them. So where these durations get higher and higher, you had bigger and bigger problems. I’ll try to move out of the way of the deadlock one.

So you can see in here, this thing spikes up when we had a bunch of deadlocks. This current waits tab is sort of an interesting one. So I’m working on how to get this across the best. Because if you look at this, you know, you have some lock waits down here that are hitting about 45 seconds.

But that’s very low. And then over here, you have some absolutely tremendous lock waits, right? I mean, look at this, right?

That’s 974,000 milliseconds. And then this thing up here, LCK_M_IX for 2 million milliseconds. So some of this data is a little hard to get good perspective on. So I’m working on how to best visualize that.

I need to figure out maybe a little bit better way of doing that. But for now, if you see lines go up very high and you see numbers into the millions here, that’s probably not a good sign, right? It’s a challenge for anyone designing charts and graphs: how to deal with extreme outliers, like 975,000 milliseconds and 2.6 million milliseconds.

They are challenging to deal with from a visualization perspective. But I will try to figure something out. Even if you look in here, where it’s LCK_M_IS at 219,000 milliseconds, that’s a brave number of milliseconds.

But down in here, I break down blocked sessions by database, right? So, like, the primary ones in here that are – go away, Visual Studio. No one needs you.

So the primary ones in here are HammerDB. That’s in green, right? So the HammerDB TPCC database had a lot of blocking going on at some point. And then the HammerDB TPCH database also had a bunch of blocked sessions in it. So I try to give you a better, more granular breakdown of which databases have the most blocking in them, so you can make decisions about which ones to troubleshoot.

Over here are the shredded blocked process and deadlock reports. So if we look under blocking here, you know, it’ll be all the normal stuff that you would expect to see if you were to run sp_HumanEventsBlockViewer. You know, you see the blocking chain in here.

It’s a very similar experience to running that tool: the query text, and if you want to download the blocked process report XML, you can do that very easily there. And then under deadlocks, you’ll see something very, very similar to what you would get back from sp_HumanEventsBlockViewer, because guess what?

I run sp_HumanEventsBlockViewer to do the parsing, right? Because it’s a lot easier than having to rewrite code that I already wrote the first time. So... oh, don’t run away from me.

Who do you think you are? Silly goose. Anyway, just a quick overview of the blocking and deadlocking stuff in my free SQL Server monitoring tool. If you want to get a hold of this and start monitoring your SQL Servers for free, again, that is all at code.erikdarling.com.

You can go in, start getting information, start troubleshooting SQL Server. And if you like this project, you appreciate this project, and you would like to sponsor it, you can absolutely do that. Or if you start looking at all this monitoring data and you say to yourself, hey, I’m in way over my head, you can always call me, because I am still doing consulting, and I can still help you with your SQL Server performance problems.

So thank you for watching. Hope you enjoyed yourselves. Hope you learned something, and I will see you in tomorrow’s video where we will dig a little bit more into some of the foundations and fundamentals in my free SQL Server performance monitoring tool.

Alright, thank you for watching.


Making A Website For Tracking My GitHub Stats



GitHub only keeps 14 days of traffic data. If you don’t capture it, it’s gone. I got tired of losing that data, so I built a self-hosted dashboard that collects stats daily and keeps them forever. I also got tired of running reports every day.

It tracks stars, forks, traffic, clones, and release download counts across multiple repos, and displays everything on a single page with interactive charts. The whole thing is a Python script, a static HTML file, and a cron job. No framework, no database, no build step.

You can see my live version at stats.erikdarling.com, where I track PerformanceMonitor, PerformanceStudio, and DarlingData.

All the code is on GitHub: erikdarlingdata/github-stats-dashboard

What you need


  • A Linux server (any cheap VPS — mine runs on a $5/month Hetzner box alongside other stuff)
  • Python 3 (no pip packages, just the standard library)
  • Nginx
  • A GitHub fine-grained personal access token

Step 1: Create a GitHub token


Go to GitHub → Settings → Developer settings → Fine-grained tokens → Generate new token.

Scope it to only the repos you want to track. Under repository permissions, you need:

  • Metadata: Read-only (selected by default)
  • Administration: Read-only (required for the traffic and clones API)

Save the token somewhere safe. You’ll put it on your server in a moment.
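For context, hitting the traffic endpoints with that token only needs the standard library. Here’s a hedged sketch of the kind of authenticated call the collector has to make; the function names are mine for illustration, not necessarily what collect.py uses.

```python
import json
import urllib.request

API = "https://api.github.com"

def build_request(repo: str, token: str, endpoint: str = "traffic/views") -> urllib.request.Request:
    # Build an authenticated request for a repo endpoint. The
    # Administration: Read-only permission from Step 1 is what
    # unlocks the /traffic/* endpoints.
    return urllib.request.Request(
        f"{API}/repos/{repo}/{endpoint}",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )

def fetch(req: urllib.request.Request) -> dict:
    # Perform the request and decode the JSON body.
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Swap in `traffic/clones` or `releases` for the other data the dashboard tracks.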

Step 2: Set up the server


SSH into your server and install nginx and certbot:

sudo apt update
sudo apt install -y nginx certbot python3-certbot-nginx

Save your GitHub token:

echo "github_pat_your_token_here" | sudo tee /etc/github-stats-token > /dev/null
sudo chmod 600 /etc/github-stats-token

Create the web root and grab the dashboard files:

sudo mkdir -p /var/www/stats/data

# Clone the repo (or just download the two files you need)
git clone https://github.com/erikdarlingdata/github-stats-dashboard.git /tmp/stats-dashboard
sudo cp /tmp/stats-dashboard/index.html /var/www/stats/
sudo cp /tmp/stats-dashboard/collect.py /opt/github-stats-collect.py
sudo chmod +x /opt/github-stats-collect.py

Step 3: Configure the collector


The collector needs to know which repos to track. You can either edit the DEFAULT_CONFIG dict at the top of collect.py, or create a config file:

sudo tee /etc/github-stats.json > /dev/null <<'EOF'
{
    "data_dir": "/var/www/stats/data",
    "token_file": "/etc/github-stats-token",
    "repos": [
        "your-org/repo-one",
        "your-org/repo-two",
        "your-org/repo-three"
    ]
}
EOF

Replace the repo list with your own. Use the full owner/repo format.
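As a sketch of what that merge can look like (assumed behavior; check collect.py for the authoritative logic), file settings are overlaid on the defaults, and each repo entry is checked for the owner/repo shape:

```python
import json

# Illustrative defaults mirroring the config file above; collect.py's
# actual DEFAULT_CONFIG keys may differ.
DEFAULT_CONFIG = {
    "data_dir": "/var/www/stats/data",
    "token_file": "/etc/github-stats-token",
    "repos": [],
}

def load_config(path: str) -> dict:
    # Start from defaults, then overlay whatever the file provides.
    config = dict(DEFAULT_CONFIG)
    with open(path) as f:
        config.update(json.load(f))
    # Every repo must be in full owner/repo format.
    for repo in config["repos"]:
        owner, sep, name = repo.partition("/")
        if not (owner and sep and name):
            raise ValueError(f"expected owner/repo format, got {repo!r}")
    return config
```

Failing fast on a malformed repo name beats a confusing 404 from the API later.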

Step 4: Configure the dashboard


Open /var/www/stats/index.html and find the REPO_DISPLAY object near the top of the script block:

const REPO_DISPLAY = {
    "repo-one":   { short: "R1", color: "#58a6ff" },
    "repo-two":   { short: "R2", color: "#3fb950" },
    "repo-three": { short: "R3", color: "#d2a8ff" },
};

The keys must match the repo names from your config (the part after the /). The short value is what shows up in the tab buttons and chart legends. Pick whatever colors you like.
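The most common mistake here is a display key that doesn't line up with the config. The matching rule is simple enough to sketch in a couple of lines (names are illustrative, not code from the repo):

```python
def display_keys_for(repos: list[str]) -> set[str]:
    """REPO_DISPLAY keys are the part of each config repo after the slash."""
    return {repo.split("/", 1)[1] for repo in repos}

config_repos = ["your-org/repo-one", "your-org/repo-two", "your-org/repo-three"]
display_keys = {"repo-one", "repo-two", "repo-three"}  # keys in REPO_DISPLAY
missing = display_keys_for(config_repos) - display_keys  # should be empty
```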

Step 5: Set up nginx and SSL


Point your domain’s DNS A record to your server’s IP address, then configure nginx:

sudo tee /etc/nginx/sites-available/stats.example.com > /dev/null <<'EOF'
server {
    listen 80;
    server_name stats.example.com;

    root /var/www/stats;
    index index.html;

    location / {
        try_files $uri $uri/ =404;
    }

    location /data/ {
        add_header Cache-Control "no-cache";
    }
}
EOF

sudo ln -s /etc/nginx/sites-available/stats.example.com /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx

Then get a free SSL certificate:

sudo certbot --nginx -d stats.example.com

Step 6: Run the first collection and schedule daily runs


# First run
python3 /opt/github-stats-collect.py --config /etc/github-stats.json

# Schedule daily at 6:15 AM UTC
(crontab -l 2>/dev/null; echo '15 6 * * * /usr/bin/python3 /opt/github-stats-collect.py --config /etc/github-stats.json >> /var/log/github-stats.log 2>&1') | crontab -

Visit your domain. You should see stat cards, charts, and a release download table. The charts will fill in over the coming days as snapshots accumulate. If you want it to run hourly, use this cron line instead of the daily one:

# Schedule hourly at :15 past the hour
(crontab -l 2>/dev/null; echo '15 * * * * /usr/bin/python3 /opt/github-stats-collect.py --config /etc/github-stats.json >> /var/log/github-stats.log 2>&1') | crontab -

How it works


The collector is a single Python file with zero external dependencies. It calls the GitHub API for each repo and writes three JSON files:

  • data/YYYY-MM-DD.json — the full daily snapshot, kept forever
  • data/history.json — a rolling summary with one entry per day, used by the charts
  • data/current.json — the latest snapshot, used by the stat cards and traffic/clone bar charts

The dashboard is a single static HTML file. No build tools, no framework. It fetches the JSON files and renders everything client-side with Chart.js. When you click a repo tab, it swaps the stat cards and per-repo charts without reloading the page.

The daily traffic and clone charts show the 14-day rolling window from GitHub’s API. The stars and downloads charts plot from history.json, so they grow over time as you collect more snapshots.
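Under my assumptions about the file layout (the real collect.py may differ in the details), the daily write pattern looks roughly like this:

```python
import json
import datetime
import pathlib

def write_snapshot(data_dir: str, summary: dict) -> None:
    """Write the dated snapshot and current.json, then upsert today's history entry."""
    d = pathlib.Path(data_dir)
    today = datetime.date.today().isoformat()
    (d / f"{today}.json").write_text(json.dumps(summary, indent=2))  # kept forever
    (d / "current.json").write_text(json.dumps(summary, indent=2))   # latest snapshot
    history_path = d / "history.json"
    history = json.loads(history_path.read_text()) if history_path.exists() else []
    # Replace today's entry if it already exists, so re-runs don't duplicate it
    history = [h for h in history if h.get("date") != today]
    history.append({"date": today, **summary})
    history_path.write_text(json.dumps(history, indent=2))
```

Running it twice in a day leaves exactly one history entry for that date, which is where the safe re-run behavior comes from.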

Things worth knowing


Start collecting early. You can only capture traffic data that GitHub still has. If you set this up today, you get today’s 14-day window. Tomorrow you get tomorrow’s. But you can’t backfill what’s already gone.

Download counts are cumulative. GitHub reports total all-time downloads per release asset. The collector stores the current total each day, so the dashboard can plot the growth over time.
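That means per-day download numbers are a derived value: subtract consecutive cumulative totals. A quick sketch of the idea (field names are mine):

```python
def daily_deltas(totals: list[int]) -> list[int]:
    """Turn cumulative all-time download totals into per-day counts."""
    # Clamp at zero in case a total drops (e.g., a release asset gets deleted)
    return [max(b - a, 0) for a, b in zip(totals, totals[1:])]

# Cumulative totals from three consecutive snapshots...
deltas = daily_deltas([100, 120, 150])  # ...become per-day counts [20, 30]
```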

Safe to re-run. If you run the collector multiple times in a day, it replaces that day’s entry in history.json rather than duplicating it.

No database. Everything is flat JSON files. Easy to back up (just tar the data directory), easy to inspect, easy to move to a new server.

The traffic API requires admin-level access. With fine-grained tokens, that means the Administration: Read-only permission. Without it, you’ll still get stars, forks, and downloads — just not traffic or clones.

The code


Everything is at erikdarlingdata/github-stats-dashboard. MIT licensed. Four files that matter:

  • collect.py — the collector script
  • index.html — the dashboard
  • config.example.json — example config
  • nginx.example.conf — example nginx config

Fork it, change the repos, deploy it, done.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.