Trying Something New With sp_QuickieStore: 80/20 Query Scoring Across Multiple Metrics



Summary

In this video, I dive into an old-school approach to identifying SQL Server performance issues using sp_QuickieStore and a novel method inspired by the Pareto principle. Traditionally, Query Store surfaces queries that consumed a lot of CPU over the last seven days, but often these results are too broad for practical use. To address this, I’ve developed a multi-dimensional scoring system that evaluates queries based on their impact across several key metrics: CPU usage, duration, physical reads, writes, and executions. This approach helps pinpoint the most problematic queries more accurately, even when they run outside of typical working hours or are unparameterized. By sharing these insights, I hope to provide a practical tool for SQL Server administrators looking to optimize their databases without relying solely on modern monitoring tools.

Chapters

Full Transcript

Erik Darling here with Darling Data, taking a break from my monitoring tool mogul personhood to talk about some old-fashioned stuff, you know, some old-fashioned ways of finding SQL Server problems that aren’t my fancy new free, open source, better-than-all-the-paid-tools monitoring tool. We’re going to talk about doing something a little bit different with sp_QuickieStore. So let me start by defining the problem I want to solve, and then show you a little bit about how I chose to go about solving it. By default, sp_QuickieStore looks at Query Store data from the last seven days and sorts by average CPU descending, which is a great way of surfacing things that burned a lot of CPU.

Often when you find those things and show them to people, though, they say, well, that runs at night, it doesn’t matter. At first, I tried to cope with that by adding in a few parameters, like @workdays, which, when set to 1, will by default look at everything between 9am and 5pm on weekdays only, and show you that. And of course, @work_start and @work_end serve to define your work day to your needs. But still, you needed a lot of other filtering in order to come to an agreement on which queries mattered.
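As a rough sketch, the business-hours filtering described above looks something like this (the parameter spellings follow the spoken names in the video; run the procedure with @help = 1 to confirm them on your installed version, and the database name is just a placeholder):

```sql
/* Look at the last seven days of Query Store data,
   but only for activity between 9am and 5pm on weekdays. */
EXEC dbo.sp_QuickieStore
    @database_name = N'StackOverflow', /* hypothetical database name */
    @workdays = 1,
    @work_start = '9am',
    @work_end = '5pm';
```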

For example, some queries would still execute during the day and burn a lot of CPU on average, but people would say things like, well, it appears to have only executed once or twice. Or they would say things like, oh, that’s not user facing. A lot of stuff would come up, and it’d be like, okay, well, we can filter, we can set a minimum number of executions. What makes sense? How many executions would you start to care about? I don’t know. And then there are all sorts of other things that act as traps in there, like unparameterized queries, where a new query ID gets generated for every new literal value that gets passed in.

And so it looks like something maybe executed a few times, but if you look at the aggregate for that query hash, which is something I also added with the @include_query_hash_totals parameter, well, you sort of get the point: it was still really hard to find what is really stressing the server out. So the way that I chose to go about trying to fix this now is by doing something like the Pareto principle, but with a matrix.
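A hedged sketch of the hash-totals option mentioned above (again, the parameter spelling follows the spoken name, and the database name is a placeholder):

```sql
/* Add per-query-hash totals to the output, so unparameterized
   queries that spawn many query IDs get aggregated together. */
EXEC dbo.sp_QuickieStore
    @database_name = N'StackOverflow', /* hypothetical database name */
    @include_query_hash_totals = 1;
```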

So the whole theory is that 80% of your blank is caused by 20% of your blank, right? The 80-20 rule, effectively. Except if you do that across just one resource, you’re like, well, okay, CPU, fine, right? So I chose to use a group of resources that I care about, which are CPU, duration, physical reads (not logical reads; logical reads are for people using SQL Server 2008, and we’re not there anymore), writes, and executions.

So I’m taking those five metrics, and I’m finding things that have a meaningful impact on your server across those. So that’s what we’re going to talk about today, and I’m going to show you kind of what that looks like. Down in the video description, oh boy, all sorts of useful links, you can hire me for consulting still, you know, before the robots put every independent consultant out of business, you can still talk to me.

You can purchase my training to, you know, get the human angle on all this stuff, not just the “you’re absolutely right!” LLM angle on any questions you might have. You can become a supporting member of the channel; wouldn’t that be just grand if you gave me about $4 a month, you know, just the tax on a cappuccino. You can also, for free, ask me office hours questions, and even more for free, like, subscribe, and assuming that you’ve got any friends who are still living people and not large language models, well, you can tell them about the channel and maybe they can subscribe too.

So that’d be, wouldn’t that be just fantastic? We could all be friends. Speaking of friends, I’m your friend. And because I’m your friend, and I want you to be happy, and I want your life to be better, I have made a free open source SQL Server performance monitoring tool. No complicated installations, no licensing, no email needed, no phone home telemetry, no nothing.

Just a bunch of T-SQL collectors running on a schedule, populating dashboards; basically all the stuff that I would look at as a consultant if I were trying to keep a SQL Server happy from a performance point of view. And also, the thing that I think is pretty neat is that I have built MCP servers into the monitoring tool. It is opt-in, of course; it is not enabled by default. So if you want to have your robot friends look at just your performance data, only the collected performance data, and ask them questions specifically about that nicely aggregated, collected, you know, gentrified data that I get for you, you can do that, and they can summarize very nicely all the stuff that has happened with your server that is making it unhappy.

I will be out and about in the world. I guess by the time this goes live, DataTune Nashville will have already happened; I forgot to take it off the slide, but I like DataTune Nashville, and maybe there will be one next year and I’ll go back for that. If there is, DataTune Nashville, that’s a great conference. Data Saturday Chicago is coming up; well, when this publishes, it’ll be this week.

SQL Day Poland in May, woohoo, and Data Saturday Croatia also, woohoo, June 12th and 13th. And if this slide will go on, you will see that I have at least updated the slide now that we have left the murderous winter, we have survived this Donner Party winter, and we are now entering spring, and the hills are once again alive with the sound of database music, which is quite harmonious if you ask me.

Anyway, let’s go do this thing. Let’s go over to SQL Server Management Studio, something that we haven’t seen in one of these videos in a little bit, not since the Get AI Ready with Erik series, a triumphant series of videos. So, let’s talk about these results a little bit. Before I even do this, what I want to point out here is that if you use the @help parameter in sp_QuickieStore, what you’ll get in the results is this text right here.

So, I made sure that this all had a very good explanation, so that it would be there for you to refer to should you choose to use the new @find_high_impact parameter and look at Query Store in this way. So, we’ve got some normal stuff that you would expect to see: database name, start date, and end date; that’s the window of time that we looked at. And then the primary window column will tell you when the query runs most, right?
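A hedged invocation sketch: @help is a documented sp_QuickieStore parameter, while @find_high_impact is my best guess at the spelling of the new dev-branch parameter described here; check @help on the dev branch for the real name.

```sql
/* Print the full parameter and results documentation. */
EXEC dbo.sp_QuickieStore
    @help = 1;

/* Score queries across CPU, duration, physical reads,
   writes, and executions instead of sorting by one metric. */
EXEC dbo.sp_QuickieStore
    @database_name = N'StackOverflow', /* hypothetical database name */
    @find_high_impact = 1;             /* assumed parameter name */
```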

So, like, I just chose to sort of separate the day in two. So, 7 a.m. to 7 p.m. is working, 7 p.m. to 7 a.m. is off hours. I also have some stuff in there for weekends and I also have some designations in there for things whose execution profile sort of spans everything.

Well, I’ll show you what that looks like in a moment. Then we have some of the normal identifying characteristics you would expect to see about a query: things like the object name, the SQL text, the query plan, the top waits if they’re available for it, and the query hash.

And something that I want to point out here is that Query Store has query hashes in it, and I chose to use those as identifiers instead of query IDs because of the problem that I was talking about earlier in the video, where, if you have unparameterized queries or something, they might not look significant when in fact they are.

So we’re doing this by query hash. I do show you the query IDs and query plans that those query hashes are associated with, but my default grouping is query hash.

There’s also a count of how many distinct query IDs that hash generated, which gives you a good idea if the code is unparameterized. You might want to look at it that way. And of course, a count of the plans.
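The query-hash grouping idea can be sketched directly against the documented Query Store catalog views, roughly like this (the actual procedure does considerably more):

```sql
/* Count how many distinct query IDs and plans each query hash produced.
   A high query_id count per hash is a strong hint the code is unparameterized. */
SELECT
    q.query_hash,
    COUNT(DISTINCT q.query_id) AS distinct_query_ids,
    COUNT(DISTINCT p.plan_id)  AS distinct_plans
FROM sys.query_store_query AS q
JOIN sys.query_store_plan AS p
    ON p.query_id = q.query_id
GROUP BY q.query_hash
ORDER BY distinct_query_ids DESC;
```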

Then there is an impact score, which is a number from zero to one. It’s sort of the reverse of the vector search stuff, the semantic search stuff I was talking about in other videos, where closer to zero was better.

Here, closer to zero means your query is less impactful, so that is good, and closer to one means you have something very impactful about this query. That score is generated by effectively taking the average percent rank across all of the active dimensions that we’re looking at.
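A minimal sketch of that scoring idea, assuming per-hash totals have already been aggregated into a temp table (#hash_totals and its column names are made up for illustration; the real procedure’s internals differ):

```sql
/* Average PERCENT_RANK() across five metrics to get a 0-1 impact score. */
SELECT
    h.query_hash,
    ( PERCENT_RANK() OVER (ORDER BY h.total_cpu_ms)
    + PERCENT_RANK() OVER (ORDER BY h.total_duration_ms)
    + PERCENT_RANK() OVER (ORDER BY h.total_physical_reads)
    + PERCENT_RANK() OVER (ORDER BY h.total_writes)
    + PERCENT_RANK() OVER (ORDER BY h.total_executions)
    ) / 5.0 AS impact_score
FROM #hash_totals AS h
ORDER BY impact_score DESC;
```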

And then there’s also stuff like total executions, showing how many times that query has executed, and then what its share of CPU, duration, physical reads, writes, and memory is.

I think I forgot to mention memory before, but that’s okay; I’m mentioning it now because I see it and we’re good. So after that, we then have some diagnostics, which is a rule-based layer on top of the score. It tells you some stuff that I found while looking at your query’s execution patterns; not the query plan or anything, just the way it runs.

I also include wait stats for the query. So what we look at is, you know, things like blocking, resource contention, and I/O. We try to figure out if it’s parameter sensitive, right?

If there’s plan instability. And let’s say we have a select query that runs and it’s causing writes; then we might say, hey, this thing might be spilling, or this thing might have spools in it that you could fix and take a look at.

And then we also flag metrics by volatility. So if there are really, really big swings between the minimums or maximums for things and the averages, which can be a sign of something like parameter sensitivity, then that means you might have a situation that you need to take a look at, right?

Because sometimes this query is very fast and sometimes this query is very slow; it might look okay on average, but it’s got a wide swing to it.
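That volatility flag can be approximated from the runtime-stats view, which stores both averages and maximums per interval (the 100x threshold below is an arbitrary illustration, not the procedure’s actual cutoff):

```sql
/* Flag plans whose worst-case duration dwarfs their average:
   a classic sign of parameter sensitivity. */
SELECT
    rs.plan_id,
    rs.avg_duration,
    rs.max_duration,
    rs.max_duration / NULLIF(rs.avg_duration, 0) AS duration_swing
FROM sys.query_store_runtime_stats AS rs
WHERE rs.max_duration > rs.avg_duration * 100.0
ORDER BY duration_swing DESC;
```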

So, here is what this ends up looking like in result form. You’ll have to excuse that a lot of this stuff says off hours, because, and I can’t remember how many times I’ve said this now, all my Windows VMs that SQL Server is installed on are in Pacific time.

And it seems like no matter when I run a workload, it’s always off hours. Maybe it’s just a California thing; they’re all on vacation. But it looks like this. This is, of course, the database name that we ran it in.

I’ve been using HammerDB a little bit more recently, because I can just run a workload and it causes a whole bunch of stuff to happen, right? Like high CPU, blocking, deadlocks.

And I can just come back and say, oh, cool, you did all this stuff, without me having to write a synthetic Stack Overflow workload, which would be annoying and time consuming, and not something I really want to do right now with my time.

I got a lot of other stuff going on. So then you see the period that we ran this for, and then this is where we start getting into the stuff about the actual query. So, you see stuff that executed during business hours.

Again, this would be the 7 a.m. to 7 p.m. window. Right now this is hard coded, but, well, I’ll open this up a little bit more if people start making requests about it.

Off hours would mean 7 p.m. to 7 a.m. And then down here a little bit, we have one that actually says spread, and that means that it is spread across both off hours and business hours.

If any of this were coming from a stored procedure, that name would be there. We have the query text, and then we have the query plan. We have all of the aggregated waits for that query.

So you can kind of get a sense of what that thing is up to. Again, these come from Query Store, so they’re the high-level bucketed waits; they’re not the granular waits that you might be used to if you’re looking at wait stats.

And then down here we have, oh, just a couple of examples. I know it’s a little cut off over my head, but that’s okay. So again, we’re grouped by query hash.

Because as you can see from just about all of these lines, one query hash can spawn a whole bunch of query IDs. If we were to do this by query ID, we would have a far worse view of which queries were most impactful in the aggregate, right?

Because like, look, there’s like seven different query IDs. It’d be like, oh, well, these all look fine. What could be going wrong here? Uh, and then we have the list of plan IDs, right? So, uh, you know, all kind of nice things to have in front of you.

Where, you know, there’s sort of interesting stuff, like two different query IDs that group to the same hash only having one query plan ID, which is strange and bizarre. But then, coming over this way, this is where we start to get into which queries are most impactful.

So, impact score. Again, I screen out anything that’s less than 0.5, because that’s where stuff gets pretty insignificant. So every impact score you see is going to be between 0.50 and 1, right?

With 1 being the highest you can get. So the closer this gets to one, the more it means this query is in, say, the 98th percentile of how much stuff it does, right?

So for this one, all of these metrics contribute to the general score of things. And then, coming over a little bit further, this is where the diagnostics come in.

The TPCC database is actually a little bit more interesting here, so let’s switch over to that and just run this for that one. This one’s a little bit different, right?

I mean, it’s still all off hours, because of my stupid West Coast VMs, but we actually get a bunch of stored procedure names in here, and we get slightly more interesting numbers.

I think, anyway, over in this section. So if you look at just this top query, where the impact score is 0.91, and you kind of look across, you go, holy cow, that’s a lot of executions.

Wow, that even took me by surprise. You can see that this query, for this database and the timeframe that we looked at, was 18% of the CPU, 1.3% of the duration, 3% of the physical reads, 54.2% of the writes, and 28.6% of the executions.

So this query is contributing a lot of load to the server; this is a query that you would want to look at. Now, based on the execution count and all that, it’s unlikely that you could make this query a lot faster, but this is a query that you might want to look at and say, hey, do we need to run this this much?

Can we cache this somewhere? You know, the usual. I mean, maybe it is something you could tune and make faster. You’re not going to make it execute less, right? But you could probably bring down the CPU share.

You could probably bring down some of the other numbers and make it less significant. But it also might be a query that you want to think about not running as much, because that is a bonkers number of executions. And then over here, not really any diagnostics, but we get some volatility metrics, where sometimes the duration is 707 times the average.

And then sometimes the physical reads are 93,000 times the average. And sometimes this one’s duration is almost 11,000 times the average. So this is just a different way of sort of looking at the queries that run in query store, scoring them across a bunch of metrics and figuring out which ones are contributing the most across all of those, not just to one.

Because oftentimes people are not just concerned about one thing. They’re like, well, CPU is high, and, you know, we’re constantly going to disk, and, you know, problems galore, right? So this is a good way for you to start figuring out, as a whole, which queries are contributing across a variety of metrics, so you can attack the things that are most meaningful across all of them.

Anyway, this is right now in the dev branch for my GitHub repo. That is code.erikdarling.com. Shockingly, this is also the same repo that my monitoring tool is in.

So I’m just going to bring that up. If you go to code.erikdarling.com, it’s going to be in the DarlingData repo. This is, of course, sp_QuickieStore.

Go check that out. Again, it’s on the dev branch; this is not live on the main branch yet. But after I have let this soak in a little bit and I’ve had a chance to kick it around a little bit more, it will make its way over there.

So again, just a different way of sort of looking at query performance, a different way of finding queries that are meaningful to go after. I think it’s pretty cool.

Hopefully you do too. Hopefully it helps you find queries that are meaningful on your servers. Because with any other tool you look at to get this kind of information, it’s always just, you sort by one metric, right?

And even if you sort by two metrics, one metric dominates and sends a much stronger signal, and the other metric’s signal is sort of wishy-washy, right?

So, just to talk about the problem from the opposite direction: like I said, by default sp_QuickieStore orders by average CPU. If you flip that and say, I’m going to order by executions, you find stuff that executes a lot, but it’s going to have, like, zero-millisecond CPU and duration, because it’s very fast code that’s not doing anything.
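For contrast, the single-metric sorting described here looks something like this (the @sort_order value shown is from my memory of the procedure’s options; @help lists the accepted ones, and the database name is a placeholder):

```sql
/* Default: surface the biggest average CPU burners. */
EXEC dbo.sp_QuickieStore
    @database_name = N'StackOverflow'; /* hypothetical database name */

/* Flip it: surface the most frequently executed queries instead. */
EXEC dbo.sp_QuickieStore
    @database_name = N'StackOverflow',
    @sort_order = 'executions';
```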

So you’re still finding a problem, but you’re not finding things that are causing more than one problem at once, right? This is a good way to not just focus on one metric or one problem; it’s a great way to look at your queries across a bunch of metrics that will cause you problems. Anyway, thank you for watching.

I hope you enjoyed yourselves. I hope you learned something. I hope you’ll check this thing out. That’s sp_QuickieStore in my GitHub repo right there. Give it a shot. Let me know how that goes. And, I don’t know, see you on the main branch someday soon.

All right. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SQL Server Performance Office Hours Episode 55




To ask your questions, head over here.

Summary

In this video, I dive into some common questions and issues faced in SQL Server performance tuning, particularly focusing on scalar functions causing lengthy execution times and the challenges of using extended events for monitoring. I also share insights on parameter sniffing and the limitations of the parameter sensitive plan optimization feature. Additionally, I discuss my free open-source SQL Server Monitoring tool, emphasizing its value as a cost-effective alternative to commercial tools. The video wraps up with an overview of upcoming events like Data Saturday Chicago and SQL Day Poland, inviting attendees to connect and collaborate in person. Lastly, I touch on the ongoing development of AI within SQL Server and Azure, sharing some use case scenarios from proof-of-concept projects.

Chapters

  • *00:00:00* – Introduction and Free SQL Server Performance Monitoring Tool Overview
  • *00:05:29* – Optimized Locking in SQL Server 2025
  • *00:14:18* – Conclusion and Future Plans

Full Transcript

Erik Darling here with Darling Data, and it is, oh, quite a Monday, isn’t it? I suppose we’re all going to have to deal with this in our own special way. But anyway, that means it’s time for Office Hours. That’s how I’m going to deal with it. Answer some questions, because people seem to keep having them, and I seem to keep having answers, so I don’t know, maybe let’s keep this gravy train rolling. He-he! Alright. Down in the video descriptions. Well, actually, there’s only one of them. I don’t know why I pluralized that, but there’s all sorts of helpful links. If you would like to hire me for consulting, purchase my training, become a supporting-with-money member of the channel, you can do that. You’ll also find links to continue to ask me office hours questions, so this gravy train will stay on the tracks. And as always, if you enjoy this content in any way, shape, or form, and you feel that it may enrich the lives of those around you, please do like, subscribe, tell a friend, all that good stuff, so that I can continue racking up these ground-breaking, record-setting view and subscriber counts.

I’m hot on the heels of that Amiga repair channel. I swear, one of these days, so close. I’m going to overtake that thing. I also have, I mean, semi-recently, free SQL Server monitoring. You ever want to have performance monitoring for your SQL Server, but, you know, all the other paid tools out there kind of suck, and, you know, aren’t worth the money?

Well, I got a free one. It is free. It is open source. There’s no email sign-up.

I’m not phoning home with any information from you. It’s just a bunch of T-SQL collectors running on a schedule, getting the most important performance metrics that your server has to offer. All sorts of stuff that you would want to see in a monitoring tool, nicely grouped together, colorful charts and graphs, help you tell a nice story about what’s going on.

Wait stats, blocking, deadlocks, top queries, all sorts of, you know, cool internals stuff that fancy-pants consultants like me have figured out how to collect data for over the years in a meaningful way. A NOC-style dashboard, so you can see what’s up, what’s down, what’s hot, what’s not. And if you are, you know, in favor of our new robot overlords, there are also a number of built-in MCP tools so that you can ask questions of your performance data.

And just the collected performance data, nothing else. And you can have LLMs kind of tell you a little bit about what’s going on, where and when with your performance. You can just talk to your data naturally, which is, I think, kind of cool.

And I don’t think anyone else is really doing that. So, if you want to get that for free, it’s all right there. On my GitHub repo, code.erikdarling.com, or if you are bold and brave enough to remember that link or just click the link down in the video description, you can find all that stuff.

Oh, well, you know what? DataTune Nashville already happened, which means Data Saturday Chicago is this weekend. And, well, that certainly snuck up on us, didn’t it?

I feel like it was just yesterday I was saying February is never going to end. So, now I got to make a March picture.

Databases are going to play baseball this March, I think. But I also will have SQL Day Poland coming up May 11th and 13th. And Data Saturday Croatia coming up June 12th and 13th. So, you know, if you are in Chicago, Poland, or Croatia, stop by and see me.

I would love to see you. Maybe we’ll get along. Maybe we’ll be best friends. You never know. I could use more friends. But, yeah, this is February, February. It’s done.

It’s done. I got to fix this one. All right. I’ll put that on my to-do list. February, over. All right? No more of this. This was a terrible winter. I hated it.

All of it. Anyway, we have questions which have been asked. And so we have questions which will be answered here. Let’s see what the first one we got.

Recently encountered a query that used scalar functions, which was taking 50 minutes to complete. Not an unnatural state of things, from my experience.

After hours of aimless clicking. Well, don’t click aimlessly; click with purpose and meaning. Why would you aimlessly click? I traced the issue back to an extended event.

Is there a specific metric which I could have used to indicate performance issues from extended events? So I think within the query that you were running itself, likely not. You may have noticed either a session reading from an extended event, if someone were actively monitoring it.

Or you may notice other extended event-related waits on the server go up around when you were executing this query. But in general, no, I haven’t found a great way to track this one down.
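If you do want to rule extended events out up front, one low-effort check (my addition here, not something mentioned in the video) is to list what’s actually running:

```sql
/* List every active extended event session on the server.
   Anything beyond system_health and the default telemetry
   sessions deserves a closer look. */
SELECT
    s.name,
    s.create_time
FROM sys.dm_xe_sessions AS s
ORDER BY s.name;
```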

And I have seen it happen a couple of times, I think early on with sp_HumanEvents, before I added the skip-plans option. Funny story behind that.

Someone wanted me to watch a load test. And I was like, well, you can use extended events for that and capture some stuff. And so we set it up. And they started running stuff.

And they were like, this is taking like four times as long. And I was like, yeah, that’s extended events. So I don’t know. I wish I had a good way to say, lesson learned.

Always check extended events first. But it’s just so rarely a thing. I’m glad you figured it out. That’s good for you.

But no, I don’t have a great way of, you know, predetermining if there is a harmful extended event running. I mean, most client environments I walk into, they have some script from 2003 that looks at wait stats that they run once in a while. The fact that there are stored procedures that do all this stuff a little bit more surgically is big news.

So in most environments, there aren’t extended event sessions up and running, at least not ones that you wouldn’t expect to see. For you, apparently, you have someone to watch out for. And that someone might be you.

Now, you never said who created that extended event. You just said you traced it back to an extended event. So I don’t know. You or someone you know is the murderer. We don’t know who.

I see that PSPO, that is the parameter sensitive plan optimization, works for update in SQL Server 2025. Have you had any good results with it? I’ve not really had all that many great results with the parameter sensitive plan optimization generally.

No, Microsoft, you know, did not think through bucketing things well, I don’t believe. And my gripes with the parameter sensitive plan optimization remain in place even through SQL Server 2025. There are certainly times when it works.

It can be perfectly OK. But yeah, there’s a lot to be desired with that one. You know, the most common and least common values get the top and bottom buckets, and then everything else lands in the middle bucket.

That thing in the middle can span a lot of different row counts, and you can end up with ineffective plan variants for it. So I don’t know. Another could-have-been that probably ain’t going to be.
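If you want to see whether PSPO is in play on your own server, or turn the feature off while testing, the knobs look roughly like this (SQL Server 2022 and later; verify against the docs for your version):

```sql
/* Check whether PSPO is enabled for the current database. */
SELECT name, value
FROM sys.database_scoped_configurations
WHERE name = N'PARAMETER_SENSITIVE_PLAN_OPTIMIZATION';

/* Turn it off if the plan variants are doing more harm than good. */
ALTER DATABASE SCOPED CONFIGURATION
    SET PARAMETER_SENSITIVE_PLAN_OPTIMIZATION = OFF;
```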

Hey, like I always tell you: we got Fabric. Lucky people. All right. Your new monitoring tool is sick.

Call the doctor. However, I am also working on a small, niche commercial SQL Server monitoring tool, similar to the big hitters, centralized repo, but much cheaper, as a passion project. Well, I don’t think it’s my tool you need to worry about.

DBADash is free and already does that. And, you know, you are a bit vague about what is small and niche about your monitoring tool. I don’t know what specifically you have niched yourself into.

But I think, you know, depending on what your niche is, who that market is, and how underserved they are, it might be sort of tough to get software as a service off the ground. I mean, you said it’s a passion project, so maybe the money doesn’t matter, right?

Like maybe if you can just get it out to some people and they’re happy with it, that’s good enough for you. But since you’re asking people to pay for it, I assume that you expect to make money from it. And that that might be a tough proposition in this day and age.

Unless you are already a fairly large, well-established enterprise vendor, you may have a hard time getting people to pay for something. You know, like like I mean, like obviously the thing that I’m doing is totally free, right? Like download it, take it, use it.

But, you know, it’s an open source project. And I always say, you know, hey, if you like this, you can contribute to it. If you find this useful, you can contribute to it. It’s been a very, very small portion of anyone caring enough to contribute to it. Thank you very much to anyone who has.

You’re beautiful, wonderful people, but you are the vast minority on that. So I don’t know who you’re going after with this tool. I don’t know what your I don’t know what your hook is, but good luck. I think it’s going to be a tough time for that sort of thing.

Let’s see. Could you share a few concrete use case scenarios where you have implemented AI within SQL Server 2025 in a production environment? Is this an interview? Anyway, in the energy or banking sector.

Awful specific. So, not SQL Server 2025, but Azure SQL, where the vector stuff has been around a little bit longer: I have helped people with some proof of concept stuff, for them to figure out how they could fit SQL Server’s vector search into various things that they are interested in trying, whether that is, you know, sort of a chatbot thing, or a developer assistance thing, you know, getting into finding related issues, things like that.

I’ve certainly helped people with that sort of stuff, but I don’t know at this point how far those projects made it along. You know, people were fairly happy with the proof of concept, but, you know, the vector stuff in SQL Server is kind of dead in the water until vector indexes become writable. I don’t know when that’s going to be.

I don’t know what Microsoft is busy with, aside from Fabric, because, you know, hey, we got that. Remember that one? Remember that joke? So, I mean, like, a lot of people, I just think, aren’t going to be able to use it well until Microsoft actually finishes the features around it.

Vector indexing being one of them, getting a lot of that stuff out of the preview program and into GA. You know, until that happens, I think a lot of people are just going to say, that’s nice. Okay.

Cool. I’ll be over here waiting. You take your time. So, you know, I wish that it were further along. I wish that it were a little bit more well-developed and a little bit more, I don’t know, production-ready in a lot of ways.

But here we are. Here we are. That is not where vector search in SQL Server 2025 or Azure SQL is today.

We are standing by. Been impressed by optimized locking yet? No.

Seen it? Looks cool? Been impressed? Nah. It’s all right. Anyway, thank you for watching. Hope you enjoyed yourselves.

Hope you learned something. And I will see you in tomorrow’s video, where I will undoubtedly talk more about my free SQL Server performance monitoring tool. Because I would desperately love to spread the word of that thing further, and hope that more people get out there and try it. In a perfect world, more people would get involved with the project, perhaps even sponsor the project.

But, you know, reporting issues, contributing code or ideas, asking questions, all that stuff helps me move things forward in a way that makes things better for everyone. So, if you’re into those things, stop by code.erikdarling.com. All right.

See you tomorrow.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Free SQL Server Performance Monitoring System Health Extended Event

Summary

In this video, I delve into the OCRAP section of my free SQL Server performance monitoring tool from Darling Data, which surfaces critical issues often hidden in the System Health extended event. This section is a red-flag area where you’ll find nasty things such as bad pages, memory dumps, suspect pages, non-yielding tasks, sick spinlocks, and severe errors—basically everything that can go wrong with your SQL Server’s health. I walk through how to access this information via my GitHub repo at code.erikdarling.com, highlighting the importance of paying attention to these warnings, as they can indicate serious underlying issues that might require immediate professional intervention or even a complete system overhaul.

Chapters

Full Transcript

Monitoring tool mogul Erik Darling here with Darling Data. And in today’s video, I want to go over what I call the OCRAP section. The OCRAP section covers the real nasty things that can come out of the system health extended event. Normally, my free stored procedure sp_HealthParser would surface these things, but you’d have to sit there and run it and deal with the outcome. Well, you can use my free SQL Server performance monitoring tool and get all of this goodness done for you. Now, the reason why I call this the OCRAP section is because this really is where a lot of the “this server is not doing well” type stuff shows up. So, you know, bad pages, memory dumps, suspect pages, non-yielding tasks, sick spinlocks, severe errors, all sorts of information about CPU scheduler health, IO warnings, and memory conditions: stuff that you don’t want to see, but stuff that, if it’s happening, you should know about pretty quickly. So if you want to check this out, it’s all at code.erikdarling.com. That’s the shortcut to my GitHub repo. You go there, go to the performance monitor repo, and there are releases with all of the zip files that contain all of the executables you need to get up and running with this. There’s also a wonderful readme file. That wonderful readme file contains all sorts of pertinent information about how to get up and running with this without sending me emails asking why it’s saying that your server certificate isn’t trusted, because you didn’t say to trust the server certificate. So with that out of the way, let’s jump over here and look at the system events tab. Now, again, this is the OCRAP section. If you see anything in here, you should call a professional. Doesn’t have to be me. I might not even be the right professional to talk to. But if you see stuff getting populated in here, you might want to show some concern.
So let me just quickly do that, so we can see some stuff. Under here are corruption events. So if we detect bad pages, if you have any dump requests (no jokes, right?), access violations or general access violations, non-yielding schedulers, latch warnings, sick spinlocks, or anything like that, that will show up in here, and you will undoubtedly cry. I don’t know, switch to Postgres, whatever it is people do these days. Under severe errors: obviously, every single server has some errors on it. On mine, it’s mostly either deadlocks or me killing off queries that I’m annoyed with. So there are some severe errors in here, which are mostly one of those two things. Let’s not worry about those for the video. The OCRAP section matters when stuff happens in here. Then you run for the, run for the jungle, I guess. What is it? I don’t know. Anyway, scheduler issues: this will tell you about non-yielding schedulers over time. There is a handy little status bar down here that will tell you about any current stuff that is going on.

There is a lot of pressure stuff in here. If this graph populates, you are having some memory issues. CPU tasks: well, this is sort of a running trend of how many workers you’re allowed to have and how many workers you’ve created. So if you see that blue line get up real close to that dotted, I don’t know, marigold line or something, that might be something you care about, because that would mean that you are running out of worker threads. And that would mean thread pool waits and potentially other unfortunate circumstances. So worth looking in there. Memory broker stuff: you know, what’s consuming memory, things like that. And then memory node OOM, which is something no one wants to see. Again, I don’t have any of this bad stuff going on on my servers, so we’re kind of lucky there. We did see at least an alert pop up, so there’s a sign of life; I can acknowledge that alert, and things will mostly go away. But anyway, this is all from the system health extended event, this all runs on most versions of SQL Server, and my free monitoring tool goes and collects it and will warn you about the OCRAP stuff. Right. Important stuff for you to know about if you are dealing with SQL Server, because if you are dealing with a SQL Server that has a lot of stuff going on in here, you might need to just find a new job, right? Fill out a resume. I’m done, quit.

We’re through here. Sorry about everything. I don’t know. I’d be scared seriously.

Anyway, thank you. I hope you enjoyed this.

Bye-bye. Thank you so much.


Free SQL Server Performance Monitoring Blocking and Deadlocking

Summary

In this video, I delve into the blocking and deadlock monitoring capabilities of my free SQL Server performance monitoring tool, which I’ve developed and made available on GitHub. With a focus on practicality and ease of use, I explain how we leverage extended events for both blocking and deadlock scenarios, ensuring that you can identify and address performance issues efficiently. Whether you’re using system health or prefer to rely on the blocked process report, my monitoring tool streamlines the process by automatically setting up the necessary configurations and providing easy-to-understand visualizations of your SQL Server’s performance trends over time.

Chapters

Full Transcript

Erik Darling here with Darling Data, and in this video I want to go over what we collect, and how, around blocking and deadlocks in my free SQL Server performance monitoring tool, available on GitHub for free, way better than all the crap you pay for, and how it can help you identify when you had terrible problems. For blocking and deadlocks, we use extended events for both. The blocked process report will send data to an extended event, as well as the deadlock stuff in there. Now, for the deadlocks, if you don’t have a dedicated event session set up to capture deadlocks, the monitoring tool will set one up for you. But if you would prefer to use system health, then we will fall back to system health. There are also some charts that trend deadlocking and blocking activity over time. It’ll tell you all the sort of normal stuff that you would expect to see from looking at a blocked process report or deadlock report. If you were using one of the free community scripts, like mine, sp_HumanEventsBlockViewer, or sp_BlitzLock, it’s a very commensurate experience, because a lot of the stuff is powered by community tools. That is part of the open source ethos that I am hoping to instill, and also hoping to call broader attention to. There’s a lot of great stuff out there, but some people are afraid to run it or don’t know how to use it. And this sort of just encapsulates it and makes it a lot easier. So blocked process reports go into the extended event. They are all XML, which is a damn shame, because XML is an awful pain in the butt to deal with. But I try to make that as easy as possible for you by doing all the shredding and picking apart that you would need done. I do set that up with a five-second blocked process threshold via sp_configure, where I can. The deadlock monitoring stuff just runs automatically.
You know, you don’t really need to set a threshold there. If SQL Server hits a deadlock and it can log it, it will. That’s pretty much it. There are some platform differences. In AWS RDS, sp_configure is not directly available the way that it is on many other SQL Server platforms. It is available via an RDS parameter group.

So if you are using this to monitor RDS, you will need to set things there. In Azure SQL DB, it is fixed at 20 seconds. I don’t know what Microsoft’s scared of, but they set it pretty high. So the blocked process report will sweep through every 20 seconds and look for blocking, and if it happens to catch any, it’ll show up in there. With both of these, though, wherever the reports are collected, you can download them and do whatever you want with them. You can send them to someone. You can put them in another tool that parses them out, whatever you want to do. If you want to check out this awesome free SQL Server monitoring tool, again, go to code.erikdarling.com. It is in the performance monitor repo.

And if you go to the releases section, you’ll find nightly releases and the latest stable build release. So whichever one you’re feeling cool with testing out, you can grab it and start monitoring your SQL Server’s performance for free. So let’s look at these tabs, just so you get a sense of what you’re dealing with here.

So I set this back 30 days, because apparently I haven’t done anything interesting on my server in a little bit. But if you look in the locking trends tab, this will show you lock waits as they occurred. This will give you the count of blocking and deadlocking events.

And these two graphs down here will start to show you their durations. So where these durations get higher and higher, you had bigger and bigger problems. I’ll try to move out of the way of the deadlock one.

So you can see in here, this thing spikes up when we had a bunch of deadlocks. This current waits tab is sort of an interesting one. I’m working on how to get this across the best. Because if you look at this, you have some lock waits down here that are hitting about 45 seconds.

But that’s very low. And then over here, you have some absolutely tremendous lock waits, right? I mean, look at this, right?

That’s 974,000 milliseconds. And then this thing up here, LCK_M_IX for 2 million milliseconds. So some of this data is a little hard to get good perspective on. So I’m working on how to best visualize that.

I need to figure out maybe a little bit better way of doing that. But for now, if you see lines go up very high and you see numbers into the millions here, that’s probably not a good sign, right? The challenge for anyone designing charts and graphs is how to deal with these extreme outliers, like 975,000 milliseconds and 2.6 million milliseconds.

They are challenging to deal with from a visualization perspective. But I will try to figure something out. Even if you look in here, where it’s LCK_M_IS, 219,000 milliseconds, that’s a brave number of milliseconds.

But down in here, I break down blocked sessions by database, right? So, like, the primary ones in here that are – go away, Visual Studio. No one needs you.

So the primary ones in here are HammerDB. That’s in green, right? So the HammerDB TPCC database had a lot of blocking going on at some point. And then the HammerDB TPCH database also had a bunch of blocked sessions in it. So I try to give you a more granular breakdown of which databases have the most blocking in them, so you can make decisions about which ones to troubleshoot.

Over here are the shredded blocked process and deadlock reports. So if we look under blocking here, it’ll be all the normal stuff that you would expect to see if you were to run sp_HumanEventsBlockViewer. You see the blocking chain in here.

It’s a very similar experience to running that tool: the query text, and if you want to download the blocked process report XML, you can do that very easily there. And then under deadlocks, you’ll see something very, very similar to what you would get back from sp_BlitzLock, because guess what?

I run sp_BlitzLock to do the parsing, right? Because it’s a lot easier than having to rewrite code that I wrote the first time. So – oh, don’t run away from me.

Who do you think you are? Silly goose. Anyway, just a quick overview of the blocking and deadlocking stuff in my free SQL Server monitoring tool. If you want to get a hold of this and start monitoring your SQL Servers for free, again, that is all at code.erikdarling.com.

You can go in, start getting information, and start troubleshooting SQL Server. And if you like this project, you appreciate this project, and you would like to sponsor it, you can absolutely do that. Or if you start looking at all this monitoring data and you say to yourself, hey, I’m in way over my head, you can always call me, because I still do consulting, and I can still help you with your SQL Server performance problems.

So thank you for watching. Hope you enjoyed yourselves. Hope you learned something, and I will see you in tomorrow’s video where we will dig a little bit more into some of the foundations and fundamentals in my free SQL Server performance monitoring tool.

Alright, thank you for watching.


Making A Website For Tracking My GitHub Stats

GitHub only keeps 14 days of traffic data. If you don’t capture it, it’s gone. I got tired of losing that data, so I built a self-hosted dashboard that collects stats daily and keeps them forever. I also got tired of running reports every day.

It tracks stars, forks, traffic, clones, and release download counts across multiple repos, and displays everything on a single page with interactive charts. The whole thing is a Python script, a static HTML file, and a cron job. No framework, no database, no build step.

You can see my live version at stats.erikdarling.com, where I track PerformanceMonitor, PerformanceStudio, and DarlingData.

All the code is on GitHub: erikdarlingdata/github-stats-dashboard

What you need


  • A Linux server (any cheap VPS — mine runs on a $5/month Hetzner box alongside other stuff)
  • Python 3 (no pip packages, just the standard library)
  • Nginx
  • A GitHub fine-grained personal access token

Step 1: Create a GitHub token


Go to GitHub → Settings → Developer settings → Fine-grained tokens → Generate new token.

Scope it to only the repos you want to track. Under repository permissions, you need:

  • Metadata: Read-only (selected by default)
  • Administration: Read-only (required for the traffic and clones API)

Save the token somewhere safe. You’ll put it on your server in a moment.

Step 2: Set up the server


SSH into your server and install nginx and certbot:

sudo apt update
sudo apt install -y nginx certbot python3-certbot-nginx

Save your GitHub token:

echo "github_pat_your_token_here" | sudo tee /etc/github-stats-token > /dev/null
sudo chmod 600 /etc/github-stats-token

Create the web root and grab the dashboard files:

sudo mkdir -p /var/www/stats/data

# Clone the repo (or just download the two files you need)
git clone https://github.com/erikdarlingdata/github-stats-dashboard.git /tmp/stats-dashboard
sudo cp /tmp/stats-dashboard/index.html /var/www/stats/
sudo cp /tmp/stats-dashboard/collect.py /opt/github-stats-collect.py
sudo chmod +x /opt/github-stats-collect.py

Step 3: Configure the collector


The collector needs to know which repos to track. You can either edit the DEFAULT_CONFIG dict at the top of collect.py, or create a config file:

sudo tee /etc/github-stats.json > /dev/null <<'EOF'
{
    "data_dir": "/var/www/stats/data",
    "token_file": "/etc/github-stats-token",
    "repos": [
        "your-org/repo-one",
        "your-org/repo-two",
        "your-org/repo-three"
    ]
}
EOF

Replace the repo list with your own. Use the full owner/repo format.
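For illustration, the merge behavior can be sketched like this. The function name and the exact keys in `DEFAULT_CONFIG` are assumptions about the shape of collect.py, not its actual code:

```python
import json
from pathlib import Path

# Hypothetical defaults -- the real DEFAULT_CONFIG lives at the top of
# collect.py and may differ; this only illustrates the merge behavior.
DEFAULT_CONFIG = {
    "data_dir": "/var/www/stats/data",
    "token_file": "/etc/github-stats-token",
    "repos": [],
}

def load_config(path="/etc/github-stats.json"):
    """Start from the defaults, then overlay any keys found in the file."""
    config = dict(DEFAULT_CONFIG)
    p = Path(path)
    if p.exists():
        config.update(json.loads(p.read_text()))
    return config
```

Anything you leave out of the config file keeps its default, so a minimal file with only a `repos` list is enough.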

Step 4: Configure the dashboard


Open /var/www/stats/index.html and find the REPO_DISPLAY object near the top of the script block:

const REPO_DISPLAY = {
    "repo-one":   { short: "R1", color: "#58a6ff" },
    "repo-two":   { short: "R2", color: "#3fb950" },
    "repo-three": { short: "R3", color: "#d2a8ff" },
};

The keys must match the repo names from your config (the part after the /). The short value is what shows up in the tab buttons and chart legends. Pick whatever colors you like.

Step 5: Set up nginx and SSL


Point your domain’s DNS A record to your server’s IP address, then configure nginx:

sudo tee /etc/nginx/sites-available/stats.example.com > /dev/null <<'EOF'
server {
    listen 80;
    server_name stats.example.com;

    root /var/www/stats;
    index index.html;

    location / {
        try_files $uri $uri/ =404;
    }

    location /data/ {
        add_header Cache-Control "no-cache";
    }
}
EOF

sudo ln -s /etc/nginx/sites-available/stats.example.com /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx

Then get a free SSL certificate:

sudo certbot --nginx -d stats.example.com

Step 6: Run the first collection and schedule daily runs


# First run
python3 /opt/github-stats-collect.py --config /etc/github-stats.json

# Schedule daily at 6:15 AM UTC
(crontab -l 2>/dev/null; echo '15 6 * * * /usr/bin/python3 /opt/github-stats-collect.py --config /etc/github-stats.json >> /var/log/github-stats.log 2>&1') | crontab -

Visit your domain. You should see stat cards, charts, and a release download table. The charts will fill in over the coming days as snapshots accumulate. If you want it to run hourly, you can do this instead:

# Schedule hourly at :15 past the hour
(crontab -l 2>/dev/null; echo '15 * * * * /usr/bin/python3 /opt/github-stats-collect.py --config /etc/github-stats.json >> /var/log/github-stats.log 2>&1') | crontab -

How it works


The collector is a single Python file with zero external dependencies. It calls the GitHub API for each repo and writes three JSON files:

  • data/YYYY-MM-DD.json — the full daily snapshot, kept forever
  • data/history.json — a rolling summary with one entry per day, used by the charts
  • data/current.json — the latest snapshot, used by the stat cards and traffic/clone bar charts
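A minimal sketch of how the dated snapshot and current.json might get written. The function name is hypothetical and the real collect.py may structure this differently:

```python
import json
from datetime import date
from pathlib import Path

def write_snapshot(data_dir, snapshot):
    """Persist one day's collected stats the way the dashboard expects:
    a dated file that is kept forever, plus current.json for the stat
    cards. Illustrative only -- not the actual collect.py internals."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    today = date.today().isoformat()          # YYYY-MM-DD
    daily_path = data_dir / f"{today}.json"   # full daily snapshot
    daily_path.write_text(json.dumps(snapshot, indent=2))
    # current.json always mirrors the most recent snapshot
    (data_dir / "current.json").write_text(json.dumps(snapshot, indent=2))
    return daily_path
```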

The dashboard is a single static HTML file. No build tools, no framework. It fetches the JSON files and renders everything client-side with Chart.js. When you click a repo tab, it swaps the stat cards and per-repo charts without reloading the page.

The daily traffic and clone charts show the 14-day rolling window from GitHub’s API. The stars and downloads charts plot from history.json, so they grow over time as you collect more snapshots.

Things worth knowing


Start collecting early. You can only capture traffic data that GitHub still has. If you set this up today, you get today’s 14-day window. Tomorrow you get tomorrow’s. But you can’t backfill what’s already gone.

Download counts are cumulative. GitHub reports total all-time downloads per release asset. The collector stores the current total each day, so the dashboard can plot the growth over time.
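Because the stored totals are cumulative, the per-day growth is just the difference between consecutive entries. A sketch, assuming each history.json entry carries a date and a downloads total (the field names are illustrative):

```python
def daily_download_deltas(history):
    """Given per-day cumulative download totals (oldest first), return
    the per-day increase. Clamped at zero in case an asset is deleted
    and the cumulative total ever goes down."""
    deltas = []
    for prev, curr in zip(history, history[1:]):
        deltas.append({
            "date": curr["date"],
            "new_downloads": max(curr["downloads"] - prev["downloads"], 0),
        })
    return deltas
```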

Safe to re-run. If you run the collector multiple times in a day, it replaces that day’s entry in history.json rather than duplicating it.
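That replace-then-append behavior might look something like this (a hypothetical helper, not the actual collect.py code):

```python
import json
from pathlib import Path

def upsert_history_entry(history_path, entry):
    """Drop any existing entry for the same date, then append -- so
    running the collector twice in one day never duplicates a point."""
    path = Path(history_path)
    history = json.loads(path.read_text()) if path.exists() else []
    history = [e for e in history if e["date"] != entry["date"]]
    history.append(entry)
    history.sort(key=lambda e: e["date"])
    path.write_text(json.dumps(history, indent=2))
    return history
```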

No database. Everything is flat JSON files. Easy to back up (just tar the data directory), easy to inspect, easy to move to a new server.

The traffic API requires admin-level access. With fine-grained tokens, that means the Administration: Read-only permission. Without it, you’ll still get stars, forks, and downloads — just not traffic or clones.
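For reference, the traffic endpoints are `GET /repos/{owner}/{repo}/traffic/views` and `.../traffic/clones`. A stdlib-only sketch of building such a request (the helper name is made up; collect.py may do this differently):

```python
import urllib.request

API_ROOT = "https://api.github.com"

def build_traffic_request(repo, token, metric="views"):
    """Build a request for GitHub's traffic API; metric is "views" or
    "clones". These are the endpoints that need Administration:
    Read-only on fine-grained tokens -- stars, forks, and release
    endpoints do not."""
    return urllib.request.Request(
        f"{API_ROOT}/repos/{repo}/traffic/{metric}",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
```

Pass the built request to `urllib.request.urlopen` to fetch the JSON payload.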

The code


Everything is at erikdarlingdata/github-stats-dashboard. MIT licensed. Four files that matter:

  • collect.py — the collector script
  • index.html — the dashboard
  • config.example.json — example config
  • nginx.example.conf — example nginx config

Fork it, change the repos, deploy it, done.


Free SQL Server Performance Monitoring Memory Graphs

Summary

In this video, I dive into the memory monitoring features of my free SQL Server performance monitoring tool from Darling Data. I cover how memory gets utilized in SQL Server, including an overview of memory grants and the plan cache, highlighting their importance and common issues. I also explain various memory clerks, such as the buffer pool and lock manager, showing how they impact overall server performance. The video includes detailed explanations of the charts and graphs available for monitoring memory usage, helping you to better understand and manage SQL Server’s memory consumption. If you’re interested in checking out the tool or have any questions, head over to code.erikdarling.com, where you can download it for free from my GitHub repository.

Chapters

Full Transcript

Erik Darling here with Darling Data, monitoring tool moguling my way to, well, fame and, well, I think I’ve made like 150 bucks so far, so that’s cool. If anyone out there is feeling generous and wants to sponsor open source projects, well, I’m wide open on that one, so I don’t know, maybe you’re listening, maybe you’re not, I don’t know, it’s hard to tell. Anyway, this video we’re going to talk a little bit about the memory monitoring stuff that’s available in my free SQL Server performance monitoring tool. We’ve got sort of an overview of how memory gets used generally, memory grants that your server has seen, how the plan cache is being used, which, you know, as much as I hate the plan cache, I do like pointing out how crappy the plan cache is.

So, seeing those numbers rise and fall is always kind of like, ah, look at that, eh, that thing stinks, use Query Store. Then there are memory clerks, which are the various bits and bobs inside of SQL Server that use memory. So, you know, you have your buffer pool, your lock manager, the plan cache, different things like that.

So, I like to surface those, and also tell you if there are any memory pressure events going on, which is stuff that comes from ring buffer notifications. There are, of course, some MCP tools built in, so you can talk to all of this data with the LLM of your dreams.

I find Claude the dreamiest, but you might have bad taste and like another one. So all of these things sort of tie in together, and I group them together because a lot of people, I think, overlook how important memory is, and how it gets used in SQL Server is often not exactly straightforward to reason about. One of the biggest things that I see day to day is that servers mostly don’t have enough memory in them.

But I think what people don’t quite understand is how vital having data cached in memory is, and the tug of war that happens between your buffer pool and other memory consumers, particularly query memory grants. So having all this stuff trended and obvious to you makes it a little bit easier than hitting F5 on a script, seeing a snapshot of results, and being like, oh, well, I guess it’s okay.

And this all ties into the wait stats graph as well, because if you’ve got a lot of PAGEIOLATCH up there, that’s going to be a sign that you’re constantly just reading pages from disk. So there’s a lot to get into, and there’s a lot that the graphing in the monitoring tool turns into a very easy and often compelling story about what sort of problems a SQL Server is having. So if you want to check it out, it’s all at code.erikdarling.com.

That will take you to my GitHub repo. You can download the performance monitoring tool stuff there for free. It’s all going to be in the latest releases.

So if you go there, grab a zip file; you can just open that zip file up, run the stuff you need, and start monitoring your SQL Server. Very, very low-friction stuff. But let’s talk a little bit about what’s going on with memory in SQL Server, which is, you know, a pretty smart thing to talk about.

So, we have a few different things that we trend here. We’re looking for any sort of buffer pool pressure. It’s a little hard to see, but there’s a little dotted line there, so if there were pressure, we would see some dots up on the screen.

Thankfully, I don’t have too many memory problems going on here. I have total memory, which is how much SQL Server has available to do stuff with, the buffer pool, the plan cache, and available physical memory.

So, that tells you how much SQL Server has left to give. If you see that start to really drop off, then you might get worried there. Over in this tab, we have memory grants.

So this will tell you, sort of broken down by pool. Every SQL Server has at least an internal and a default pool. Some of them might have more pools if you set up Resource Governor.

But this will tell you how much memory has been used from those various pools. Down here, we just have memory grant activity over time, so this will tell you sort of when memory grants happened.

Up there, that’ll tell you how big they were when they happened. Down here just gives you, you know, whether you had any at all. And there are some very moderate spikes in here.

There’s like one there, and like two there. They weren’t very big memory grants, so the charts and stuff up here aren’t really budging all that hard. That’s okay.

It’s just example data for you to get a feel for the tool with. Memory clerks: by default, I just grab the top five by usage. This might not be the most useful five that you will ever see in your life in this example, but this is just what’s going on on my server at the time.

And if we look over here, we can see that this is the memory clerk for the buffer pool, and it’s going up and down a little bit as stuff gets read in and knocked out of memory: the buffer pool shrunk down there, then it grew back there, went up there, and then, I guess, it dropped off a bit there.

The little legend down here will line up with whatever is selected up there. If there’s other stuff that you want to look at, you can, of course, click boxes and get stuff and be happy.

And everyone, you know, knows more about SQL Server. Then there’s the plan cache. Basically, what I’m trending here is just single- and multi-use plans. Again, my server isn’t crazy busy.

So, just having this stuff in there, and this is in megabytes, you can kind of see over time, like, oh, what happened with my plan cache?

Why are things terrible? Why did everything get cleared out here? And now I needed to compile everything again, and that was awful too. So, just some useful things to help you track down SQL Server issues. Lastly, over here, I have memory pressure events.

Right now, I don’t have anything. I don’t have any memory pressure on my server, because I have a very well-taken-care-of SQL Server. I am so good at SQL Server, running in a VM on my laptop, that I don’t have any memory pressure at the moment.

But if I did, you would see some cool stuff in here. Anyway, that was just a quick overview of the memory graphs and what gets collected in the performance monitoring tools that I offer.

Again, these are all free. They’re over on GitHub; you can go download them and use them. And if you have any problems, issues, questions, or any other feedback, just file a GitHub issue for it and I will get it taken care of just as soon as humanly possible.

All right. Thank you for watching.

 

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

bit OBSCENE Episode 12: Interesting Inserts

bit OBSCENE Episode 12: Interesting Inserts


Summary

In this video, Joe Obbish of Someplace, Somewhere joins me for a discussion about an intriguing question that popped up on Stack Exchange. The question revolves around generating per-type sequence numbers in SQL Server, and it highlights the challenges of handling high concurrency while maintaining performance. We explore various approaches to solving this problem, including sequence objects, a feature introduced in SQL Server 2012. While we discuss the pros and cons of these methods, we also touch on the broader context of research and how to approach similar problems when faced with limited information or outdated systems. This conversation not only provides valuable insights for database administrators but also offers a glimpse into the thought processes behind crafting effective questions and answers in the tech community.

Chapters

Full Transcript

Erik Darling here with Darling Data, joined by Joe Obbish of Someplace, Somewhere. Am I still in line, Joe? Shouldn’t I be? Oh, yeah. No, no, no. I’ll change it. God. You know, you interrupt me at the worst times. Let me insert Joe now.
All right. Before I was so rudely interrupted. Welcome back to the Bit Obscene radio program brought to you by Pistachio Poppy toothpaste. It is very, very pistachio flavored and has a very, very pleasant green color on the toothpaste, on the toothbrush, rather.
This is not my actual toothbrush. This is a demo toothbrush that I use for cleaning shoes and other things that I don’t want to put in my mouth. Anyway, thank you, Pistachio Poppy. I don’t think that was a very good endorsement, but I’m sure they’ll forgive us, right?
Pistachio Poppy. It tastes a lot like pistachios. All right. Anyway, Joe had an interesting question pop up on his Stack Exchange. He got it. He got it sent directly to his email from Stack Exchange.
That’s true. And he answered it. And Joe would like to talk about how interesting this insert question is. Well, it’s not only about my answer.
I felt like the question itself and some of the answers provided a good learning opportunity. Really? And considering, like, nowadays, all this stuff is just used by AI agents that are trying to destroy the world.
But at least, like, you know, get some positive value out of our questions and answers while we can. Sure. And everybody’s looking for easy content.
That’s actually the hardest part of this, not recording them, but figuring out what to talk about. Which is weird because we normally have no idea what to figure out talking about. I think I’m viewing my browser.
You sure are. Yeah. See, this is where you kill me. So this whole, like, I was like, when you said you wanted to talk about this, I was like, and I saw the open browser tab in the last video. I was like.
That was a preview. It was. And I was like, ah, Joe’s going to go look at it. And while he’s looking at it, I’m going to award the bounty and his score is going to go up 500 bucks. But you ruined it. And now it’s not going to happen.
That’s okay. Because if you scroll up, you can see my generosity. I mean, I’m not a stressed out person because I’m using Microsoft Azure products. Oh, okay.
I’m just all like, I’m just calm. I’m in a Zen state. Yep. I’m not worried about it. This really does smell like pistachios. It’s crazy. It’s like pistachio frosting.
I hope one day in the future, we have podcasts that, that present smells too. Me too. Only. Me too.
Everyone should smell this is a delightful aroma. So I assume that some of our audience are like running marathons or shoveling snow or whatever.
They’re not actually reading the question. So I feel that, well, first let me follow good, good practices and make it bigger. Right? Yeah. That’s a good idea.
Because everyone listening to a radio show thinks, ah, I can’t wait to read this screen. That’s right. So really, I think it’s a somewhat common problem. The question asker effectively needs a sequence, but it’s partitioned by another column in the table.
So his two types as an example are invoice and the receipt type. And the sequence is one, two, three for invoice and one, two for receipt. So, you know, think of it as like many sequence objects in the same table with one per type.
And he says that procedures execute very frequently over a hundred thousand times an hour. And he’s, he’s very, very focused on it being concurrency safe and it performing well, which, you know, fair enough.
He also presented his attempt at solving the problem, which does not handle concurrency at all, which he confesses. And then once again, he talks about how he has really heavy concurrency and needs it to scale well, and so on and so forth.
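The asker’s actual schema isn’t quoted here, but the shape of the problem and the non-concurrency-safe starting point look roughly like this sketch (all names are my invention):

```sql
/* Hypothetical version of the setup: serial numbers that restart
   per document type, e.g. invoice 1, 2, 3 and receipt 1, 2. */
CREATE TABLE dbo.Documents
(
    document_id   bigint IDENTITY PRIMARY KEY,
    document_type varchar(20) NOT NULL,
    serial_number bigint NOT NULL
);

DECLARE @document_type varchar(20) = 'invoice';

/* The naive pattern: read the current max and add one. Two
   concurrent sessions can read the same max and insert duplicate
   serial numbers, which is exactly the stated problem. */
INSERT dbo.Documents (document_type, serial_number)
SELECT @document_type,
       COALESCE(MAX(serial_number), 0) + 1
FROM dbo.Documents
WHERE document_type = @document_type;
```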
Now there are a few things that stand out to me here. Um, well, first of all, in terms of like, is this a good question? I mean, considering the typical question is how is Babby formed?
This is like an excellent question, right? It’s not a perfect question, but it’s an excellent question. Um, he has a table definitions and example data, the, uh, code he tried.
He explains the problem. He tries to lay out his requirements. You know, he’s very concerned about high throughput and high volume and, and so on.
Now, in terms of making this question better, it would be good to define if gaps are okay. I also think that, just speaking for myself, I usually Google a problem or search for it on the website before asking a question.
Some people don’t, and for the most part, questions have to come from somewhere. But it would be good to know why he’s not using sequence objects. And I can guess the reason: maybe he has a very large number of distinct document types.
Like he has a hundred different document types and he doesn’t want to create a hundred sequences, or maybe the list of document types is dynamic or unknown, and he doesn’t want to dynamically create sequence objects.
Now the great thing about stack exchange as opposed to real problems is you can like pretend what the question is. Like you can make the question be whatever you want. And sometimes I will find some ambiguity in a question and tilt it towards being more interesting.
And I’ll answer what I think is interesting, which is probably helpful to someone, but not necessarily the asker himself. One thing I would like Erik’s opinion on is. Oh boy.
He frequently talks about his scalability, and in one of the comments. Yeah. He says that he has a hundred thousand logged in users at the same time. Now I am not as promiscuous as you when it comes to SQL Servers.
Have you ever experienced such a workload? Okay. So not on a single SQL Server.
I’d also, maybe, I don’t know, there’s some terminology in there that leads me to believe that there is perhaps some, I don’t know. Hmm. It feels misleading to me.
So a hundred thousand logged in users, right? Um, okay. So maybe there are a hundred thousand users logged into a website, like me, probably not all actively hitting the database server. If you have that many users, you probably have enough developers to set up a reasonably good caching layer.
You probably have other databases that have, that do other things in there. And SQL Server just handles the transactional end of it. When there is a transaction, you know, like Amazon has completely different stuff for like product search than it does for like shopping cart and checkout, right?
It’s like different systems. So they might have a hundred thousand users logged into whatever website, hitting other database services, but a hundred thousand active users on a single SQL Server? Do you have any idea what THREADPOOL waits are? You’re out of your mind.
No, that is bonkers to me. Yeah. If you just take it literally, I don’t know how big of a server you’d need to even have a hundred thousand active connections, even if they aren’t doing anything.
All right. It feels like it would have to be pretty big. Um, I would personally be very interested in seeing that server.
Modern hardware is a lot better than I thought in terms of getting very wide sockets. You can have, what was it, I think a 256 physical core, two socket server nowadays. But divide that by a hundred thousand users and everyone gets a very, very small portion of one CPU.
The other thing that worries me about it is, if you’re actually living in this business environment, wouldn’t you be super good at handling concurrency? Right. Right.
And you wouldn’t be asking questions on Stack Exchange. Okay. So let me throw this one at you. If you look at that code, does that look like the code of someone who runs a SQL Server with a hundred thousand logged in users?
Can you imagine? Like if, if that’s an example of like the starting code that they’re like, this is how it’s going to work. Um, man, I, I weep.
I weep. Yeah. I could be making a lot of money off these people. I should be, by all rights, making a lot of money off whatever this situation is. But you’ve got to track this guy down.
Yeah. Yeah. What, what is there? What’s in his profile? Like, is there an email address? Cause life is a journey.
Yeah, sure is. Yeah. Um, yeah, I mean, it, it does feel suspicious, but like I said before, you know, if we can, make questions more interesting for ourselves, it’s, it’s a bit more fun.
Right. So I don’t have a problem with that. Um, I don’t believe it. I don’t think it’s relevant. I, I will, I will give them credit for actually specifying a target, even if it’s wrong or misleading.
Because, you know, as you’re well aware, it’s usually: it needs to be fast, it needs to perform well, it needs to scale well, we’re running it super frequently, right? That’s the level of detail you often get, as opposed to an actual number. So I will give props for that. And maybe we should be more considerate to this poor suffering developer who has a hundred thousand active users at all times.
Cause it sounds like his professional life is pretty tough. I’m, I’m ready to get to the answers. If, if you are.
Oh yeah. Let’s do it. Let’s see where I want to go to first. So first one, use a sequence object or type. It’s a, I mean, it’s, it’s, it’s an answer.
I think it works well enough if there’s a relatively small, set list of types, which could be true, is probably true. It would have been good to talk about the cache for sequences.
Do you remember, do sequences cache by default? Uh, it’s like a thousand, I think. Okay.
Well, a thousand or 10,000 or something. If there is a 10,000 default cache, then. No, you know what? It’s not. There’s no way it’s 10,000. Let me see. From what I remember.
I think it’s a thousand. I think it doesn’t cache by default, but I’m willing to be proved wrong in real time. Okay. So: if the cache option is enabled without specifying a cache size, the database engine selects a size.
However, users shouldn’t rely upon the selection being consistent. Microsoft might change the method of calculating the cache size without notice. Oh yes.
But it defaults to cache. So, okay. So it defaults. Yeah. So it says it defaults to caching being on. But then if you don’t say what cache size you want, Microsoft’s like, roll the dice. Yeah.
That’s the way it is; live dangerously. You know, with this hundred thousand active users, he should probably specify a cache size, because I do doubt that Microsoft is picking their cache size with a hundred thousand active users in mind.
Right. Yeah, absolutely. I mean, think of the licensing. Yeah. But you know, overall, if the guy doesn’t know about sequences, this could be a very valuable answer for him, because he does specify in some of the comments that he’s okay with gaps.
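For reference, a sketch of the sequence object route with an explicit cache size, per the documentation behavior quoted above (the sequence name is illustrative, one per document type):

```sql
/* One sequence per document type, with an explicit cache size
   instead of the unspecified engine-chosen default. */
CREATE SEQUENCE dbo.InvoiceSerial
    AS bigint
    START WITH 1
    INCREMENT BY 1
    CACHE 1000; /* explicit, so the size can't silently change */

/* Grab the next value at insert time. */
SELECT next_serial = NEXT VALUE FOR dbo.InvoiceSerial;
```

Note the tradeoff being discussed: cached values not yet handed out are lost on restart, which is one source of the gaps the asker says he can live with.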
So maybe the solution is good. Um, I think I would like put this in a variable or a case statement or something. So we’re not duplicating all the code, but I’m just kind of quibbling at this point.
Anything to say about the very boring but safe answer here? Uh, no, I mean, it’s fine. Not that I’m proud of it, but I’ve done similar things to give people a unique thing per thing in a table.
You know, it’s a lot easier than before sequences, when a lot of this stuff had to be done either with a cursor or with a window function, in ways that were very unfun. When sequence objects came out, that was the one use I ever found for them, at least in client work. Yeah. So maybe that was the solution.
Of course. Yeah. What’s annoying. So like in your, in your answer. Well, which is it, which is a great answer, by the way. Well, thank you.
Um, you, you linked to the Paul white post on sequence tables. And every time I’ve seen a client using sequence tables, it’s always been really messed up and like a lot of locking and blocking around them. And I would show, I would show them two things.
Paul white’s article. And the fact that sequence objects exist and then never do anything about it ever. It was like, no, we can’t touch that part of the code. It’s too scary.
It’s too old. Every, like if that breaks, like we’ll go out of business. Like this is all screwed. Like, no. I don’t know. Like, I guess it’s on the one hand, it’s nice that this person is asking before they screw everything up. On the other hand, I looking at their code.
I think it’s going to all be screwed up anyway. This, this does go back to what I wanted. One of the things I wanted to talk about, which is like how to research and answer your question.
So no shade at this guy. Maybe he’s didn’t know about sequence objects and had no way of knowing. Maybe his work VPN doesn’t let him Google things.
I don’t know. But if you’re aware of sequences and they don’t work for your use case, in my mind, a logical thing is like, okay, well, this was a feature added in SQL Server 2012. What did people do before the sequence was added?
Because the problem didn’t, you know, come into existence in 2012. Right? Like it was.
No. All the code that I see from this is from like the early 2000s. Yeah. Then I thought when I was trying to answer the question, I was trying to make them more interesting. Well, like maybe he has a thousand different types or an unknown number of types or he, you know, he has a good reason that you sequence.
Well, what did the old, you know, wizards, you know, back in the day, like, how, like, how did they solve this? Because I’m sure there are many bad solutions that are posted and with a couple of good solutions. And yeah, you can’t really go wrong with, with Paul White.
And there is a lot more in here that’s needed for the question. But sure. I used it as a starting point because I was too lazy to do anything else.
Honestly, like, you know, there are a select group of people who, every time I need to do something, I am like, I always Google their name and that thing. Because I know that it will, like, if I find anything that they did with it, it will give me a much better starting point than all of the stuff that I would have had to, like, break my brain on along the way. So 100% no problem with that, like, yeah, no, I’d rather, I’d rather start from like, the 10th floor than start from the ground floor and screw up for 10 floors.
Or you could be starting from the basement, depending on what you’re looking at. Yeah, you know, I don’t know that. I think it’s an insult to basements, Joe.
I don’t know, I think. Anyway, we have an answer that presents itself as the correct way to guarantee. With guarantee in bold, which sounds like a very strong guarantee, right?
Oh, I mean, it’s guaranteed to deadlock like a son of a bitch. Yeah, it feels a bit too narrow, like answering the literal question. And the guy was very concerned about scalability.
That’s what he said like five times. And, you know, serializable isn’t very scalable. I would say it’s anti-scalable.
Especially serializable with an UPDLOCK on that. That’s gonna sting. Well, and presumably this table is being used for other things too, right? Yeah.
So you’re just, I mean. Documents sounds like a well-trafficked table to me. Yeah. I’m not much of a locking guy, but this means that no other session can even select from the table. Right?
I mean, depending on, you know, indexing and, you know, where other queries are hitting in the index, there could be a significant amount of locking there. But, you know, when you look at the query itself, right? Like, you know, you have, like, your where clause is on document type and you’re getting a max serial number every time.
If you don’t have very good indexes in place to support that, that’s not gonna be fun for you at all. Yeah. And to be fair, the index is mentioned here.
No, I know. So, like, I’m just saying. Just to clarify for the audience and not totally me. The serializable hint for this query is applied.
It’s not applied to a table level lock, but it’s applied to the locks that are otherwise taken. Right. Is that correct?
Correct. So serializable would be maybe one row or, like, not many rows? But, like, I think you have a hard time guaranteeing that.
Also, like, with that index in place, you would probably be okay. Assuming that document type is unique, which I don’t know. Because, like, if someone’s, like, storing a table of documents and they store a document type, then it’s usually not unique.
And, but, like, the combination of document type and serial number might be. So, like, maybe, but, like, I still think you’d have a real tough time, like, figuring out if you were getting, like, row level locks or page level locks. I don’t think this would be a situation where you’d escalate to a table lock, but, like, like, either, like, row or page, you would end up, you could end up jamming things up pretty good.
Especially if there’s, like, just a whole buttload of these running at once. Right? Yeah.
Or if you had, like, just this random reporting queries on the table that are looking at who knows. Yeah. Scanning. Scanning, right? Yeah. But you can always put no lockings on those.
I’m sure that the 100,000 active user database is very well tuned. So you’re not going to, you know, you’re not going to have those. One can only assume.
Yeah, I mean, like, maybe it works in practice. I would be very wary of this. You know, I would say this is, again, a non-fun answer to the question, which, to be fair, can be often useful. For people.
Yeah. But, you know. Whenever I go to Stack Exchange, I’m generally looking at. Yeah. But I think what’s interesting here is that a serializable UPDLOCK does give you a boatload of pretty reasonable guarantees under the default read committed locking isolation level. If you’re using a different isolation level, like a row versioning isolation level, RCSI or snapshot, you kind of lose out on a little bit of that with other select queries.
Like, maybe not like other ones of these because of the hints that are there, but other queries might, you know, start seeing some weird stuff. And what was the other thing I was going to say? It was a very useful point.
Paul White’s post with the sequence tables. He doesn’t even use serializable or UPDLOCK. He only goes to READCOMMITTEDLOCK, just in case you’re using RCSI, which I think is a very, very interesting thing. Because, you know, I assume whatever Paul says is right.
So what am I going to argue with? Yeah. And it makes sense to me, with my very narrow understanding of locking, the fact that you could use read committed for Paul’s code while you need serializable here.
Yep. On to the next answer, from Steve, who says, design-wise, that is quite clearly idiocy. How does Steve have no upvotes?
Hang on. Hang on. Hold up. Steve. Maybe his upvotes were canceled by downvotes. You know? Who knows? Steve, I’m sorry I missed you.
Hang on. We’re going to watch Steve get upvoted in real time. Bam. Yes. This is, you know, this is part of why it’s important to be as clear as you can about, oh, can I have gaps?
Like, where are my exact rules? And so on. Because we have to, you know, write conditional answers based on this.
I think the basic point here is: if it’s going to be without gaps, it’s going to require a resource which is being locked in some way, and that limits scalability, which is true. Yeah. But there isn’t any way to really get from that to this, because computers are faster than some people think.
Yeah. And if you have the right design, you can quite easily pump out way more than a hundred thousand of these an hour. Yeah.
No, that’s not a problem. Even if you were using sp_getapplock to manage concurrency for this, I can’t imagine. That’s not hard. Yeah. Yeah.
sp_getapplock does have a lot of overhead. It does, compared to SQL Server’s own locks. Yeah. But I think that you could get to that level. I agree with you. Yeah.
I wouldn’t use sp_getapplock. Right. Just saying. But like I’m saying, even if you were like, I’ve got to use sp_getapplock for something, here goes nothing: I think you could do it.
Like, you’d have to be pretty bad at stuff. And as before, I think it’s good to keep your options in mind, in terms of: instead of storing something, maybe calculate it instead.
The main issue I had with this answer: he’s using ROW_NUMBER to calculate the serial number, for the people who are shoveling snow or doing whatever else while listening. If a row gets deleted from the table, now your serial numbers are changing, which doesn’t feel like a good property to have for a serial number.
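Sketched against a hypothetical Documents table (names invented, not the answer’s actual code), the ROW_NUMBER idea is roughly:

```sql
/* Compute serials on the fly, per type. The objection raised here:
   delete a row and every later serial number for that type shifts
   down by one, because nothing is persisted. */
SELECT
    d.document_id,
    d.document_type,
    serial_number = ROW_NUMBER() OVER
                    (
                        PARTITION BY d.document_type
                        ORDER BY d.document_id
                    )
FROM dbo.Documents AS d;
```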
Yeah. So I do feel like it’s kind of dead in the water for this use case. I mean, I think another thing that’s missing is that, okay.
So maybe not here, but in a lot of cases where I have used ROW_NUMBER for this sort of generation, you always have to go find the current max anyway, and then add that to whatever row number you’re generating. That might not be necessary here based on some of this stuff, but a lot of the time you have to go find the current high value anyway, and then add that to whatever row number you’re starting at, so that it gets the next bunch.
Um, like I realized this is a little bit different cause I think it’s a unique group for a new document ID going in there or something, or like, like something along those lines. But like a lot of times I have to go, a lot of times when I’ve done this, it’s been like, well, like we like lift off here. It’s like, you’re almost like remaking an identity, but like kind of faster.
There might be a use for this. If you wanted to be really slick in that, say you don’t calculate it on insert, have a background job, which updates it. Yeah. And then, and then you use this to calculate it for the rows that haven’t been processed yet by the background job.
So it’d be a very small number of rows, and you’d remove the scalability bottleneck, such as it is. And since the life cycle of going from null to assigned is pretty short, you’d probably get away with the “well, what happens if I delete a row” problem.
If you’re not doing something like that, then this also just gets slow over time, because you’re going to have more and more rows. You know, he was saying a hundred thousand executions an hour; that’s almost a billion a year if it’s 24/7. It’s questionable if that’s true, but as all of us know, this is the type of query where you’re basically forced to scan
All of the relevant rows in order, if you have an index, great, but that’s still going to be a, probably just a serial scan of. Yeah. That index in order.
And as you’re getting into billions and billions of rows, it’s not going to be particularly quick. Or maybe the more important thing to say is that it’s going to get slower and slower over time. Right. Performance will degrade.
Yeah. It’s something that I like for the solutions that I’m designing. Yeah. But then, if you do something like that, the place most people are going to go for how to keep that up to date as quickly as possible is most likely a trigger. And then you’re dealing with someone who maybe doesn’t have a good grasp of concurrency, edge cases, and other fun things, and might write some rather unsafe trigger code and screw some stuff up along the way.
But, you know, perhaps, perhaps everyone is, you know, much better educated and making much more well-informed decisions than I’m used to seeing out in the world. Everybody knows you only ever insert a single row at a time. That’s true.
It makes the trigger code simple. So let me ask you a question. If people only ever insert one row at a time, why is it always in a loop? Why does the trigger always have to have a loop in it?
It’s insane. It’s people. Anyway, go on. Time to get to the answer. The answer.
The answer that I scrolled past, which, which is my answer. This is my answer. Um, and like, like I said, I was trying to have fun here. Uh, this is a quote from, I forget the guy’s name.
The quote was more memorable than the name. That’s always a good thing. Uh, Slava. Slava Oks?
Yes. Oh, right. That was in your columnstore presentation. Yeah. Yeah. That’s right. Um, it’s, uh, still serving me. Yeah.
The main point here being: if you say, I need sequences, I can never have gaps, and I must have a truly strict ascending order, that’s harder to do, and that limits scalability, because you have to hold locks for a longer period of time, right?
Like for that type of no gap approach, you can’t use sequences. Yeah. Yeah. So you’re going to need some kind of table based, uh, sequence table and the transaction has to include both getting the sequence number and the insert.
Yeah. Which is less scalable than being able to break those transactions up. So the thing I’m advocating for is: you probably don’t actually need that. The way I rephrased the requirement is: unique integers that start small and generally trend up, with the occasional gap being acceptable.
And the reason that I did that switch is I want transactions that are as short as possible, and I want my sequence table to be as scalable as possible. The reason I went to a sequence table is there was already an answer for sequences.
Sequences are, are, are boring, but maybe has 10,000 document types. Who wants to create 10,000 sequences? That’s not a good time.
Not me. Yeah. Not you. Yeah. Do you remember the time, what were you doing? You were, you were trying to create like a whole bunch of things all at once. And it just like dragged the server down.
You were trying to create like a thousand tables or something. And like, every time, every time you added, every time you created a new table, it was like 10 seconds slower after some point.
Was it tape? Was it schemas or tables? It was some, it was one of those things. I remember, I only remember it a little bit, but it was, it was, it was one of the most hysterical things. I think the thing that I’m thinking of is, which might be what you’re thinking of.
I had a very busy server and it would take like a minute to create a table, just because of the latches that were needed. And there was just so much signal wait.
No, I remember that. Yeah. That was a, that was a different one. That was a different one. Yeah. I don’t know if I remember that one. Okay.
I get a little bit more coming back to me. You were building a dynamic string that would have created like 10,000 either schemas or tables, but like the bigger the string got, the longer the whole thing took or something like that.
And at some point, just building the string past a certain point started taking a long time. Does any of that ring a bell? I have no idea why I would be doing that or. Maybe it was like partitions or something like that.
I don’t know. Anyway. It could have been partitions. I think it was, maybe it was partitioning related. I might’ve been experimenting with like having 10,000 partitions actually on a table just to, just to see what would happen.
Because I do remember there being a design discussion around it, because I think they increased the limit on max partitions at some point, making it 15,000. Yeah.
But that was a long time ago. Yeah. Right. But like I was new to partitioning. I was trying to find guides in terms of like how many partitions is too many. I think it went from a thousand to 15,000.
If I’m. Something like maybe like a thousand 20. And it was, it was surprisingly hard to find like any real data or like anything written about it.
So that might’ve been it, but I honestly don’t remember. It’s in the transcript somewhere, I’m sure.
Going back to this. Clearly you’re going to want the biggest inserts you can get: a very scalable, highly concurrent workload with many rows and many executions. So the thing that I did was use my relaxed problem statement.
I don’t think there’s anything too fancy here: a stored procedure to give you the serial number based on document type. Update the sequence number table and increment by one.
If the update doesn’t do anything, then that document type has never been seen before. Right. So let me, let me ask you a question.
How come you didn’t use, like, ROWCOUNT_BIG() or something instead of the serial number? You could. I don’t really have a reason for it. I just didn’t know if you tried it and it sucked or something.
Oh no, no. Just like. Serial number. Summer was right there. And I thought. Why not? Use it, abuse it, overload it. Yeah.
Like, to your point, the more common pattern for this type of thing would be to check the row count. But since we had this convenient variable right here, I just used it. Then I’m inserting into the table using UPDLOCK, which I think you need to actually make this work, to prevent duplicates.
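Piecing the description together, the approach looks roughly like this sketch; the names, and the exact hint placement, are my assumptions rather than the answer’s literal code:

```sql
/* A per-type sequence table, incremented and captured in a single
   UPDATE. If the UPDATE touches nothing, the type is new, so seed
   it (hinted, to keep two sessions from both inserting the seed
   row) and increment again. Short statements, no long transaction. */
CREATE TABLE dbo.SequenceNumbers
(
    document_type varchar(20) PRIMARY KEY,
    serial_number bigint NOT NULL
);
GO

CREATE PROCEDURE dbo.GetNextSerial
    @document_type varchar(20),
    @serial bigint OUTPUT
AS
BEGIN
    SET NOCOUNT ON;

    /* Increment and read the next number in one statement. */
    UPDATE dbo.SequenceNumbers
    SET @serial = serial_number += 1
    WHERE document_type = @document_type;

    IF @@ROWCOUNT = 0
    BEGIN
        /* First time seeing this type: seed the row safely. */
        INSERT dbo.SequenceNumbers (document_type, serial_number)
        SELECT @document_type, 0
        WHERE NOT EXISTS
        (
            SELECT 1
            FROM dbo.SequenceNumbers WITH (UPDLOCK, SERIALIZABLE)
            WHERE document_type = @document_type
        );

        UPDATE dbo.SequenceNumbers
        SET @serial = serial_number += 1
        WHERE document_type = @document_type;
    END;
END;
```

The `SET @serial = serial_number += 1` form is the compound-assignment trick that both bumps the column and captures the new value without a separate SELECT.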
Yeah. Uh, yeah. So, uh, I think, I think that, I think you might actually need, you might need serializable for that, but.
Oh, well. I can leave a comment. And in this case, I’d be fine with using serializable, just because the insert for a new type runs so infrequently.
Right. You would hope so. Yeah. I mean, well, if that isn't true, then you basically have a bunch of types that have, like, one row.
Well, if it isn't true, then you just reverse the order. Yeah. Sure. That is also a fair point. But wait, no, I'm totally right.
Cause you have a hundred thousand executions an hour. You're not going to have, like, a billion document types a year. Right. Hmm. I don't know what kinds of documents they get. Hmm.
You never know, I suppose. Never know. Yeah. Is it advisable? Maybe it isn't. Who knows? Only one way to find out: run it. Once again, I am not, well, I ran it and it worked fine on my machine.
So there you go. Uh, I think normally you have a transaction here. I don’t have one because I don’t actually care which session wins.
I think that, and this is a bit of a technical point, but in terms of defining what the right order is, it's kind of a meaningless statement. Mm hmm.
You could have session A call this procedure one microsecond before session B, but then session A immediately gets off the scheduler, and then session B executes this code first.
And that session wins. It ends up not really being a meaningful thing. I feel like there isn't really a well-defined order here that's meaningful.
So that's why I don't have any transactions here: it just does not practically matter in any business sense. Yeah. So then, after this runs, we know that some session inserted, hopefully, one row, and there's only one row.
And now we have our one row. We can just blindly update it. Yay. That's how we're getting our serial number per document type. And on my low-end machine, I have eight physical cores.
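The pattern described here can be sketched roughly like this. This is my reconstruction, not the actual code from the episode; the table, column, and procedure names are all made up:

```sql
/* Hypothetical reconstruction of the serial number pattern described above. */
CREATE TABLE dbo.SequenceNumbers
(
    DocumentType integer NOT NULL PRIMARY KEY,
    SequenceNumber bigint NOT NULL
);
GO

CREATE OR ALTER PROCEDURE dbo.GetSerialNumber
    @DocumentType integer,
    @SerialNumber bigint OUTPUT
AS
BEGIN
    SET NOCOUNT ON;

    /* Hot path: increment and capture the new value in one statement. */
    UPDATE dbo.SequenceNumbers
        SET @SerialNumber = SequenceNumber = SequenceNumber + 1
    WHERE DocumentType = @DocumentType;

    /* If the update touched nothing, this document type is new. */
    IF @@ROWCOUNT = 0
    BEGIN
        /* As discussed above, UPDLOCK alone may not be enough here;
           SERIALIZABLE closes the window, and the primary key backstops
           any race with a duplicate key error rather than a duplicate row. */
        INSERT dbo.SequenceNumbers WITH (UPDLOCK, SERIALIZABLE)
            (DocumentType, SequenceNumber)
        VALUES (@DocumentType, 1);

        SET @SerialNumber = 1;
    END;
END;
```

The variable-plus-column assignment in the UPDATE is what lets one statement both increment the counter and hand the new value back, which is the "serial number was right there" trick.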
I ran 16 threads and I was getting about 40 million executions an hour. How many hours did you let it run for? 0.016 hours.
Ah, okay. So that's scaled up. Okay. So we don't actually know, in the long term, that this would continue to be scalable.
I think we do, but hey, if you want to run it overnight, feel free. You have Azure. You're the one who can just run things.
Do I have Azure? Yeah. You said you did. You said that was making it rain. That's true. Yeah. You can't take that back now, Joe. It's true.
All right. It's not like we can edit the video or anything. We must do our closing statements here, because I have to pick a kid up in about 10 minutes. So I do have to finish this. All right.
So, closing statements. Erik rudely challenged me to make this faster than a sequence, which I think is very unfair. ("I made you a better person.") Sequences are custom-designed to do one simple thing and are very optimized for it.
Because, you know, tables have all kinds of baggage: locks and latches and transaction logging and so on. Now, I gave it a try.
I failed. I did get really close, though. And the way I gave it a try was, given our very relaxed definition for how we're assigning the IDs, we can actually hash the table.
So take the session ID, modulo 24. Now, instead of one row being locked by everyone for the document type, we have up to 24 rows, and it's way more concurrent.
In fact, it's so concurrent that if I changed the workload and only ran my new procedure, the version that has the hash key added to the primary key, there effectively is no waiting on latches or locks.
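A rough sketch of the hashed variant, again with made-up names; the 24-bucket count comes from the episode. One way to keep the numbers unique across buckets, assuming that matters to you, is to seed each bucket differently and stride by the bucket count:

```sql
/* Hypothetical sketch of the hash-partitioned version. */
CREATE TABLE dbo.SequenceNumbersHashed
(
    DocumentType integer NOT NULL,
    HashBucket tinyint NOT NULL,
    SequenceNumber bigint NOT NULL,
    PRIMARY KEY (DocumentType, HashBucket)
);
GO

/* Sessions hash to one of 24 rows per document type, so concurrent
   callers mostly lock different rows instead of fighting over one. */
DECLARE
    @DocumentType integer = 1,
    @Bucket tinyint = @@SPID % 24,
    @SerialNumber bigint;

UPDATE dbo.SequenceNumbersHashed
    /* Striding by the bucket count keeps each bucket's series disjoint,
       assuming bucket b was seeded with the value b + 1. */
    SET @SerialNumber = SequenceNumber = SequenceNumber + 24
WHERE DocumentType = @DocumentType
AND   HashBucket = @Bucket;
```

The trade-off is that the numbers are no longer a dense, ordered sequence, which is exactly the relaxation of the problem statement being discussed.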
It is solely WRITELOG waits, which I can't figure out how to get around. I tried some In-Memory OLTP and it ended up bombing out due to write conflicts. I feel like In-Memory OLTP isn't really great here.
Cause you kind of need some lock, or that latch. Yeah. It's not like you want snapshot isolation.
It doesn't make any sense for this. No. Right. And the table needs to be written eventually anyway, so it can't be schema-only, or whatever it's called. It can't be SCHEMA_ONLY durability.
So maybe there's some way to make In-Memory OLTP work. One thing I noticed in my very brief attempt was that if you want it natively compiled, you can't do SERIALIZABLE hints or TABLOCK hints; it's just not compatible.
Yeah. So it probably couldn't be natively compiled, and you'd have to figure out the write conflicts. And I don't know, I couldn't get that to work.
Maybe someone could, but with my row-based attempts, I did get within, what is that, like 93% of the... buddy, you want me to figure that out?
...of the rate for sequences. So, 166 million per hour on my eight-physical-core machine. So I do think that we have achieved the scalability needed for the question and answer.
Well done, Joe. Well done. And on that note, uh, I’m going to go get a child from the outside world.
Thank you all for listening and/or watching the bit OBSCENE radio program. If you're sitting at home or in the car, you can safely give Joe a round of applause for nearly beating sequence objects at their own game.
Well done, Joe. Do you have any parting words for us? Maybe, on someone else's hardware,
my code is actually faster. That's what I'm coping with, you know. Cool. Maybe it's faster on your hardware. Maybe.
Well, we will never know, because you still haven't put in the table definition that you used. Uh huh. So as soon as we have that information, the Darling Data foundry will put that code to the test. All right.
Thank you for watching. Thank you, Joe. And we will see you all next time, as long as our pistachio poppy sponsor continues sending us toothpaste. I don't know what I'm going to do with 10,000 tubes of pistachio toothpaste, Joe, but I'll figure something out. All right.
Goodbye.

 

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

bit OBSCENE Episode 11: Scalar UDF Inlining

bit OBSCENE Episode 11: Scalar UDF Inlining


Summary

In this video, Joe Obbish and I delve into the topic of scalar UDF inlining, discussing its relevance and impact on SQL Server. After some contractual disputes that temporarily sidelined our Bit Obscene podcast, we revisited the subject with a fresh perspective. I share my initial impressions from five years ago, noting how the landscape has evolved significantly since then. The video explores the challenges and limitations of scalar UDF inlining, emphasizing its fragility and the complex rules developers must adhere to for successful inlining. We also touch on the broader implications for existing codebases versus new development, arguing that while it might offer a free performance win for legacy systems, it’s not as compelling for modern database development practices.

Chapters

  • *00:00:00* – Introduction
  • *00:01:33* – Sponsor Announcement
  • *00:02:46* – Scalar UDF and Compilation Issues
  • *00:07:12* – Reporting Privately vs Publicly
  • *00:10:54* – Query Optimization Rules
  • *00:18:04* – CTE and Inlineable Functions
  • *00:23:05* – Table Valued Functions
  • *00:26:14* – UDF Inlining Constraints
  • *00:29:32* – General Take on Scalar UDF and Inlining

Full Transcript

Hello, and welcome to another episode of the bit OBSCENE podcast. There were some contractual disputes between myself and the temporary guest co-host, whom I stupidly let form a union, but we're all settled, all resolved now. The union bosses have driven off in their Cadillacs, and we are ready to record again. So with that out of the way, welcome, temporary guest co-host and union benefactor Joe Obbish, to the show. I have to rename Joe in a moment; I just noticed that. And in this episode, we're going to discuss scalar UDF inlining. Maybe not the hottest topic on the block, since it is about six years old at this point, but there's always room to discuss. So here we are. Joe, take it away. I'm sure there's someone upgrading to SQL Server 2019, and it's brand new technology to them. I did hear we got some complaints about the sponsor. I don't know if you've been able to get the pistachio toothpaste as a sponsor yet.
Oh, well, would you like to do your proud demonstration? No, you know what, I'll save that. If you had warned me that you were going to make me use props in this, I would have gotten it. But here we are. So here you are, sabotaging me when I'm not sabotaging you. We need to make sure our sponsors get their money's worth, right? So, okay. We are brought to you by Pistachio Poppy toothpaste. It tastes like pistachios. I've heard that nine out of ten DBAs say it improves their query tuning.
Yeah. It also makes you a good kisser. So it depends on your priorities there. What do you remember, Joe? I remember my first impression of scalar UDFs. And I think I remember you being really excited about it, reading white papers, the research papers. That really wasn't for me. I don't like white papers so much, but we had tried it on a preview version of SQL Server 2019.
Yeah. I remember, you used to be cool and get all sorts of preview stuff. Yeah, I used to be a big deal. How the mighty have fallen. And the thing that we noticed was, we had a function, about 50 lines long, basically a series of if-this-return-this, if-this-return-this, and so on.
And it added 30 seconds of compile time with scalar UDF inlining enabled. Yeah. So, as good SQL Samaritans, we tracked down the scalar UDF guy.
Yeah. Who I don't remember anymore. And we emailed him.
Poor thing. And I think this was in January of 2019, so about a year before release. And he said that they had much bigger issues to solve than that, which, to be fair, turned out to be true.
Yeah. I mean, sure, that could definitely be true. Personally, I think 30 seconds of extra compile time is pretty bad, but sure, he had other things on his mind.
I think we followed up six months later and didn't get a response. And we were like, well, whatever, we tried. Yeah.
And SQL Server 2019 had its glorious debut. Yeah, its full release. And some of our customers were going to use it, and they wanted our guidance on whether this should be turned on or not.
And we found that our 30-second compile issue had not been addressed, because, you know, there were more important things to address. And my colleague at the time, who is also a good SQL Samaritan, or a SQL good Samaritan, or whatever,
he had, I think, a blog post about it. Which, you know, as a database professional, I feel it is good for the community to share your learnings, and maybe to issue some warnings, like: hey, this brand new feature might not be quite there yet. Because it wasn't. You know, 30 seconds of compile time is a long time.
If Erik and I just stopped talking for 30 seconds to demo that, it would be really awkward and boring for everyone. Yeah. So we're not going to do that. Anyway, some overrated consultants took personal offense,
ooh, at my colleague's warning, and accused him of SQL clout chasing, or whatever. Oh, right. Yeah.
I remember that. I don't know what that means. Well, there used to be this website called Klout that you could use to sort of get your social media credit score.
Perhaps that's what they were referring to. Could be. It was Klout, with a K. Oh, well. Surprisingly, he wasn't the only one, and there was a bit of a mob with pitchforks.
Oh, I think. Really? Do they all have MVP after their name? Cause if so, they can beat it. It's possible.
Yeah, I think the critique goes: oh, well, you should have contacted Microsoft. Right. Yeah.
Let them know, and, you know, they totally would have fixed it. People act like that doesn't happen and goes nowhere. Okay, so, the number of things I've contacted Microsoft about where they did nothing?
Just crickets. You know, when Joe Sack was there, it was different. When Joe Sack was there, it could be like: hey Joe, this is weird and interesting, check this out.
And it might not get immediate traction, and might not get fixed in the next CU, and to be fair, Joe did backlog me on a couple of things, but for the most part, when I was like, hey, this is weird,
Joe would be like, it has an engineer on it. That was, that was Microsoft action. I don't know who's there now. They're all off Fabric-ing themselves, I suppose.
Oh, I'm sure if you had an MVP by your name, they would take you more seriously. Yeah. It's possible. Not all of us do. So that was a very weird thing.
I had to play peacemaker. Me, of all people. And I was like: hey, Microsoft person who is swept up in the outrage, we actually did let you know. Let's calm down and be good partners.
And I think it got fixed in a CU, too. Which, you know, if Microsoft-adjacent people want to extol the virtues of reporting privately and not publicly, is not really a good case study.
I also realized that I totally told that story in the wrong way. I could have had the dramatic reveal, you know, if I had started with the 2019 release. But, oh well, we're not going to record this again.
No, no, we're too far into it now, for sure. Plus, if you get overtime, the union is going to make me pay you double for it.
So I can't afford that these days. AI is taking everyone's money. I believe I am sharing my web browser. I do think you are doing that now. You know, to be fair, I only gave a first impression, which was like five years ago.
That list has grown significantly over the past five years. Yeah, it has. Cause I remember reading this page a long time ago, and I don't remember it being very long, and now it's
Yeah. Much longer. And it's not even a complete list. You have to go to some KB article, which I'm surprised they haven't deleted yet. Yeah.
There are some things in that KB that aren't in here, and so on. So, the thing that really, I know where you're going with this. Yeah.
The thing that really sticks out to me is, I feel like general developers have a small bucket that you can fill with database knowledge. Some of them have a very, very small bucket.
And if I can only teach a developer a few things about databases per year, one of the things that I don't want to teach is: hey, if you want to make a UDF which can be inlined, it's simple.
You just have to follow these 30 rules, and then the queries that call it have 10 rules, and these can change over time. So be sure to bookmark this page, and it'll be great.
Right. I want them focusing on big, impactful, wide-ranging things, and not, you know. Correct.
This is, like, even if someone does make a successfully inlineable UDF, which can also be successfully inlined based on the query definition, which the optimizer then also chooses to inline when it's compiling the plan...
I want developers thinking in terms of sets, right? Not: oh, I'm going to arrange my declares and sets and my ifs and my elses correctly to get inlined.
Right. Yeah. But inline table-valued functions: if I can get a developer to create an inline table-valued function and not a scalar UDF, you have won a battle.
Yes. Yeah. The thing is that it's real hard to get developers out of the "I need a loop for this" mindset, or "I need to return one value." It's real hard to break them out of that.
And a lot of the UDFs that I see out there are not complicated. They're mostly created out of some weird convenience. The most common one that I see is date formatting.
And I'm like: why? It doesn't have to be a scalar UDF. Is that really so hard that you need to encapsulate it? It's very weird to me.
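As a hypothetical illustration of that last point, here's a date-formatting scalar UDF of the kind being described, and the inline table-valued function rewrite that sidesteps the inlining rules entirely. All object names here are invented:

```sql
/* The scalar version: one expression wrapped in procedural clothing. */
CREATE OR ALTER FUNCTION dbo.FormatDate (@d datetime2)
RETURNS varchar(10)
AS
BEGIN
    RETURN CONVERT(varchar(10), @d, 23); /* yyyy-mm-dd */
END;
GO

/* The inline table-valued rewrite: it expands into the calling
   query's plan with no inlining rules to satisfy. */
CREATE OR ALTER FUNCTION dbo.FormatDateInline (@d datetime2)
RETURNS TABLE
AS
RETURN
SELECT FormattedDate = CONVERT(varchar(10), @d, 23);
GO

/* Called with CROSS APPLY instead of as a scalar expression. */
SELECT f.FormattedDate
FROM dbo.SomeTable AS t
CROSS APPLY dbo.FormatDateInline(t.SomeDate) AS f;
```

The inline version behaves this way on every supported version and compatibility level, which is why it wins the battle being described.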
And even if someone does successfully make one of these inlineable scalar UDFs, someone else might come along and make what they think is a small and inconsequential change, and they could be violating one of these rules. Or Microsoft could change one of the rules. My favorite rule on the list is the one about CTEs.
Oh yeah. I was just about to get there. Or, you know, the UDF isn't useful unless it's being used by a query, and there are rules for the query.
So someone could change the query, like, I don't know, adding a CTE, and suddenly none of your scalar UDFs are getting inlined and performance tanks. Or this set of rules could change. And remember, this actually isn't the full set of rules.
There's a KB article too. So there's a funny thing about all this, too. Scroll up a little bit.
So where it's talking about how you can't use STRING_AGG, and you can't build strings with the variable-assignment thing: there's actually something misleading in here too, because it says that the UDF can't reference XML methods.
XML methods are things like value() and exist() and nodes(), right? But if you build a string, like a comma-separated string, in a UDF with just FOR XML PATH, without value() or anything else in it, it's inlineable.
But if you use STRING_AGG, or if you use the string-builder method to do it, it's not. Right? It's just inconsequentially stupid.
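A sketch of the inconsistency being described, per the documented restrictions; the table and function names are invented. The FOR XML PATH version uses no XML methods (value(), exist(), nodes()), so it doesn't trip that rule, while the STRING_AGG version trips its own:

```sql
/* Inlineable: FOR XML PATH string building with no XML methods. */
CREATE OR ALTER FUNCTION dbo.NameListXml (@GroupId integer)
RETURNS varchar(8000)
AS
BEGIN
    RETURN
        STUFF(
            (SELECT ',' + n.Name
             FROM dbo.Names AS n
             WHERE n.GroupId = @GroupId
             FOR XML PATH('')),
            1, 1, '');
END;
GO

/* Not inlineable: STRING_AGG is on the restriction list. */
CREATE OR ALTER FUNCTION dbo.NameListAgg (@GroupId integer)
RETURNS varchar(8000)
AS
BEGIN
    RETURN
        (SELECT STRING_AGG(n.Name, ',')
         FROM dbo.Names AS n
         WHERE n.GroupId = @GroupId);
END;
```

Note that the more robust variant of the FOR XML PATH trick appends TYPE and .value() to handle special characters, and .value() is an XML method, so presumably that version would trip the restriction too.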
I cannot believe some of the weird stuff where they were like, no, you can't do that, but this is fine. It's just weird to me. Yeah.
I feel like some of these are reasonable, or you can get over them. Some of them, okay, you're getting what you deserve if you're doing that. And I haven't done any research here, but based on how this list is arranged, it feels like the list only gets longer over time.
And there's never an update where they're like: hey, good news, you can now use XML methods in your UDFs. Right? "The UDF doesn't contain a SELECT with ORDER BY without a TOP 1 clause."
It's all funny. So, in terms of new development? Oh, nothing's getting it.
Well, in terms of new development by database developers, or people writing T-SQL code: I certainly can't get to a place where I would say, yeah, for new development, if you want to use a scalar UDF, go for it,
cause it's inlineable and it's super easy and convenient and it works the way you want to write your code. No. This seems so fragile. Yeah.
I think that's a fair word for it. Yeah. No, it's fragile. And, you know, when I first heard about the feature, this list didn't exist, so I was a lot more excited about it. Because, really, investment into a problem that big? It's a big problem space for a lot of people. SQL Server code bases get developed over, at this point, 20, 25, 30 years, back when no one knew how bad this stuff was, or there weren't enough rows to see how bad this stuff was.
And so when I first heard about it, I was like, wow, Microsoft invested serious time in this, and that's a crazy problem to solve. But then, like everything else, once the fine print starts making it out into the documentation, you're like: oh, well, that's not going to help. And even when I see it kick in, there's a good chunk of the time, and this isn't scalar UDF inlining's fault, but there's a good chunk of the time where you had a UDF, it was slow, but when inlining kicked in, it got slower, because all of a sudden your plan was full of eager index spools off of giant tables.
And it's like, okay, well, I don't know if the optimizer and UDF inlining couldn't quite shake hands on "hey, can you make me look good?" No, I'm just going to keep doing the eager index spool thing.
I think the point you're making is where I wanted to get to next, which is: when we think about the tens of billions of lines of old T-SQL code which are never getting updated. Yeah.
Maybe when you upgrade to SQL Server 2019, you know, if you have a hundred UDFs, half of them get inlined, your overall CPU goes down, a lot of queries run faster, a few run slower. Maybe it just feels like a free win in that sense, if no one's touching the code anyway.
And that does feel like the sweet spot for this feature. Which, I mean, don't get me wrong, making old code better for free,
sort of magically, is certainly good. But if you're thinking about all the new development you plan to do, this really doesn't seem to be a good fit. Cause looking at the rules for the queries, some of these seem pretty weird, or easy to avoid.
Like, personally, I know that GROUPING SETS exists. I haven't used it in like 10 years. I can go another 10 years without using it,
I think. Yeah. Didn't you have a question about grouping sets on Stack Exchange? Isn't that an early question? Yeah. Ancient history. Yeah.
Yeah, that was very old. Or, like: don't ORDER BY a UDF. Okay. Yeah. Fair enough. But I mean, this one, I don't know what it means exactly, cause I did try nesting a few UDFs and it worked.
Yeah, I don't know if this is about a RETURN from a stored procedure or something. A long time ago, I wasn't sure about that, and I'm pretty sure Paul gave an example of it.
But I didn't follow it. The code was just so strange. I was like, all right, you win. That just is what it is.
But one of these things is not like the others. Yep. Cause what this actually means is: if there's a CTE in the query, even if it's doing nothing, a select 1 from nothing, the CTE isn't even used in the query, it could be simplified away by the query optimizer, and it is simplified away, that will block UDF inlining. A totally useless, simplified-out CTE is enough to block inlining.
I am pretty sure that a lot of the UDF inlining rules are almost just not implemented. I'm pretty sure this is T-SQL linting for some of them. I don't think they actually look at what the resulting plan would be.
Yeah. I tried to get more information about this and couldn't. If I had to make a guess, someone had some weird recursive query with a UDF and it didn't work, and, you know, Microsoft wanted to fix it in a CU, and the simple fix was: okay,
if it has a CTE, then no inlining. And that was done in CU11. So it certainly feels like we got away with using CTEs for a while, until someone ran the wrong bizarre query, and now no one can ever use CTEs with inlineable scalar UDFs ever again.
Like, you know, we were talking about how you can add restrictions, but can you please eventually get rid of some of them? Like the CTE thing: could this be, maybe it doesn't work if you're doing a recursive CTE, but if the CTEs don't reference the UDFs, it's fine?
This is just so broad. No, that's a giant one. But I can actually sympathize a little bit, because people abuse CTEs to the nth degree. In the regular work that I do, they are just beat up.
So I can imagine cases where, well, I imagine three overarching cases that would make it reasonable to do this. One would be sort of your example with the recursive CTE, especially if other UDFs are called in the body of one of the recursive parts of the CTE.
The other is just a bunch of highly stacked CTEs with a rather complicated set of joins at the end. And the third is someone with one CTE that they reference 50 million times. All of that stuff would do the kind of thing you ran into with the compilation time: just blowing up the plan.
So I can absolutely imagine situations where inlining arbitrary CTE blocks in a UDF would be nuts. Cause remember, UDFs aren't just like inline table-valued functions, where you have one simple select that just returns a thing.
People probably had all sorts of crazy procedural code. You can imagine someone with 10 IF blocks of code in a scalar UDF, and they all have crazy CTEs in them to figure out some values. Life would be miserable.
So, yeah, I would say in response: if there were some kind of compile-based check, I would like to see that type of thing. Oh, I have a UDF getting used a hundred times in the plan,
it has five joins, the query optimizer isn't going to add 500 joins to the query, so this UDF isn't getting inlined for this query plan. That makes perfect sense to me.
I think you're being too generous here. I don't know, man. It's one thing to look at this from the perspective of code we would write.
It's another thing to look at this from the perspective of the corpus of SQL Server installs that Microsoft has to deal with as support cases. Cause I see some nutty stuff out there.
It's not anything I would ever write, but I see some real nutty stuff out there, and I can only imagine what Microsoft support gets hit with on some of these, where you're like, oh girl. I'm lucky,
cause at some point I can say: that is way too much to deal with. But Microsoft, you know, that's a paid support ticket there. They got to do something with it.
That's your speech. It's true. You almost had 30 seconds. I was compiling my response.
It's somewhat similar, right? Cause I think this is really broad. I think the examples you've given could be pretty easily included in whatever heuristic they would want to use to make this less broad.
Sure. But instead what we get is: CU whichever. Which one even was that, like 2020, 2021? I don't know.
Click on the link. There is no link. Oh, well. I don't know. Yeah, they don't really believe in links. Who writes this documentation? Linkless.
Also, I think there's a typo in there: "All of the following requirement of the execution context must be true." It's true. You know, if you were a good citizen, you would open up a GitHub issue and add an S there.
Not a good citizen. I wouldn't want to be accused of chasing clout. Ah, well. Yeah. Your choice.
This is why you'll never be an MVP, Joe. Yeah, that's why. Speaking of arrogance: I feel like Microsoft did this totally wrong, and I have a better design, and they should have done it the way I'm picturing.
All right. What they should have done is: whenever you create or alter the function, at creation time, the engine tries to compile an inline table-valued function version, and then saves that internally.
Because if you look at how narrow this is, it certainly does seem to fit within an inline table-valued function. You only have one RETURN.
If you put a WHILE loop in your UDF, it's not inlineable anymore, right? That just doesn't work. And then, when you have a query that's using the UDF, replacing it with an inline function is more of a parsing thing.
Like, is it allowed to actually do the replacement? Cause in some edge cases, you can have a UDF, but you can't have a subquery, for a different reason.
And if it's allowed, put it in there and see what happens. Cause there are just so many weird ones. Like, how do you even get to: oh man, STRING_AGG,
well, you can't use that. I always wonder about that one too. Or: the technological issues of making GETDATE inlineable. We just don't have the technology.
I don't know, maybe it's something like: if you're executing a scalar UDF that has GETDATE, it will return different values as the function executes. Is it trying to maintain that behavior, do you think?
NEWSEQUENTIALID I understand a little bit, but GETDATE is just a very strange one. That's the only thing I can think of. Cause if you have a non-inlined UDF, then it's a different context when it executes.
So you can get different GETDATE values. So, yes. And I wrote a demo recently about RCSI and UDFs, where things just get extra weird with that.
I guess now's not a good time. But I can imagine the separate context of GETDATE presenting issues in some circumstances, I guess.
If there's anyone out there who has a scalar UDF and is relying on the GETDATE call within that scalar UDF returning different time values as the query executes, please leave a comment.
Cause I would like to scold you, but I'm also interested in whatever weird thing you're doing. Yeah. But I don't know. It just, well, oh no, that would be the opposite of fixing it.
Never mind. Forget I said anything. It's forgotten. All right. We can move on. Let's move on. Well, I don't know where you can move to. I've said just about everything I want to say on this.
All right. So, um, uh, yeah, I mean, you know, my, my general take on scalar UDF and lining is I think it was a very good investment in a very hard problem that the, the, the realities of things, um, kind of just killed it a little bit.
Uh, like just to, just to beef up the content a little bit, uh, I’m going to go inside baseball on scalar UDF and lining. So your friend who didn’t respond to you for months about the 32nd compilation stuff, uh, this guy Karthik, I mean, it was like very smart, very nice guy.
No, no beef with him, but the whole scalar UDF and lining thing was, um, was basically like written, uh, down the street from you at the university of Wisconsin and, uh, they, they did all the like Microsoft, like, cause like Karthik’s like in with Microsoft, they have like research stuff or whatever, but, um, like, uh, so Karthik and his team at, at, at, at Wisconsin, like basically built this whole thing with Microsoft source code sitting there and then, uh, like basically stapled it into the engine.
Um, and then, um, when Microsoft got it back, it was up to them to sort of do their millions and billions of unit tests on it. And I think just slowly over time is, you know, like either support cases or those millions and billions of unit tests finished, uh, a lot of stuff ended up getting like, sort of just like cutting this and like, especially a lot of security stuff, um, I would imagine posed a problem, but like, like really like the number of limitations on it, um, you know, really, really messes up the, the scope of usefulness for the feature.
But I do think, at the end of the day, it was a very good investment in a very hard problem, and it does solve a pretty good set of problems for people. But the number of UDFs I see that are still not inlineable — or, even if someone’s on 2019 or better (’cause this isn’t edition-gated), even if they’re on 2019 or better, they’re not in a compatibility level where UDF inlining kicks in.
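As an aside for anyone who wants to check their own servers: on SQL Server 2019 and up, the engine reports whether a scalar UDF qualifies for inlining in the catalog, and inlining only kicks in at compatibility level 150 or higher. A quick sketch:

```sql
-- Which scalar UDFs does the engine consider inlineable? (SQL Server 2019+)
SELECT
    OBJECT_SCHEMA_NAME(m.object_id) + N'.' + OBJECT_NAME(m.object_id) AS udf_name,
    m.is_inlineable
FROM sys.sql_modules AS m
JOIN sys.objects AS o
  ON o.object_id = m.object_id
WHERE o.type = N'FN'; -- scalar functions only

-- Inlining also requires compatibility level 150 (2019) or higher:
SELECT d.name, d.compatibility_level
FROM sys.databases AS d
WHERE d.name = DB_NAME();
```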
And so there’s just a lot of stuff that it got gated behind, which is sort of unfortunate. But anyway, that’s about it there. Do you have anything else?
I do remember hearing about the university thing, and I vaguely feel like there might’ve been an opportunity to say hello during the development process, which we didn’t do. And man, if only. If only I had gotten in there, I could have whipped them into shape, you know. Missed opportunity.
I did try the UDF with GETDATE() in a query, and the value does change throughout the query’s execution, as we thought. So I suppose that GETDATE() thing maybe isn’t a technical limitation, but, you know, there’s some dumbass out there who’s relying on the old behavior — “I’m getting a new GETDATE() every time I execute that function” — and the rest of us suffer.
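To make the experiment concrete — this is a hypothetical repro, not the exact function from the show — a scalar UDF like the following historically returned a fresh value per invocation, and inlining can change that behavior:

```sql
CREATE OR ALTER FUNCTION dbo.just_getdate()
RETURNS datetime
WITH INLINE = OFF -- 2019+ syntax to opt a function out of inlining
AS
BEGIN
    RETURN GETDATE();
END;
GO

-- With inlining off, each invocation calls GETDATE() separately;
-- inlined, the expression gets folded into the calling query.
SELECT TOP (10) dbo.just_getdate() AS d
FROM sys.messages;
```

The `INLINE = ON | OFF` function option is the documented way to pin down the behavior you want rather than relying on whichever path the optimizer picks.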
Yeah. Cause the thing with RCSI was that every time the function executed, it read a different set of row versions.
So it really is a completely separate context. Every iteration of it just starts afresh. It’s wild in there. All right. Do you want to call this one and then talk about the next thing?
Final comment. Don’t forget. Uh, nine out of 10 DBAs get inlining more frequently than they use. Disashio.
Based text poppy to a four or five, four, three, four. I don’t know what that does, but it might be funny. All right.
We good? All right. Thank you for watching this episode of the Bit Obscene radio program. We will see you in the next episode, in which Joe will talk about an interesting insertion problem.
So we will see you then. All right. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Free SQL Server Performance Monitoring: Resource Usage Overview



Summary

In this video, I delve into the resource metrics tabs of the full dashboard from my free SQL Server performance monitoring tool. I explain how these tabs provide more detailed information compared to the lighter version, including CPU utilization, TempDB usage, memory usage, and a selection of perfmon counters. The goal is to help you quickly identify when your server might be experiencing issues by showing real-time data and historical trends that can pinpoint specific performance bottlenecks. I also highlight how the wait stats section helps in understanding wait statistics, making it easier to diagnose problems related to resource contention and query compilation.

Chapters

Full Transcript

Erik Darling here with Darling Data and returning to talk more about free, completely free SQL Server performance monitoring. We’ve got, today we’re going to talk about the resource metrics tabs from the full dashboard. These do have a bit more than the light dashboard has in them because there are some extra things that are a little bit easier to collect and grab the way that this is set up than it would be with the light dashboard. Again, the light dashboard serves most people’s needs most of the time, but if you need a little bit extra, the full dashboard has a little bit extra in it.

So, we get some server trends up here. We collect wait stats, file I/O, tempdb stuff, perfmon stuff — and these DMVs are all cumulative over time. One of the most painful things for me about being a consultant is coming into some server that has either just been restarted or has been up for like a thousand hours. Either the server just restarted and there’s nothing useful in there, or the server’s been up forever and you’re like, well, I don’t know if these waits happened yesterday, today, 900 hours ago — who can tell, right? What in here is meaningful? What in here is consistently happening?

And so, it becomes a little bit harder to give people good answers on the spot about their current pain points, because cumulative metrics can disappear if the server restarts, lie to you about when things happened and how significant they were, or worse, some of them just sort of smooth out over time.
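One common workaround for that cumulative problem — and roughly the kind of thing the tool automates — is to snapshot the DMV, wait, and diff, so the numbers only describe the interval you watched. A rough sketch:

```sql
-- Snapshot wait stats, observe for a minute, then diff —
-- everything accumulated since restart cancels out of the subtraction.
SELECT ws.wait_type, ws.wait_time_ms, ws.waiting_tasks_count
INTO #waits_before
FROM sys.dm_os_wait_stats AS ws;

WAITFOR DELAY '00:01:00';

SELECT
    ws.wait_type,
    ws.wait_time_ms        - b.wait_time_ms        AS interval_wait_ms,
    ws.waiting_tasks_count - b.waiting_tasks_count AS interval_waits
FROM sys.dm_os_wait_stats AS ws
JOIN #waits_before AS b
  ON b.wait_type = ws.wait_type
WHERE ws.wait_time_ms > b.wait_time_ms
ORDER BY interval_wait_ms DESC;
```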

Consider a bursty workload, where you have big spikes of activity and then nothing for a long time. Nothing tends to stick out terribly, so you have to go looking in different places, right? But all the stuff we collect in here is pretty nice and useful, and I don’t know, I like it.

So, if you want to check out the monitoring tool, it’s at code.erikdarling.com. It is completely free, it is completely open source, you can see everything that it does and is doing, and it is a step up above a lot of the paid stuff that you will see out there in many ways.

Let’s go look over at the resource metrics tabs now, though. So, what we collect up here is CPU utilization, TempDB usage, memory usage, and some limited perfmon counters over here. This is a pretty good way to just get an idea of like when stuff might have gotten a little wacky with the server.

You’ll notice that there’s a spike in some of the perfmon stats lined up with some of the tempdb stats up there, so something neat happened, right? Memory’s been generally okay — the buffer pool got a little hot data injection down here — and again, if you hover over any of these things, you can see the percentages and numbers, depending on what they are. All of this stuff just serves to make your life easier when evaluating performance on a SQL Server at whatever time you’re looking at it.

The wait stats section over here — what I’m trying to do, and I think I’ve mostly got it right now, is call out all of the poison waits, like RESOURCE_SEMAPHORE, RESOURCE_SEMAPHORE_QUERY_COMPILE, and THREADPOOL, and then the known group of waits that most people commonly see, like SOS_SCHEDULER_YIELD, the CX waits, stuff like that.

And then also the top 10 waits other than those. So this starts off with a whole bunch of wait stats checked and lets you see sort of what’s going on. A lot of people are heartened to see when thread pool and all the other poison stuff sit at zero, because if you see those ones go up, your server’s not having a good time.

But again, if you hover over any of these, you will see actual values for them, so you can see which wait was waiting and for how long. You can pick different waits to put in here.
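If you want the raw version of that poison-wait check without the dashboard, the wait types mentioned above can be pulled straight from the DMV — a minimal sketch:

```sql
-- Poison waits: any meaningful nonzero time here deserves attention.
SELECT ws.wait_type, ws.waiting_tasks_count, ws.wait_time_ms
FROM sys.dm_os_wait_stats AS ws
WHERE ws.wait_type IN
      (N'THREADPOOL',                        -- worker thread starvation
       N'RESOURCE_SEMAPHORE',                -- queries waiting on memory grants
       N'RESOURCE_SEMAPHORE_QUERY_COMPILE')  -- compiles waiting on memory
ORDER BY ws.wait_time_ms DESC;
```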

So, you know, that’s all hunky-dory. And over here are the tempdb stats. tempdb’s not terribly busy on my server, but apparently there was a fun little spike up here.

This is tempdb latency and this is tempdb space used. I have unallocated space up here and then actual usage down here. The legends will tell you exactly what each line color means.

So if one of them sticks up above the rest, it should be pretty easy to figure out. Down here is just read and write latency. For file I/O latency, you’ll have reads and writes, and you’ll have this per database — the legends in here show all the databases I have on my server.
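The per-database read and write latency on those graphs comes from the same place you’d get it by hand: dividing cumulative I/O stall time by the number of I/Os in sys.dm_io_virtual_file_stats. Something like:

```sql
-- Average read/write latency per database, accumulated since restart.
SELECT
    DB_NAME(fs.database_id) AS database_name,
    SUM(fs.io_stall_read_ms)  * 1.0 / NULLIF(SUM(fs.num_of_reads), 0)  AS avg_read_latency_ms,
    SUM(fs.io_stall_write_ms) * 1.0 / NULLIF(SUM(fs.num_of_writes), 0) AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS fs
GROUP BY fs.database_id
ORDER BY avg_read_latency_ms DESC;
```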

And we’ll see that tempdb had a tough time at some point yesterday, right? And StackOverflow had a tough time at some point yesterday. The perfmon counters tab is one that I actually just made an improvement to.

And one thing that I’m pretty psyched about: I always forget which groups of perfmon counters I want to look at for specific problems. So this one is locking and blocking, right? If I select a different counter group up here, like memory pressure, this will change and select all the counters that have to do with memory pressure.

So you can start figuring out, like, oh, crap, when did something terrible happen, right? Stolen server memory is spiking up at various points. The next tab over is session stats.

Now, what this is essentially collecting is aggregated session metrics. A lot of people I work with have a big problem with sleeping sessions doing terrible things on their servers. So what these lines show you are the total number of sessions, how many of them were actually running, how many were sleeping, how many were background processes, and how many were just idle for more than 30 minutes.
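That session breakdown maps pretty directly onto sys.dm_exec_sessions. A sketch of roughly how those buckets could be counted — the 30-minute idle cutoff is the one mentioned above:

```sql
SELECT
    COUNT(*) AS total_sessions,
    SUM(CASE WHEN s.status = N'running'  THEN 1 ELSE 0 END) AS running_sessions,
    SUM(CASE WHEN s.status = N'sleeping' THEN 1 ELSE 0 END) AS sleeping_sessions,
    SUM(CASE WHEN s.is_user_process = 0  THEN 1 ELSE 0 END) AS background_sessions,
    SUM(CASE WHEN s.status = N'sleeping'
              AND s.last_request_end_time < DATEADD(MINUTE, -30, SYSDATETIME())
             THEN 1 ELSE 0 END) AS idle_over_30_min
FROM sys.dm_exec_sessions AS s;
```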

This is pretty crazy on my server — like 45 of them. I guess that’s mostly SQL Server doing stuff and me doing other things. But, you know, a lot of wacky stuff going on here. And then we have latch stats.

This is just the regular sys.dm_os_latch_stats stuff — the stuff you would expect to see if you were doing a deeper analysis on a server that’s having problems and you wanted to analyze latches. We’ve got all of that collected here.

And I know, I know — it’s never spinlocks, it’ll never be spinlocks, right? But it’s always nice to be able to prove that it wasn’t spinlocks. So I have a graph that will let you very, very quickly absolve spinlocks of any potential involvement in a query performance problem. That one time every 10 years when it is spinlocks, you can catch that, too.

But the 999,999 times it is not spinlocks, you can prove that, too. So this is just a quick overview of the stuff you’ll find under the resource metrics tab in the full dashboard. Again, the light dashboard does not quite have all of that stuff.

It does capture CPU, memory, file I/O, and tempdb, but it just doesn’t capture the full spread of stuff. The light dashboard is meant to be lighter weight and not have as much getting collected, but still give you the most important stuff that you would look at, right? It doesn’t have the deeper-dive stuff, but all the really important stuff that you would care about is in there.

So, anyway, I hope you enjoyed yourselves. I hope you learned something. I hope that you will try out this amazing free, completely free monitoring tool that does way more than most paid monitoring tools ever did. It’s at code.erikdarling.com.

Again, go to the releases section. That’s where all the zips are. That’s where you can grab it. That’s where you can, you know, run everything from. And, again, if you open up a zip and you don’t see the executables in there, it might be some security software blocking them. You’re just going to have to check to unblock them.

Anyway, thank you for watching, and I’ll see you, well, I guess this might be Friday’s video, so I’ll see you Monday for Office Hours. Monday, Monday. All right.

Have a good weekend.


Free SQL Server Performance Monitoring: Query Performance Tabs, Including Query Store



Summary

In this video, I dive into the query performance tabs within my full dashboard, a free and open-source SQL Server monitoring tool. I highlight key features such as the Query Store subtab, which provides detailed insights into query execution plans and regressions over time, helping you identify and address performance issues more effectively. Additionally, I explain how the custom trace patterns tab offers valuable data for safeguarding against overlooked performance anomalies, ensuring a comprehensive view of your SQL Server’s health.

Chapters

  • *00:00:00* – Introduction to Query Monitoring Tool
  • *00:07:17* – Detailed Execution Plans and Metrics
  • *00:10:29* – Conclusion and Future Updates

Full Transcript

Erik Darling here with Darling Data, bringing that big monitoring tool mogul energy that the SQL Server world was so clearly missing. I want to talk today about some of the query performance tabs that are in play, at least in the full dashboard. The full dashboard actually has one extra tab that the light dashboard does not, and that’s the query trace patterns. So that’s, like, sort of a custom trace that I spin up in the background. It’s, of course, completely optional. You can turn it off, you can get rid of it if you don’t want it.

But it’s something that I use as a sort of safeguard to make sure that I’m actually getting some stuff that other things might be missing. It collects from all the normal data sources that you would look at when you’re doing a performance analysis — query stats, procedure stats, trigger stats, function stats.

It goes to Query Store. Now, a lot of paid monitoring tools out there do not touch Query Store. And quite frankly, those vendors should be embarrassed, right?

It’s 10 years old and you haven’t put it in your product yet. Just close down, right? Just stop, right? Like, you don’t care, right? You’re coasting, right? It sucks and people pay you for this, right? So they, like, made me angry enough to do all these things.

So the tabs that you’ll see in the full dashboard are these. You’ll see all of these except the query trace patterns one in the light dashboard — a slight differentiation.

This is just something that’s a little bit easier to manage with the stored procedures and stuff that I have in the full dashboard that run everything than it would be to do with the light version. So just a small difference there.

But like I said, all the query grids that you see will have a pop-up drill-down menu; if you double-click, you can get more information from them. There’s also all sorts of stuff in there that will give you an experience like looking at the Query Store GUI, where you can change metrics — except I don’t actively hate you, and I want you to enjoy looking at things.

So, you know, I try to be a little bit kinder. Anyway, alongside the Query Store subtab, we also have a Query Store regressions tab, right? There’s one that just looks at general Query Store stuff and one that points out queries that have regressed in some way over a period of time.

So you do get some of that — oh, cool, yeah, that thing did get worse, I should go look at that.

And then you can see all the plans for stuff that got worse. But a lot of the DMVs, I discovered, are just sort of cumulative over time. So if you just keep logging data from them, the numbers just kind of keep going up.

And it sucks, right? Because you keep looking and it’s like, oh, it was 60, and now it’s 67. Ah, crap. So I calculate deltas for some of the DMVs that are cumulative. Query Store doesn’t really have this problem, but some of them do, and you have to do some math.

And I got some help with the math, because I am most assuredly not a math person. I do some per-second rate normalization, which means that I basically divide the change between one collection and the next for a query or a procedure or something by the elapsed seconds between collections, which gives me a consistent line even if collection intervals vary a little bit.
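For the curious, that normalization is just a windowed delta divided by elapsed seconds. The table and column names here are made up for illustration — the real tool’s schema may differ:

```sql
-- Hypothetical collection table: one row per query per snapshot.
SELECT
    c.query_id,
    c.collection_time,
    (c.total_cpu_ms
       - LAG(c.total_cpu_ms) OVER
           (PARTITION BY c.query_id ORDER BY c.collection_time)) * 1.0
    / NULLIF(DATEDIFF(SECOND,
          LAG(c.collection_time) OVER
              (PARTITION BY c.query_id ORDER BY c.collection_time),
          c.collection_time), 0) AS cpu_ms_per_second
FROM dbo.query_stats_collections AS c;
```

Dividing by the actual measured interval, rather than assuming a fixed one, is what keeps the line consistent when collections drift.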

But it makes comparisons a little bit more meaningful when you’re looking at time ranges. So, again, if you want to check all this out, it’s at code.erikdarling.com. It’s in the performance monitor repo.

If you want to download it, it’s all going to be in the releases section. That’s where all the zips are that have the EXEs in them. So, you can go in there.

You can get it. You can try it out. And, again, if you go to unzip it and you can’t see an EXE, sometimes that’s Microsoft security stuff. You might have to, like, unblock it to get it to show. But let’s go look at the actual dashboard because I’ve been talking about looking at PowerPoint too long.

My eyes hurt. So, when you first go in to look at queries, you’ll have the performance trends graphs. Apparently, I really am turning into a developer because I have not executed a stored procedure in 24 hours. I don’t know what’s wrong with me.

My head’s spinning. But sort of behind me here, under this armpit, is where you just see query executions per second — spikes, when activity really ticked up. That’s this graph over here.

Apparently, we had a real high point at 4.5 a second. Whew! Yeah, I’m thrilled with that. And then we have query durations from the plan cache over here and from Query Store down here. You can see that there is some stuff that Query Store got that the plan cache didn’t, which is fascinating to know about, right?

The plan cache is unreliable. That’s why I prefer Query Store. This is why monitoring tools that don’t touch Query Store should just go, like, head in the oven, goodbye.

Over here is the active queries tab. This basically just runs sp_WhoIsActive and logs it to a table. All the mechanics of that are boring.

Ah, crap. Sorry — a mechanic is not boring; the mechanics of how that works are boring. So, if you want to look at what’s currently running on a server, you have a current active queries tab. If you push the refresh button, this will actually run one of the queries that I use in sp_PressureDetector against the server, grab whatever’s running now, and pull it back.

This is just a little bit easier, because if I wanted to do that for sp_WhoIsActive, you start thinking about exposing all the different parameters, and then you have a dynamic column list, and it gets to be too much. So I just wanted to give people a fairly simple what’s-running-right-now query to back up what shows up in sp_WhoIsActive. Again, there’s nothing running on my server right now, but hey, I pushed the button and it didn’t crash, so we’re in good shape, right?

We have the query stats over here. This is what’s pulling from the plan cache. You know, you can sort this any way you want, like, up and down by any of these columns.

If your code came from a stored procedure — which clearly mine in the last 24 hours has not; I am a fully ad hoc developer — the object name will show up in here.

But you get all the information you would want from this stuff, right? CPU, reads, memory, DOP — everything you would want. The query text.

And then if you right-click on something in here — I showed this yesterday, but I’m so excited about it, I’m going to show it again — if you go in here and say view plan, you can see the estimated plan for something, right?

And again, Microsoft making that MSSQL extension for Visual Studio Code really made all this very easy, because I had all the assets I needed for the icons and all the XML information I needed. So I could really just plug this in, basically.

If you come over here and click get actual plan — I’m going to do this for a slightly less annoying query — then this thing will execute the query against the server.

It will throw away the results and produce an actual execution plan, right? And here it is. Again, you get all of the execution numbers against it. Now, something that I’ve said Microsoft should do for a long time.

Is this: when you get an actual execution plan, all they show you are the durations. I also expose the CPU time, right? Because you get CPU counters per operator as well as duration per operator.

So, I expose CPU time in there as well. Just to zoom in a little bit closer so you can see, right? Like, the top number here is the normal duration number that you would always see.

But underneath you have the CPU time, right? So this operator — let’s just pretend it ran for one and a half seconds — the CPU time is 1.13 seconds, because we had some time spent spilling, right?
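Those per-operator numbers live in the actual plan XML, under RunTimeCountersPerThread. This sketch assumes you’ve captured an actual plan into an xml variable named @plan_xml:

```sql
WITH XMLNAMESPACES
    (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT
    op.n.value('@PhysicalOp', 'sysname')     AS operator,
    rt.n.value('@ActualElapsedms', 'bigint') AS elapsed_ms,
    rt.n.value('@ActualCPUms', 'bigint')     AS cpu_ms
FROM @plan_xml.nodes('//RelOp') AS op(n)
CROSS APPLY op.n.nodes('./RunTimeInformation/RunTimeCountersPerThread') AS rt(n);
```

Parallel plans report one RunTimeCountersPerThread row per thread, which is why elapsed and CPU can diverge per operator.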

So, that’s what that indicator up there shows you, right? That we, like, you know, something happened. So, we have that and then coming back to queries, though.

Procedure stats — again, I haven’t run any procedures lately, so that’s empty. Query Store shows you all the Query Store data. Now, if you click on a query row in here — like, we saw that one had 60 executions —

this right here is what you get when you drill down. You can see all of the per-execution things in there, right? Every time this query executed — all the different execution metrics.

This query apparently hasn’t really changed at all since it did anything. So that’s nice. But up here in the top corner is where you can change how the graph shows things, right?

So you can change it to any one of these and you’ll get a slightly different view. There’s no difference between CPU and duration here, so not a lot of fun to look at, to be honest.

But, the drill downs are the important thing here. When you double click on them, you get all of the query executions, right? So, that’s important, right?

You might want to see more than one thing there, right? But coming a little bit further, this is Query Store regressions. You can see some query that did worse over time, right? And again, you can drill down into any of those.

And then, query trace patterns is just stuff that I capture with my sort of special, like, performance trace of, you know, bad queries. Anyway, that’s sort of an overview of all the different query tabs in there. Again, if you want to check out this completely free open source SQL Server monitoring tool, it’s at code.erikdarling.com.

Again, go to the releases section, download a zip. There are EXEs in it, I promise. If you don’t see them, your security software is messing with you.

So, you should do all that. Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I will see you in tomorrow’s video where I will continue talking grandiosely about this wonderful monitoring tool of mine. It is completely free and should hopefully put every other monitoring tool company that makes you pay for their garbage out of business.

Alright, thank you.
