What Else Happens When Queries Try To Compile In SQL Server: COMPILE LOCKS!



Thanks for watching!

Video Summary

In this video, I delve into the world of compile locks in SQL Server, a topic that has been on my mind ever since I noticed an unusual amount of non-traditional blocking in my databases. Specifically, I focus on how auto stats updates can lead to these compile locks, causing significant delays and potential bottlenecks. By walking through a detailed demo using the Stack Overflow database (yes, it’s all tables in SQL), I illustrate exactly what happens when a query has to wait for statistics to refresh during its compilation phase. This not only highlights the issue but also provides insights into how to mitigate these compile locks, ensuring smoother operation of your SQL Server environment.

Full Transcript

Erik Darling here with Darling Data. And boy, I’m just basking in the glow of my newfound TikTok celebrity status. Fame and fortune await. Stitch incoming is probably what I’d say if I knew what I was talking about. In this video, we’re going to talk about compile locks. And the reason why I want to talk about these is because I have seen so much of this happening. Like, when I first wrote sp_HumanEventsBlockViewer, I was sort of surprised by the amount of non-traditional blocking that would happen in SQL Server. And out of all the non-traditional blocking that can occur in SQL Server, this one is rampant. I see this so, so much that I wanted to finally put together a demo that shows you what happens, and talk about a few things you can do to alleviate compile locks in your SQL Server environment. So I’ve done a little bit of setup work on this already. There’s a table in the SQL in the Stack Overflow... there’s a table in the SQL. No kidding. It’s all tables in the SQL, isn’t it? There’s a table in the Stack Overflow database

called Votes. And if you are of a similarly challenged mind, as I am, you might like looking at words and immediately scrambling them into every other possible word you can possibly make out of them. Perhaps that’s just a lifetime of doing weird word puzzles, because no one else would talk to you. Might be that. Who knows. But I’ve taken the votes table. And I’ve switched it around to be called the Vitos table. So it’s not votes, it’s Vitos. It has no practical meaning in this context. There’s nothing you can actually veto in Stack Overflow. Venture Capital makes all the decisions now, and boy, are they tired. So what I did was I took the votes table, and I stuck it into this Vitos table, but I doubled the number of rows, right? So it’s the votes table twice, union together, and stuck into the Vitos table.

And I put a primary key, because I’m a good SQL Server person, and I put a clustered primary key on my Vitos table, so that I know that all of my Vitos are unique. I’ve also run this query already to generate some autostats. Oh, boy. Must have been late when I was working on this one. Must have been another one of those 4 a.m. adventures that I like going on.

So, yeah. We’ve got a table called Vitos with some autostats created on the user ID and the bounty ID column. Now, I’m not saying that doing update stats with the sampled percent and persisting that sample percent is a bad idea. I’m not saying that at all. I am just doing that because in this particular demo, it exacerbates the particular problem that I want to show you.

There are times when updating stats with a particular sample percent, even full scan, is a pretty good idea, because you just might not get a very good picture of what data your indexes are holding with the default sampling percent. You could miss a lot of stuff. You could miss entire chunks of stuff. But on the plus side, they’re fast. And I’m using persist sample percent so that whenever there’s a stats update, SQL Server has to do the same thing.
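The transcript doesn’t show the exact command, but a hedged sketch of the kind of stats update being described, against the demo’s Vitos table, would look something like this (the exact sample percent used in the demo is an assumption; PERSIST_SAMPLE_PERCENT requires SQL Server 2016 SP1 CU4 or later):

```sql
/*
  A sketch, not the exact demo script: update statistics on the demo
  table with a full scan, and persist that sample so SQL Server has to
  use the same sampling on every subsequent (auto) stats update.
*/
UPDATE STATISTICS dbo.Vitos
WITH FULLSCAN, PERSIST_SAMPLE_PERCENT = ON;
```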

All right. So, SQL Server has to maintain that percent sampling for those statistics. So, I’ve also got this stored procedure here. And this stored procedure is admittedly written in a rather silly way. I should have proofed this a little bit more before I started recording, but it’s okay.

Who doesn’t like listening to clacky keyboards in the background? It’s not like we haven’t had five years of Zoom calls with everyone trying to out-developer each other by getting the loudest possible keyboard. It eventually sounds like someone dragging cans down the street.

So, these are just two queries that will have to refresh statistics on the columns that we have auto stats on up here, right? Pretty simple, right? And then we just select from those tables, joined together on an ID. This thing returns like four rows. So, let’s walk through what happens when this thing has to compile, like, just on its own.

So, I’m going to run this update, which is just enough rows. I did the math. I did the square root math. Running this update modifies just enough rows to trigger auto stats updates when the query runs. So, what we’re going to do, just to make life easy, is: I’m going to show wait stats, then I’m going to run the procedure, and then I’m going to show wait stats again, specifically for this wait on async stats refresh thing.
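The “square root math” refers to the modern auto-stats invalidation threshold, roughly the square root of 1000 times the table’s row count for large tables (trace flag 2371 behavior, the default at compatibility level 130 and up). A hedged sketch of the arithmetic, assuming the demo’s Vitos table:

```sql
/* Roughly how many modifications a column needs before its statistics
   are considered stale and eligible for an auto update. */
SELECT
    total_rows = COUNT_BIG(*),
    modifications_needed = SQRT(1000.0 * COUNT_BIG(*))
FROM dbo.Vitos AS v;
```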

And I suppose I don’t need the order by here anymore. So, it’s only going to be one row, but it seemed like a good idea when I was trying to see the wait stats initially. So, we’re just going to order by that one row. It’s a free order by, right? It’s totally free.

So, let’s run these all together. And I got query plans turned on. And you might look at the initial result and think to yourself, I think Erik Darling has been working on this demo a bit in this window, and you would be correct.

So, we start off with about 36 seconds of wait on statistics refresh for this particular session in the database, right? And then, 13 seconds later, 13 miraculous seconds later, we have another result down here. And if we look, we have gone from 36 seconds of wait on sync stats refresh to 47 seconds.

So, about 11 seconds of waiting on sync stats refresh. I might have said async one time. I’m sorry about that. It’s sync. S-Y-N-C. Sync. Like the thing in your bathroom.

I’m going to stop that thought. So, if we look at the query plan, right? We can ignore the ones that go and look at the wait stats for this session.

We can zoom in on this jolly little fella in the middle. And this thing runs for 1.7 seconds. Okay. 1.7.

What did we spend 13 seconds waiting on? Well, you might expect to go into the query wait stats. And you might expect to see all sorts of good things being returned about what our query waited on.

But you notice that even though in the results over here, we definitely spent about 11 seconds waiting on those synchronous stats refreshes. We don’t see that in the query plan when we look at the wait stats, right? There was no sign of that.

The top wait for this query is CXSYNC_PORT, of which we got 10 milliseconds. Why? Why, I ask you, is wait on sync stats refresh not in the wait stats for this query? It’s because Microsoft hates you and wants you to suffer.

I’m kidding. Microsoft has added a lot of great things to query plans. The fact that we have stuff like operator times and wait stats at all is cool. But for the love of God, include the stuff that matters.

So, we can get some clue, right? Like if we zoom in way up over here and we look at compile time, compile time will just about perfectly match. Oh, that got away from me a little bit.

Compile time will just about perfectly match the number of seconds we spent waiting on wait on sync stats refresh. Like the thing you have in your kitchen. So, we can see that here.

But since we don’t have that wait stat in the plan XML, right? We don’t have that wait stat down there in the query plan. If you were looking at any other query in the world, you would be left spending a lot of time guessing what happened. If this were a more complicated query, you might be like, wow, was the optimizer spending all that time figuring out a query plan?

You’d have to go get all sorts of crazy trace flags to look at optimizer steps and memos and logops and trees and timing and stuff. And you wouldn’t like that. That stuff is exhausting. It’s absolutely exhausting.

And it’s not fun. So, if wait on sync stats refresh were in the query plan XML, we could say, oh, we waited on that for 11 seconds. We didn’t spend 11 seconds with the optimizer just twiddling its thumbs. Like, ah, does that left join go over here?

Where do we put this left join? I have to convert all these right joins to left joins, because no one in their right mind would actually use a right join. Like, there’s lots of stuff the optimizer would be thinking about with a bigger, more complicated plan.

So, that’s what happens when this thing runs on its own. Cool. Wonderful.

We are well on our way to better understanding nothing. I’m just showing you stuff. We understand nothing. No understanding going on here.

So, now I’m going to recompile this, just out of an abundance of caution. Remember: abundances of cautions. And then I’m going to re-update those same rows.

And that’ll take a few seconds. And it’s like that part in Event Horizon where what’s-his-name’s eyes are all scratched out, and he’s like, I have wonderful things to show you.

This is sort of like what that is. All right. So, I have sp_WhoIsActive open in a special DAC window over here. And if I run this, we get nothing back.

Because nothing is happening. Zero is happening. But if I come over here to SQLQueryStress, this is already lined up to connect to the right server. And what I’m going to do is, I mean, you don’t even need two iterations.

Because we’re going to stop it way before one finishes, but I have 200 threads that are going to run this stored procedure. All right. So, if I kick this off and run it, this is going to take a couple of seconds, because we’ve got a lot going on here.

It was kind of a mess, a bit of a traffic jam. And let’s give that one more run, just so you get a sense of how grotesque things are. Like in Event Horizon.

And there we go. That’s what I wanted. So let’s cancel that. Cause we don’t need that happening anymore. But now it’s time to talk about these results. I’m going to scroll over a little bit here to frame things up a little bit more nicely for us.

And I want you to take a look at the blocked session count column over here. Right.

Maybe this is easier. I’m going to lose my finger right here. The first session has 199 blocked sessions under it. The second one has 198.

And for some reason, the rest are just like: we don’t know what we’re doing. We have no idea what’s blocking us. This is all terrible. We don’t know what’s happening. Terrifying. It’s horrifying.

We don’t know. Right. We’re just stuck. You can see in this column over here, they are all waiting on LCK_M_X. And they’ve been waiting for about nine and a half seconds, which is pretty close. Like, we actually kind of got lucky there.

If that was another second or two longer, I might’ve missed it, because we have that 11 second compile window where everything goes to hell. But that first procedure, the one that says CREATE PROCEDURE, is compiling. And the rest that say exec, those are waiting to do anything.

So that first stored procedure is trying to compile a query plan. While that’s happening, none of the other stored procedures can do anything.

They are stuck waiting for this thing to produce a query plan. The reason why it’s stuck waiting to produce a query plan is because it is waiting on sync stats refresh. That’s no good.

Right. It’s not a good time. Why would SQL Server do this to us? Why would SQL Server make all these other queries wait on this one query to produce a plan? Well, what choice does it have?

You can’t have queries all compiling plans at once. It’d be anarchy, chaos, stuff. Stuff you wouldn’t like. Well, you can’t have multiple plans in the cache. It’d be awful.

What’s happening? So, what we’ve got here is that one thing waiting a really long time. The reason I ran sp_WhoIsActive the way that I did is to get back this additional info column right here. This lovely hunk of burning XML that we can click on.
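The exact sp_WhoIsActive call isn’t shown in the transcript, but the parameters being described would look roughly like this (the @get_task_info setting is an assumption):

```sql
/* @get_additional_info = 1 adds the additional_info XML column,
   which exposes lock wait details such as the compile lock resource. */
EXEC dbo.sp_WhoIsActive
    @get_additional_info = 1,
    @get_task_info = 2;
```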

And when we click on it, boy and howdy, do we get some interesting stuff. So, if we look at this query, what is this thing doing? What command is this?

Select StatMan. I’m going to end the pop culture references there. It’s not funny. I hate that song.

Every version of that song. From the original to the remake to the cover to the sample. All of them. So, this query is waiting on SELECT StatMan. And every other query, if we look at the rest of these, is going to be waiting on, if ZoomIt will cooperate...

This stuff. An object lock in the StackOverflow2013 database, with an object ID that matches the object ID of our stored procedure called c_lock (C underscore lock), which is in the dbo schema.

Lovely. So, this is what queries will look like when they’re all trying to compile at once, but one of them is trying to figure out an execution plan. This can happen for all sorts of reasons, right?

It doesn’t have to be like the stats refresh thing. If you have a stored procedure that calls a lot of other stored procedures, it does a lot of other stuff, then like you might be stuck trying to compile a whole bunch of stuff across a whole bunch of plans, and that could be a real mess.

You can also get into a situation where you have like, you know, one super complicated query, and the optimizer is really just like, I got to figure all this out. I got to be really good at this.

And you can see query optimization take a long time in some cases. The reasons why this stuff happens: auto stats updates, or, I think most commonly, memory pressure clearing out the plan cache. Maybe a settings change or something would do it, too.

You might see auto stats updates, or an intentional stats update (UPDATE STATISTICS), invalidate plans. All of a sudden, a whole bunch of stuff has to recompile, or compile, really. I mean, I guess it’s sort of a recompile, but it’s like compiling again for the first time.

Something like that. And then you might also see like if you use temp tables a lot, changes to temp tables might cause recompilation stuff. So there’s all sorts of funny reasons why you might see a query plan decide to up and recompile out of nowhere.

But you might also see that if a whole bunch of other things are trying to run that stored procedure, coming up with its query plan for the first time will block them. Will it always be for a long time?

Maybe not. Depends on a lot of stuff. Will it always happen? Well, I mean, assuming that any of the things that I’ve talked about happen, and your stored procedure needs to come up with a query plan, and you have a million other people trying to run that stored procedure too, then yes.

But in general, it won’t always happen. I made this bad on purpose. I wanted to show you wonderful things.

So some stuff you can do that would help this particular situation would be to turn on auto update stats async. Right? That way, the query isn’t dependent on stats updating before it can generate a query plan.

It generates a query plan based on the current stats. And then something goes off in the background and is just like, yeah, I’ll figure out stats for this on my own, on my own time. I’ve also seen dynamic SQL be really successful with this stuff, because it runs in sort of a different context.

And so the stored procedure is just kind of like, oh, cool. I’m just going to go find something somewhere else. Microsoft has a whole document about different things you can do to prevent this, but I’ve never been successful trying any of them.

It’s like: don’t prefix them with sp_, make sure that the user who calls them uses a schema prefix, like dbo dot whatever. This is all stuff that was true here. Right?

Something about encryption keys, and opening them and closing them. I’ve just never seen any of those things cause the problem. I know that whoever wrote the document probably saw that stuff happen, and probably might have fixed it that way. But for me, I almost never see that stuff. I always see this stuff.

So: auto update stats async. Async. Right?

Not sync. The non-default setting. And sometimes running the query that was waiting on stats with dynamic SQL can be helpful, too. So yeah. Those are two solutions that I’ve used quite successfully to fix these issues.
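For the first fix, the database setting looks like this (a sketch against the demo database; flipping it affects every auto stats update in the database):

```sql
/* Let queries compile against existing statistics while the stats
   refresh happens on a background thread, instead of waiting
   synchronously during compilation. */
ALTER DATABASE StackOverflow2013
SET AUTO_UPDATE_STATISTICS_ASYNC ON;
```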

So, I think that was it. I hope that you saw wonderful things. Sure.

I hope that you saw at least things. Saw sync. That’s cool. So, thank you for watching. If you liked the video, I do appreciate it. I appreciate the thumb.

If you like me, I do appreciate the subscribe. I hope you learned something. I hope you enjoyed yourselves. I hope you will also be witnesses to my meteoric rise as a TikTok SQL Server influencer. I’m probably gonna have to get goofy hats and stuff to look like a gamer streamer, but, I don’t know.

Maybe I will. Maybe I won’t. We’ll see. Anyway, I have another video to record today.

I’m not gonna tell you what it’s about now, but it’s gonna be a banger too. And this was, I think, kind of a banger.

Right? Because this is a really interesting situation. This kind of stuff will show up in the blocked process report, too. And the next one’s gonna be cool as well, because it’s gonna really challenge how you think about query tuning in SQL Server.

So, let’s cut that off here. Again, thank you for watching, and I will see you in a little bit, once I’ve thoroughly prepared all my demo material for the next one. See you then.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Join me In Boston May 10 For A Full Day Of SQL Server Performance Tuning Training

Spring Training


This May, I’ll be presenting my full day training session The Foundations Of SQL Server Performance Tuning.

All attendees will get free access for life to my SQL Server performance tuning training. That’s about 25 hours of great content.

Get your tickets here for this event, taking place Friday, May 10th 2024 at the Microsoft Offices in Burlington.

Here’s what I’ll be presenting:

The Foundations Of SQL Server Performance Tuning

Session Abstract:

Whether you want to be the next great query tuning wizard, or you just need to learn how to start solving tough business problems at work, you need a solid understanding of not only what makes things fast, but also what makes them slow.

I work with consulting clients worldwide fixing complex SQL Server performance problems. I want to teach you how to do the same thing using the same troubleshooting tools and techniques I do.

I’m going to crack open my bag of tricks and show you exactly how I find which queries to tune, indexes to add, and changes to make. In this day long session, you’re going to learn about hardware, query rewrites that work, effective index design patterns, and more.

Before you get to the cutting edge, you need to have a good foundation. I’m going to teach you how to find and fix performance problems with confidence.

Event Details:

Get your tickets here for this event!


sp_QuickieStore: Now Handling The Biggest XML



Thanks for watching!

Video Summary

In this video, I share an exciting update to sp_QuickieStore, a tool I’ve been working on to help SQL Server users manage and analyze their Query Store data. I explain the problem of large query plans that can’t be displayed as XML in SQL Server Management Studio because they exceed 128 levels of nesting. To address this issue, I introduce sp_QuickieStore’s new feature: it automatically converts these large query plans into a readable format by using the `TRY_CAST` function, so they can be saved as .sqlplan files. This allows users to view even the most complex execution plans without much manual intervention. If you’re interested in trying out this solution, or want access to my other useful stored procedures, head over to my GitHub repo at code.erikdarling.com. Don’t forget to subscribe to my channel for more SQL Server tips and tricks!

Full Transcript

Erik Darling here with Darling Data, with an exciting, thrilling, industry-leading announcement. I was voted by EarGut Magazine to be the SQL Server consultant most likely to lead the industry in announcements, but there’s really no indication about whether those announcements would be good or bad, or anything. Good news is, no layoffs so far. The new social media intern is working out nicely. Apparently I have videos on TikTok now, so they’re just chopped-up versions of this, so if you prefer me in smaller bits and bytes and pieces, you should follow me on TikTok, I guess, or whatever you do on TikTok. I don’t know if it’s following or not. I don’t actually open the app, but, you know. It’s there. I’ll put a link to it in the notes for this video. But anyway, sp_QuickieStore. I’ve been working on a really cool update for this over the last, I guess, evening. Depending on how you count your evenings, it is evening. And what it does is, so, I should explain the problem a little bit first.

And then I’ll show you the solution. So, in Query Store, query plans are not stored as XML. With the plan cache, you know, you don’t have to worry about this. Well, I mean, you do kind of have to worry about this stuff. That’s why plans are stored in like seven different places. But for Query Store specifically, query plans are not stored as XML. They’re stored as this weird, compressed image-text thing. And it’s up to you to turn query plans into XML so you can click on them and look at the pretty pictures.

Unfortunately, some query plans are so gosh darn big, so enormous, so more than 128 nested nodes deep, that you cannot display them as XML in SQL Server Management Studio. You try to do it, and you get an error. So, if you just said, like, CONVERT this blob to XML, and it was too big, you’d get an error saying this XML is greater than 128 nested nodes deep. You can’t do that, dummy. And so, that sucks. But SQL Server has a function that goes back pretty far into the mists of time, called TRY_CAST. TRY_CONVERT is a different animal. TRY_CAST is, like, whatever.

So, with TRY_CAST, you can try to cast something. Specifically, you can try to cast this as XML. And if it doesn’t work, it’s just NULL. Problem is, you still might want to see those query plans. You still might want to say, well, okay, so what if you can’t XML-clickify it in SSMS? I still want to see the thing. Now, to be clear, you can still open plans that are greater than 128 nested nodes deep as graphical query plans.

You just can’t do it, like, click on it, open it, because, I don’t know, Microsoft hates XML or something, I guess. Just didn’t want to deal with it. I don’t know what the deal is. I wasn’t there when XML was implemented in SQL Server, so I couldn’t tell you any inside baseball on that.
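The technique being described boils down to something like this (a simplified sketch, not sp_QuickieStore’s actual query):

```sql
/* query_plan in Query Store is stored as nvarchar(max), not xml.
   TRY_CAST returns NULL instead of raising an error for plans deeper
   than 128 nested XML levels, so oversized plans can be detected and
   handled separately instead of breaking the whole result set. */
SELECT
    qsp.plan_id,
    query_plan_xml = TRY_CAST(qsp.query_plan AS xml)
FROM sys.query_store_plan AS qsp;
```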

Anyway, so this is what Quickie Store does now, all right? So, I’m going to hit some approximation of F5. It might have been Ctrl and E. It might have been F5. You can’t see my fingers, and I’m not telling you.

So, we have two queries up here and two query plans. Now, if we zoom in a little bit, zoom in a little bit, there we go. Thank you, Zoomit. Very responsive application.

There’s two different things here, all right? There’s this one up top, which is, let me see, this is a huge query plan. And there’s this one on the bottom, which is just a regular old query plan.

So, if you click on this one, we get nice pretty pictures. Great. Stunning. Miraculous achievement. Thank you, SSMS.

By the way, I’m running SQL Server Management Studio 20, right? Big shout-out to Erin Stellato for getting us v20. You know, it’s cool.

But this first query plan, a little different, right? If you click on this, we do not get a nice pretty picture. We get this insane blob of text. What do we do with this insane blob of text?

Well, there are some directions up at the top of the insane blob of text that tell you what to do. It says, remove the headers and footers and save it as a SQL plan file. Cool. So, here’s what we’re going to do.

We’re going to just erase that. Oh! Look at that. Pretty colors now. Beautiful colors. Yes. And then we’re going to just grab the scroll bar and head all the way down to the bottom.

And we have one other thing to get rid of. It’s the sort of the XML containment unit. And it’s this last question mark greater than thing.

So, we delete that. And we just make that a line. And then, our job gets pretty easy. We’re going to say, hey, we’re going to save as.

And we’re on the desktop already, so that works out well. And then we’re going to say, all files. And there’s an annoying text file joining us for the video. Thank you for making an appearance.

Annoying.txt. And rather than mess around a whole bunch and make you watch me probably make typos and other stuff, I’m just going to delete .xml. And I’m going to type .sqlplan.

And we lost our color. We lost our glorious color. Very sad. But what we can do now is close this out. And we can go to file and then recent.

And then hope that nothing weird is in there. No, it’s just stuff I’ve been working on. That’s fine. Okay. I get nervous.

It’s like, if I opened up a browser and it was like, would you like to restore your tabs? They’d be like, oh, maybe not. No. I mean, Darling Data does not have an HR department, but we do try to maintain the strictest standards of family-friendly SQL Server content.

So, you know, if we ever get an HR department, we can’t get any retroactive violations. But if we just click on this now, because that’s in our recent files menu, it’ll open up and we have a beautiful graphical execution plan. All right.

And this is, of course, a massive execution plan. It just keeps going and going and going. It took a little bit of work to get a query plan to test this out on. If we say zoom to fit, it doesn’t disappear.

It’s just this really thin line across the middle. So that just keeps going. There’s the end of it. I’m not going to zoom in because it’s just a weird disgrace.

Someday I’ll figure out how to make one shaped like the little Charlie Brown stripe and put that somewhere. Maybe. I don’t know.

I saw someone playing Doom in Task Manager the other day. Not like they made it say Doom. Like they were actually playing the game Doom. It was like the CPUs are like drawing the images. It was wild.

But I mean, I don’t know if they’re actually playing. They might have just like had some program do a thing. Probably. I don’t know. I’m not that good at computers. I’m only good at databases. So you deal with what you got.

So anyway, very excited about this. If you want to download SP Quickie Store, you can go to my GitHub repo at code.erikdarling.com. There will also be a link to code.erikdarling.com in the show notes.

And so you can go get that and all my other stored procedures. Because I have a lot of good ones that help you with SQL Server and help you find problems. Maybe even fix problems.

And if you can’t fix the problems yourselves, you can always hire me to do it. I am a consultant after all. I do get paid to fix these things. Strange world we live in.

So anyway, what else? Let’s see. We talked about TikTok, Quickie Store. I think that was it.

Yeah, that’s all we talked about. Cool. So thank you for watching. I hope you enjoyed yourselves. I hope you learned something. I hope you download SP Quickie Store and use it to analyze your SQL Server’s query store. And if you like this video, I do like thumbs ups from my beautiful watchers.

And if you like this type of SQL Server content, pretty please subscribe to my channel so that I can brag about how many subscribers I have to absolutely no one. I just like seeing numbers go up. So aside from like blood pressure and cholesterol and stuff, those are the bad ones.

Whatever. Whatever. Anyway, thank you for watching, and I will see you in another video, probably sooner than later, because I have to record videos so that my social media intern can stay busy. So thank you for watching.


The How To Write SQL Server Queries Correctly Cheat Sheet: Views And Common Table Expressions Are The Same Thing

Vapor Trail


Perhaps one of the most exhausting parts of my job is disabusing developers of the notion that common table expressions are somehow worth their weight in gold compared to any other abstraction layer in SQL Server.

Think of it like this:

  • Views are like a permanent home
  • Common table expressions are like a mobile home

You can put equally terrible queries in either one and expect equally terrible results.

It’s sort of like installing a toilet: If you just throw it in the middle of the kitchen floor without any consideration as to architectural design or pipe locality, it’s going to look bad immediately, and stink eventually.

Of course, since views are programmable objects, they have a bit more depth of character with creation options. Common table expressions, being temporary views, don’t have much of a story arc in SQL Server.

For example, you can index a view, but you can’t index a common table expression. Either one is entirely capable of using indexes on the underlying table(s), of course.
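To illustrate that difference, here’s a hedged sketch of an indexed view (the view name is hypothetical; the table is the Stack Overflow Votes table used elsewhere in this post):

```sql
/* A view can be materialized: SCHEMABINDING plus a unique clustered
   index. There is no equivalent for a common table expression. */
CREATE OR ALTER VIEW dbo.VotesPerUser
WITH SCHEMABINDING
AS
SELECT
    v.UserId,
    vote_count = COUNT_BIG(*) /* required in indexed views with GROUP BY */
FROM dbo.Votes AS v
GROUP BY v.UserId;
GO

CREATE UNIQUE CLUSTERED INDEX cx_VotesPerUser
ON dbo.VotesPerUser (UserId);
GO
```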

What I find particularly curious is how well-received common table expressions are, while views are often looked down upon, despite having many functional equivalences.

Developer 1: This stored procedure has hundreds of common table expressions in it

Developer 2: Wow that’s amazing you’re so good at SQL

vs

Developer 1: This ~~stored procedure~~ database has hundreds of ~~common table expressions~~ views in it

Developer 2: Wow that’s ~~amazing~~ horrible you’re ~~so good at SQL~~ a shabby degenerate and a skunk of a human being

Surprise!

Do That Viewdoo That You Do So Well


One thing that I see in a lot of views (STILL!) is SELECT TOP (100) PERCENT, which is a pretty good sign that I’m going to have to fix a lot of other problems from the 2005-2012 era of SQL Server development.

As a slight digression, I find it odd how many myths pervade SQL Server development from the 2005-2012 era. It was the golden age of SQL Server books.

All this great knowledge was out there, but everyone seemed to find a blog post that was like “in my testing the estimated batch cost for the table variable was way lower than the estimated batch cost of the temp table, so they’re always faster”.

The dogma is real.

I know this worked in like SQL Server 2000, but that’s no excuse for keeping it around today:

CREATE OR ALTER VIEW
    dbo.u
AS
SELECT TOP (1)
    u.*
FROM dbo.Users AS u;
GO 

CREATE OR ALTER VIEW
    dbo.u_p
AS
SELECT TOP (100) PERCENT
    u.*
FROM dbo.Users AS u;
GO

You’ll notice the query plan for the 100 percent view is conspicuously missing a TOP operator entirely. The optimizer throws it away because it knows it needs to get everything anyway.

(query plan screenshot: you don’t get a top)

Shocking news from 20 years ago, I’m sure.
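The usual excuse for the pattern was sneaking an ORDER BY into a view, which TOP (100) PERCENT makes syntactically legal. But since the TOP gets discarded, the ORDER BY goes with it. A sketch — the view name and sort column are my own choices:

```sql
/*
The classic 2005-era pattern: TOP (100) PERCENT exists only to make
ORDER BY legal inside the view. The optimizer discards both, so
rows come back in whatever order the plan happens to produce.
*/
CREATE OR ALTER VIEW
    dbo.u_sorted
AS
SELECT TOP (100) PERCENT
    u.*
FROM dbo.Users AS u
ORDER BY
    u.Reputation DESC; /* not honored by queries against the view */
GO

SELECT
    *
FROM dbo.u_sorted AS u; /* needs its own ORDER BY for ordered results */
```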

Equivalency


Getting back to the main point of the post here, common table expressions are the same as a vanilla view — unindexed, with no special create options.

WITH 
    ThisIsTheSameAsAView
AS
(
    SELECT
        u.AccountId,
        c = COUNT_BIG(*)
    FROM dbo.Users AS u
    JOIN dbo.Badges AS b
      ON b.UserId = u.Id
    JOIN dbo.Posts AS p
      ON p.OwnerUserId = u.Id
    JOIN dbo.Comments AS c
      ON  c.UserId = u.Id
      AND c.PostId = p.Id
    JOIN dbo.Votes AS v
      ON v.PostId = p.Id
    GROUP BY
        u.AccountId
)
SELECT
    *
FROM ThisIsTheSameAsAView AS view1
JOIN ThisIsTheSameAsAView AS view2
  ON view2.AccountId = view1.AccountId;
GO 


CREATE OR ALTER VIEW
    dbo.ThisIsTheSameAsACommonTableExpression
AS
    SELECT
        u.AccountId,
        c = COUNT_BIG(*)
    FROM dbo.Users AS u
    JOIN dbo.Badges AS b
      ON b.UserId = u.Id
    JOIN dbo.Posts AS p
      ON p.OwnerUserId = u.Id
    JOIN dbo.Comments AS c
      ON  c.UserId = u.Id
      AND c.PostId = p.Id
    JOIN dbo.Votes AS v
      ON v.PostId = p.Id
    GROUP BY
        u.AccountId;
GO 

SELECT
    *
FROM dbo.ThisIsTheSameAsACommonTableExpression AS cte1
JOIN dbo.ThisIsTheSameAsACommonTableExpression AS cte2
  ON cte2.AccountId = cte1.AccountId;

If you were to capture execution plans for both of these, you’d see the same query plan pattern where the query contained by the view and the common table expression is executed twice, so the results from one reference can be joined to the results of the other reference.

I’ve talked about this many times in the past with common table expressions, but not in comparison to views.

(query plan screenshot: oh dear me.)
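If that double execution hurts, one common fix is to materialize the shared result once into a temp table and join that instead. A sketch, with a made-up temp table name and the join list trimmed down for brevity:

```sql
/*
Materialize the expensive aggregation one time, then self-join the
temp table. The query inside runs once instead of once per reference.
*/
SELECT
    u.AccountId,
    c = COUNT_BIG(*)
INTO #OnceOnly
FROM dbo.Users AS u
JOIN dbo.Badges AS b
  ON b.UserId = u.Id
GROUP BY
    u.AccountId;

SELECT
    *
FROM #OnceOnly AS t1
JOIN #OnceOnly AS t2
  ON t2.AccountId = t1.AccountId;
```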

If you’re one of those developers who disdains views for their very existence, you shouldn’t feel any differently about common table expressions in practice (at least from a performance perspective).

One place where views edge out common table expressions is that the code is contained in a module that can be referenced by any other query or module (stored procedure, function, etc.). If you fix query performance for a view, everything that references it also gets faster. Doing that for a common table expression that gets used frequently is cool but… then you have to go find and replace it all over the place.

Thanks for reading!


A Little About PAGEIOLATCH Waits In SQL Server



Video Summary

In this video, I delve into the concept of PAGEIOLATCH waits in SQL Server and how they can indicate performance issues related to insufficient memory for the amount of data stored. Using my scaled-down server setup, which mimics common client environments with limited resources, I demonstrate how high PAGEIOLATCH waits can significantly impact query performance by forcing frequent disk access. I also discuss practical steps to mitigate these issues, such as optimizing indexes and improving query writing to better leverage available memory.

Full Transcript

Erik Darling here with Darling Data. In today’s video, we’re going to talk a little bit about page IO latch waits. And before we do, what I need to do is introduce you in case, well, let’s be honest, you may not have seen every single video on my channel. You may not have seen very specific videos where I talked about this tiny little server that I use to really beat up on and show how bad things can get with SQL Server. So let’s zoom in on this a little bit. I should probably be in the right window context for that, right? I hope hitting control and one didn’t break anything important outside of this thing. So this tiny little server has about 14 gigs of memory assigned to it and only two virtual processors. So this thing is very easy to overload and overwhelm. The thing is that this is sort of like me having to create something that scales or like create something that’s a good scale model of like some of the hardware configurations that I see when working with clients. So like my Stack Overflow database is only like a hundred and something gigs, but like, you know, like my laptop that I usually use, the VM that I usually use is 16 cores and like 96 gigs of memory. So like, that’s pretty solid, right? Hardware to like, you know, data comparison. You can still make bad things happen with it, but it’s a lot easier to like show the kind of performance problems I see with a really scaled down VM, but still the same size of data. So like you could take, you know, like a production server that maybe you’ll work with and it might have like, I don’t know, like eight cores or 16 cores or something and like 128 gigs of memory, but you have like a terabyte or two of data on there, right? So like, this is just sort of like a scale model of like the size of the database that I normally use to make sure that the server can kind of like display some of the same issues that I see happening with like live production servers and my day-to-day client work.
So that’s kind of the reason behind this thing. So let’s get out of there. And let’s talk a little bit about page IO latch waits before we go too much further. Now, like, I don’t know, maybe like a hundred, 200 years ago, sometime around the time of the American revolution, there were a lot of blog posts that said like, if you have like high page IO latch waits, whether EX or SH, that like you, I don’t know, have a disk IO subsystem problem.

I’ve never met a disk IO subsystem. So I don’t really know when one would have a problem or not. It’s kind of a, kind of an outdated term now anyway, I think. But that’s not really what it means. What it means when these waits really accumulate and when you have like a lot of queries that are constantly waiting on these, or you look at wait stats for the server as a whole, and you just see like, like page IO latch waits are a significant, like number of hours of waits compared to like, like how many hours your server has been up. That means that either you don’t have enough memory for the amount of data that you’re storing period, right? Like nothing you do is ever going to work. You might have great indexes. You might have great queries. You just can’t possibly cache everything you need in memory. So your queries are constantly going out to disk to get it, to bring it into memory to, so that your queries can start, you know, using them.

It could also mean that maybe one, like there’s a, just like a deficiency somewhere, right? Like you, maybe your queries are written like crap with a lot of non-sargable predicates, like is null, coalesce, substring, left, right, whatever, in the where clause. Or maybe you just have like no good indexes for your query to use and everything is scanning like a heap or a clustered index or something. I don’t really have a good hard and fast rule about how much memory you should have compared to how much data you have.

You know, if you can get close to one-to-one and, you know, you see your page IO latch waits, like almost disappear, great. You might be in weird situations where, you know, like you might have on your server, like a five terabyte database full of nothing that anyone ever touches. You just dump data into it from other places and it’s like archive, cold storage, like maybe like once a year someone will do something with it.

And your main production database is reasonably small. So like don’t just look purely at size of data when you want to figure out how much memory you have. Really what you want to do is look at your page IO latch waits.
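A minimal sketch of checking those waits server-wide, against sys.dm_os_wait_stats, which accumulates since the last restart (or manual clear):

```sql
/*
How much time the server has spent on PAGEIOLATCH waits since
startup, plus the average wait per occurrence.
*/
SELECT
    ws.wait_type,
    wait_hours =
        ws.wait_time_ms / 1000. / 60. / 60.,
    avg_wait_ms =
        ws.wait_time_ms / NULLIF(ws.waiting_tasks_count, 0)
FROM sys.dm_os_wait_stats AS ws
WHERE ws.wait_type LIKE N'PAGEIOLATCH%'
ORDER BY
    ws.wait_time_ms DESC;
```

Compare the wait hours to how long the server has been up to judge whether they’re a meaningful share of the workload.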

And kind of like a weird brute force way of thinking about it is that like every time you double memory, you’ll cut your page IO latch waits in half. That’s not always going to be perfectly true. That’s not always going to be like exactly what happens.

But that’s like a just like a way to start thinking about it so that you can start figuring out exactly how much memory you might need to get rid of them as completely as possible. The reason you want to get rid of them as completely as possible is because when queries have to go to disk, right, disk is always going to be the absolute slowest component in a server. I say I use like I use in a server loosely because most storage these days is not actually in the server.

Most storage these days is connected via network. That’s also true of my VM, right? Because if you look at my VM, it’s on this tiny little Samsung disk, right?

And this Samsung disk is connected via a little USB cable right here. And this tiny little USB cable is often what has a very, very hard time moving data around, right? So my VM and everything is on here, but all my memory is on my laptop.

So anything that I do has to get scooted across that wire, which can really slow things down. One great way of figuring out if not your storage, but if your storage networking is having a hard time keeping up with the amount of data load that you’re putting on it, is to look in the error log for 15-second I.O. warnings.

To make this easy on you, I have a stored procedure called sp_LogHunter. I apologize. My social media intern who does all the sound effects in my stored procedure announcements is still out sick.

I don’t know. Maybe drunk just disappeared something. I don’t know. Maybe they quit. I just haven’t found the letter yet. But if you go there, you can run.

If you go to code.erikdarling.com, you can grab sp_LogHunter, and you can use that to parse through your error log and look for all the awful things that hide out in there. The 15-second I.O. warnings are one of the messages that it looks for. So what I’m going to do is I’m going to configure my tiny little tuning server to have 8 gigs of memory.

And I’m going to create this stored procedure, which actually it’s already created. I don’t know who I think I’m fooling here. And I’m going to use SQLQueryStress to throw 10 threads, at least according to my notes here.

I need 10 threads and 10 iterations. All right. Good background information there, right?

10 threads, 10 iterations. 10 times 10 is 100. I’m going to run this thing 100 times. I’m actually going to run it more than 100 times. If you look at the logic in here, it’s kind of gnarly. This runs in a while loop, and this while loop essentially will make this thing run 10 times.

Every time it runs, it’ll pick a random number between 1 and 5. And depending on what number goes, we’re going to do a select count from a different of the main tables in the Stack Overflow database. So what this is designed to do is, like, with the amount of memory that we have in here, is just a really stressed disk.
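The procedure itself isn’t shown on screen here, so this is my rough reconstruction of the logic described: loop 10 times, pick a random number from 1 to 5, and count a different big Stack Overflow table each pass. The procedure name is made up:

```sql
/*
Reconstruction of the disk-stress procedure described above.
Each SQLQueryStress execution runs this loop 10 times, and each
pass scans a randomly chosen large table.
*/
CREATE OR ALTER PROCEDURE
    dbo.DiskStress
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE
        @i integer = 0,
        @pick integer;

    WHILE @i < 10
    BEGIN
        /* random number 1-5 */
        SELECT @pick = ABS(CHECKSUM(NEWID())) % 5 + 1;

        IF @pick = 1 SELECT COUNT_BIG(*) FROM dbo.Users AS u;
        IF @pick = 2 SELECT COUNT_BIG(*) FROM dbo.Posts AS p;
        IF @pick = 3 SELECT COUNT_BIG(*) FROM dbo.Comments AS c;
        IF @pick = 4 SELECT COUNT_BIG(*) FROM dbo.Votes AS v;
        IF @pick = 5 SELECT COUNT_BIG(*) FROM dbo.Badges AS b;

        SELECT @i += 1;
    END;
END;
```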

All right. So if we look at, where is that hiding? Oh.

That’s in here somewhere. Maybe it’s not. Maybe I lied to you. Maybe I need to type in a demo. I’m sure I’ll offend someone. Usually do, right? Someone gets mad about something. Let’s look at sp_PressureDetector.

Okay. Let’s examine some stuff here. One of them, or rather, what I want to look at, is down here in the memory section. So I have 126 gigs of data, 14 physical gigs of memory, and 8 gigs assigned for max server memory.

All right. So there’s no way that me reading all those tables randomly from all this stuff is going to play well with this stuff. All right.

So obviously, we’re at a little bit of a disadvantage here in our data to memory ratio. So let’s, I already have this window open with sp_WhoIsActive in it. Put a, oh, that’s two semicolons.

I typed again. Freak out. Right now, nothing’s going on. But if I hit go here, and we start looking at what’s running, all of these queries are going to get really just beaten up by page IO latch waits. All right.

We just keep running this and looking at stuff. We have some really, really bad waits going on here. All right. Almost 500 milliseconds. There’s another one.

Oh, 200 milliseconds. There’s a couple more that are almost 200 milliseconds. So this is what query workloads will look like on servers where, like, you just plain don’t have enough memory. It might not be this amazingly consistent.

It might not be, like, just everything waiting for all the time. But if you use sp_WhoIsActive enough and you get familiar enough with your server, you’ll probably see this situation happen quite a bit. Right.

If you, you know, just, like, make sure, like, keep an eye on, like, your really busy times of the day, this is going to happen, like, constantly. Right. You’re just always going to see, like, in whatever list of queries are running, you’ll see, like, a bunch of them waiting on page IO latch waits. So let’s go cancel this because we don’t really need to, we don’t need to keep doing that.

So that’s what, in general, queries will look like when page IO latch waits are, A, like, dominant in the workload, and B, when you’re, like, you’re overwhelming storage hardware at some point. Either, I mean, I guess it could be the disks. It’s usually not the disks.

The disks are usually fine. It’s usually the networking between your server and wherever you’re reading stuff from. So one thing to sort of think about with all that, again, like, as you double memory, like, you might cut those waits in half.

Now, if I were running this demo on my laptop where the storage is direct attached and really fast and, you know, just, like, all internal and, like, the memory and the storage and the CPU are, like, within, I don’t know, it’s a 16-inch laptop. So they’re going to be pretty close to each other. There’s not a lot of, like, distance to travel for things to get from one part to the other.

We would still see a lot of page IO latch waits, right, because I don’t have enough memory to keep what I need available for queries to use. But you probably just wouldn’t see, like, the milliseconds get up as high, right? And I know that, you know, like, almost no one is using bare metal servers with direct attached storage.

And, you know, if you’re on a VM where your storage is remote to your server or you’re in the cloud where you’re, like, I know there are certain cloud instances that do offer, like, some forms of direct attached storage. But let’s face it, if you’re watching free videos on YouTube, like, to learn about SQL Server, your company is most likely not springing for, like, the really good servers with, like, the really fast direct attached storage. You’re probably, your storage is probably, like, a couple tin cans and some old yarn.

So there is that. Now, one way that, well, two ways that you can sort of make better use of the memory that you have is to give queries better indexes to access data, like, quickly and easily. And, you know, like, all these queries, if we look at the estimated plan for this thing, all of these queries are just going to be doing clustered index scans, right?

So we’re reading, like, effectively the table, like, everything, right? It’s, ugh, get in there. The clustered indexes are, you know, big.

They’re all the columns in the table. I was actually at a session sort of recently by a fellow named Tim Chapman who works at Microsoft. He made a really good point that calling clustered indexes clustered indexes can be somewhat misleading because it’s really clustered tables, right?

Because you have a heap table or you have a clustered table. Putting a clustered index on a heap makes it a clustered table. It’s not like you have this heap object and then a clustered index somewhere over here.

You just cluster the table, right? So your table is the clustered index. So I thought that was a really good point. I’m trying to make, like, a mental note to, you know, phrase things that way because I think it’s actually a really good point and something that a lot of people do get confused about when they’re working with SQL Server.

So a lot of other database engines, they don’t necessarily have the concept of clustered index, right? Like Postgres is like heaps and then being treated in nonclustered indexes and the 7,000 other kinds of indexes you can create there. Oracle, I mean, who knows?

I don’t have enough money to test it. So I’ll just leave that alone. But if we wanted to make better use of the memory that we currently have in the server, again, we could create narrow, nonclustered indexes that would be useful for our queries to use. But we also have to make sure that the queries are written in a way to take advantage of those indexes.

So while this index creates, I forget how long this takes. You know, it’s a tiny little baby server. It might take a second.

I’m going to show you with this index. And this is a single key column index on the PostID column in the comments table. What I’m just going to show is two examples of queries where we have a good where clause like this, right, plainly written without any interference, and then a bad where clause that looks like this, where, you know, we’re not going to be able to take good advantage of the index that we just created on the table.
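The on-screen queries aren’t in the transcript, so here’s my reconstruction of the two predicate shapes being described, against the single-key index on PostId. The literal PostId value is made up:

```sql
/*
Sargable vs. non-sargable versions of the same count, assuming a
nonclustered index on Comments (PostId).
*/

/* good where clause: the column is bare, so a seek is possible */
SELECT
    records = COUNT_BIG(*)
FROM dbo.Comments AS c
WHERE c.PostId = 12345;

/* bad where clause: wrapping the column in COALESCE forces a scan */
SELECT
    records = COUNT_BIG(*)
FROM dbo.Comments AS c
WHERE COALESCE(c.PostId, 0) = 12345;
```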

So I’m going to run the query two different ways. And in those two different ways, what I’m going to show you afterwards is how much of the index ends up in memory, like after the query runs. All right.

So we’re going to come back up here. And we’re going to run this first with the good where clause. And let’s execute you.

All right. So we have our nice narrow little index, right? If you look up here, you can see this frame that a little bit.

The primary key, the clustered primary key, or the clustered table that is the comments table is about 8 gigs. And our narrow little nonclustered index is about 241 megs. Now, I did use page compression with this index.

I tend to, like, use page compression a lot with indexes when I create them because it’s just another really good way to take advantage of or to make better use of your server’s memory. It’s a smaller object on disk. It’s a smaller object in memory.
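The index described above would look something like this — the index name is my own, and SORT_IN_TEMPDB is just a habit, not a requirement:

```sql
/*
Narrow, single-key, page-compressed nonclustered index on the
Comments table. Page-compressed pages stay compressed in the
buffer pool, which is why it helps memory as well as disk.
*/
CREATE INDEX
    c_PostId
ON dbo.Comments
    (PostId)
WITH
    (DATA_COMPRESSION = PAGE, SORT_IN_TEMPDB = ON);
```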

And there’s just some, you know, some CPU overhead as you decompress it on the way out of the buffer pool. So our query, when it ran, it used our nice narrow little nonclustered index here. And we’re able to seek very easily to the row or rows.

I mean, there’s one row. It counted one that we cared about. And because of that, we were able to read just a very small portion into memory. Now, granted, this is just one execution of one equality predicate.

In real life, you’ll have, like, a whole bunch of queries that are running and looking for different equality predicates. So you might, over the course of time, you’ll end up reading portions of that index. And you’ll probably see more parts of that index in the buffer pool.

But it’s still a much better use of the buffer pool than just straight up scanning a clustered index. So let’s rerun this with the badware clause. Zero on that and one on that.

I should probably write this as a loop or something. But, you know, I’m kind of lazy. And I like typing in demos because it makes all the right people upset. So now let’s run this with the bad where clause using that coalesce function there.

And what happens is because we still get the same result back of one row. But when we look at the query plan, it’s a scan of that nonclustered index. It’s no longer a seek into that nonclustered index.

And because we had to scan that whole thing. Now, not every scan will always read everything in there. If you use, like, top or offset fetch, you could bail out early. But in this case, we don’t have either of those.

We’re just getting a count. But because we had to scan that, we had to do a scan and get the whole thing into memory, it looks a lot different this time. Right? Now all 241 megs of the index end up in memory.
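The "how much of the index is in memory" check isn’t shown in the transcript, but a rough sketch of it joins sys.dm_os_buffer_descriptors back to the objects in the current database (simplified: this only covers in-row and row-overflow data, not LOB pages):

```sql
/*
Approximate buffer pool usage per index in the current database,
counting 8KB pages cached in memory.
*/
SELECT
    object_name = OBJECT_NAME(p.object_id),
    index_id = p.index_id,
    buffer_mb = COUNT_BIG(*) * 8 / 1024.
FROM sys.dm_os_buffer_descriptors AS bd
JOIN sys.allocation_units AS au
  ON au.allocation_unit_id = bd.allocation_unit_id
JOIN sys.partitions AS p
  ON p.hobt_id = au.container_id
WHERE bd.database_id = DB_ID()
AND   au.type IN (1, 3) /* in-row and row-overflow data */
GROUP BY
    p.object_id,
    p.index_id
ORDER BY
    buffer_mb DESC;
```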

So when you’re thinking, when you, like, let’s say that page IO latch waits, both the _SH and the _EX, those are the most common ones that I see. I don’t usually see the other page IO latch waits cropping up and being awful issues.

Let’s say that those are, like, really just, like, predominant waits on your server. Or every time you sample your workload, you see a lot of those page IO latch waits cropping up and slowing queries down. Right?

Just adds, like, variable speed queries. Like, maybe not parameter sniffing. Maybe it’s just, you know, crappy luck with what data pages are in memory. So, like, let’s say that, like, you know, you’ve made an assessment of the server that, like, page IO latch waits are a problem. You can kind of, again, you can kind of brute force reducing them by, like, thinking every time you double memory, you’ll cut those waits in half.

Right? Like, for me, ideally, if, like, I don’t really like to see page IO latch waits be more than, like, 20% of, like, your server’s uptime. Granted, like, bursty workloads can make it look like they’re barely a problem.

But, you know, like, really, like, you know, consistent, like, ongoing workloads. I want to say 24-7. But I hate saying 24-7 because it reminds me of being on call, which is, like, depressing.

But, like, let’s say, you know, you’ve made that assessment. You can generally think that every time you double memory, those waits will get cut in half. You can, of course, like, that’s usually, like, a good, like, I want to make a big change really quickly.

Like, I want to, like, I’m going to do something to the server that’s going to make every query faster immediately. That’s one thing you can do. If you can’t, like, I don’t know, for whatever reason, either you can’t do that or you’ve already done that, then it’s time to start looking at, you know, queries and indexes.

You know, like, your first step probably shouldn’t be adding indexes. Your first step should be, like, assessing indexes. Look for unused indexes.

Look for duplicative indexes. sp_BlitzIndex, of course, wonderful for that. And you’re going to want to start, like, trimming down indexes first. After the indexes get trimmed down a whole bunch, then adding in new indexes that support queries better is a good second step.

Also, making sure that your queries are written in a way that they can actually take advantage of all the indexes that you’ve created on there. So that, you know, like, you don’t create these indexes and then realize that your queries all have, like, is null, coalesce, left, right, substring, you know, other sort of, like, non-sargable predicates in your queries that make it so that your queries can’t seek the data. They have to scan data to filter everything out.

So, you generally don’t want to tinker with disk speed to fix this. You want to make sure that you’re not reliant on disk anymore, right? Like, getting faster disks might speed up the page IO latch waits, but your goal should be to just, like, get rid of them by not being reliant on disk.

Faster disks can be useful for some things, right? So, like, if you, like, you know, if you have, like, really high WRITELOG waits and, you know, the average milliseconds per wait is pretty high, faster disks can be useful for that. But having more memory is still valuable because you’ll be, like, especially considering, like, most servers are using remote storage, more memory is still really valuable for that because you will be taking traffic off the network pipe between your server and the storage.

So, you will clear that space so the write log stuff can happen faster. IO_COMPLETION waits, backup-related waits, those can also be helped by faster disks and whatnot. But really, your goal should be to increase memory to the point where, like, A, your queries are more consistently fast because you don’t have to sometimes go to disk and sometimes not.

And you want to clear that network path so that when queries do need to go to disk, either to read data or to write data, they have a pretty clear pipe to do it. So, that’s my thoughts on that. And that is a little bit about page IO latch waits.

Whew! All right. So, as always, I hope you learned something. Hope you enjoyed yourselves.

Thank you for watching. If you like this video, there’s a thumbs up button that I would appreciate you punching the crap out of. Like, really wind up. Go at it.

Like, punch it like it kicked your dog or something. If you like this kind of SQL Server performance tuning content, there is also a subscribe button, which I would like you to attack with equal aggression, equal ferocity. That’d be cool.

And let’s see. What else? I don’t know. Yeah, like, subscribe. Got that covered. I think that’s about it. All right. It’s dinner time. I’m going to get out of here.

I’m going to go change into my formal Adidas t-shirt and do the dinner thing. So, thank you for watching, and I will see you in the next video. It might be tomorrow if you’re all good, well-behaved out there and you really like and subscribe with adequate force.

Thank you.


sp_PressureDetector: Now With PerfMon Counters And Server Sampling



Video Summary

In this video, I discuss two recent updates to sp_PressureDetector, a tool I’ve been working on for monitoring SQL Server performance. Despite the late hour—my usual night owl tendencies led me to stay up until four in the morning, sans any celebratory drinks—I managed to add some new features that should help you better understand your server’s current state and recent trends. The first addition is a suite of Perfmon counters, which I’ve ordered by object name and counter name for clarity, though feedback on this ordering would be greatly appreciated. Additionally, I’ve introduced the ability to sample your server over a specified period, allowing for more granular analysis without long wait times. While these changes are still in their early stages, I’m excited about how they can enhance troubleshooting workflows. As always, any suggestions or critiques are welcome at Darling Data; we’re dedicated to making our tools as useful and user-friendly as possible.

Full Transcript

Erik Darling here with Darling Data. And in this whirlwind video, I’m going to talk about a couple new additions to sp_PressureDetector. And unfortunately, my audio tech who adds the echo to all of my stored procedure announcement videos is out sick today. So you’re just going to have to deal with an echo-less stored procedure announcement. So the two big things that I added to sp_PressureDetector recently, it was a weird night where I didn’t drink. And I felt very motivated.

And I ended up staying awake until like four in the morning, adding all this stuff in. I’m not going to walk you through the code because it’s kind of boring. It’s just like, you know, a loop and a go-to and a second pass and things for one thing. And it’s just a bunch of perfmon stuff for the other thing. And if in your whole entire long-legged life, you have never seen sys.dm_os_performance_counters code, you’re actually probably pretty lucky because everything you have to do to query that dynamic management view is real ugly.

Whoever designed that has a pure hatred for humanity that I don’t think can ever be matched, at least not by another human. If aliens ever invade Earth and start inflicting torture and cruelty upon the entire human race, like in that John Travolta movie where he’s an alien and really likes gold, I think it’s a Scientology flick. I can’t really remember. I can’t really remember. It’s something Earth, Battlefield Earth maybe.

Then maybe someone could equal, probably not surpass, but at least equal the hatred that whoever designed this management view has for humanity. So we’re going to skip that part because I care about you. We care about you a lot. That’s why I stay up until four in the morning dead sober writing these things.
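To give a small taste of why the view is so unfriendly: plenty of its counters are ratios that only mean something when divided by a matching "base" counter row, and others are raw per-second accumulators that need delta sampling. A rough sketch for one of the ratio counters:

```sql
/*
Buffer cache hit ratio is a ratio counter: the value row must be
divided by its companion "base" row to get a percentage. Ordering
and instance handling are simplified here.
*/
SELECT
    hit_ratio_pct =
        100.0 * v.cntr_value / NULLIF(b.cntr_value, 0)
FROM sys.dm_os_performance_counters AS v
JOIN sys.dm_os_performance_counters AS b
  ON  b.object_name = v.object_name
  AND b.counter_name = N'Buffer cache hit ratio base'
WHERE v.counter_name = N'Buffer cache hit ratio';
```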

So, yeah, we’re going to just dive right into it then. So I’m using SSMS 20, brand new release. Very excited about the fact that Azure Data Studio is no longer packaged in with SQL Server Management Studio 20.

There are a couple of downsides, though. One, I’m a little lazy, and I haven’t quite set up everything in here the way I want it to. And also, SQL Prompt does not yet detect SSMS 20.

I went to install it, and it was like, you need to have SSMS installed. And I was like, I have SSMS installed. So I guess it’s just a waiting game on that. All right.

Anyway, I’m going to give sp_PressureDetector a run like this. And I only have the minimum disk latency thing up here set to one so that results actually come back from it. Because I have good disks in my laptop, my very expensive laptop.

And gosh darn it, there’s not a lot of latency in there, which is both good and bad. All right. Okay.

So we’ll give this a run. And the new thing that I added to the results here is Perfmon counters. So I’m still tinkering with this a little bit. I’m not exactly sure if this will be the final set of Perfmon counters that I include.

I might add some. I might take some away. I might change the way that the results are delivered. But for the most part, I’m okay with what I get back now.

Like, it’s working, and it just needs some tinkering. Right? Because, again, you stay up until like 4 in the morning doing something. You know, you make some editorial choices that, you know, next day you might not agree with.

That goes for probably whether you’re sober or not. Probably more so if you’re not. But, you know, 4 in the morning does desperate things to the soul.

So right now I’m returning a bunch of Perfmon counters. The ordering of them I like, but I’m not in love with. I order by object name and counter name and then totals within that.

I tried just ordering by, like, totals, but I found that, like, the object name stuff being… The object name and counter name stuff being all mixed up made the results, like, confusing to read. And it was kind of nice having everything, like, grouped together and then ordered within those groups.

But, you know, you might like it a different way. And if you do, if you prefer it a different way, go to my GitHub repo. We can maybe figure out if there’s, like, some custom ordering that I can add in there.

Or maybe just you have a better idea for the ordering. We can go with that. I don’t have so much pride of authorship that I think I do everything right on the first try. Especially at 4 in the morning.

So, yeah. Feedback is always welcome at Darling Data. So, yeah, this section, you know, pretty normal Perfmon counter stuff. Again, you know, I’m not really sure if everything is going to stay in here.

Or if I might subtract some stuff or I might add some stuff. I don’t know how I feel about, like, having, like, metadata lock stuff in there. Like, there’s probably going to be some time when, like, it’s, like, the problem.

And I’ll kick myself if it’s not in there. So, for now, I’m just going to leave it in. But if there are any Perfmon counters you feel are missing from this list, you can see them if you go into the source code of this stored procedure.

I’m only going to scroll down to the list of them because that’s the stuff that I would like feedback on. So, if there’s anything from this list that you feel is missing or that you feel is stupid, again, feedback.

Always welcome at Darling Data. We were voted by Beergut Magazine the friendliest and most-willing-to-accept-user-feedback SQL Server consulting firm in the entire known universe.

And apparently that includes whatever the aliens from Battlefield Earth were, wherever they’re from, I’m not entirely sure. But, you know, anyway, this is the current list.

Let me know. Let me know. So, that’s the one thing, right? We added in Perfmon counters. The second thing that I added was the ability to take a sampling of your server. So, you can run SP Pressure Detector for, I believe, a tinyint number of seconds.

I think it was a tinyint. The cap on that is 255 seconds. If you really need bigger than that, I don’t think you’re doing the kind of emergency troubleshooting that I do. I usually don’t need more than, like, maybe 30 seconds at the high end, maybe 60 seconds at the very high end.

So, I capped it at a pretty small number because I didn’t want someone to enter, like, 2 billion in there and just, like, have a wait for going for 2 billion seconds, however long that is. The aliens from Battlefield Earth will probably have killed us all by then.

So, I’m going to kick off a query over here just so there’s a little bit of activity. I’m going to drop clean buffers just to generate some reads and whatnot. And then I’m going to run this.

And roughly 5 seconds later, depending on exactly where in the wait for we hit. Sometimes it says 6 seconds. I think it’s just like I hit, like, a weird edge of things. But that’s no big deal.

It’s close enough to 5 for me. It’s not like anything in here takes a second to run. It’s all fast, right? So, what we see up here, let me just tuck that away to buy ourselves a little bit more screen real estate. What we have in here, oh, I didn’t mean to cover that up.

Just silly of me. I want everyone to see the glory of the sample seconds in here. It’s very important stuff. You’re going to see wait stats.

And I have reshuffled the column orders a little bit just to make things, like, right and consistent looking, and all named the same thing. But right now, the only three result sets that get the sample comparison are the first three that you see here. So, wait stats, file stats, and perfmon counters.

Nothing else right now does that. I just don’t know what else I would really care to do that for. Like, the memory stuff, I mean, I guess it would be okay.

But, like, you know, really, like, I just don’t see a lot of value in it. If you see value in it, maybe we can, you know, we can, again, user feedback is always welcome. We can figure that out.

But, like, any section that has the final column sample seconds in it, like these do, those are the ones that have sample comparisons. The rest are just kind of a snapshot of what’s currently happening. So, like, the buffer pool memory is current.

The tempdb info is current. All the stuff that you see down in here is current. The thread usage stuff actually might be kind of cool to have a comparison for. But, you know, V1, 4 in the morning.

I wasn’t willing to stay up until 5 in the morning to get this going. So, that’s what we have there. So, this will tell you, you know, what wait stats you had during that sample. What drive activity you had during that sample.

And this is still ordered by the total read stall here, which might look a little bit funny. Look at tempdb with negative 26. That’s cool.

Thanks. Thanks, tempdb. File 7 having fewer reads at the end than at the beginning, I guess. Or, sorry, less average read stall at the end than at the beginning really threw me off there. I might have to do a little bit of fixing on this to get all the numbers kind of in a nice positive space.

But even though these files had the most reads, those reads happened very, very quickly. All right. No messing around in my laptop.

So, a couple new cool, I think, new things added to SP Pressure Detector. Again, this is all free open source stuff. You can download it.

You can, if you’re feeling particularly masochistic, you can read the source code. You can put these on your servers. You can run them. You can find your SQL Server problems. And if you need help fixing your SQL Server problems, young, handsome consultants are standing by to help you.

Like me. I meant me, not anyone else. All the other consultants are old and ugly.

You want young and handsome consultants. Right? Cool. Anyway, thank you for watching. Hope you enjoyed yourselves. I hope you learned something. I hope you find this code useful.

I hope that you’ll go download it, try it out. And, of course, give the old darling data some feedback. Because that’s what we thrive on. People saying, you’re an idiot.

Or, thanks. Either one. Kind of cool. Anyway. Thank you for watching. And I will see you in another video some other time. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Join me In Boston May 10 For A Full Day Of SQL Server Performance Tuning Training

Spring Training


This May, I’ll be presenting my full day training session The Foundations Of SQL Server Performance Tuning.

All attendees will get free access for life to my SQL Server performance tuning training. That’s about 25 hours of great content.

Get your tickets here for this event, taking place Friday, May 10th 2024 at the Microsoft Offices in Burlington.

Here’s what I’ll be presenting:

The Foundations Of SQL Server Performance Tuning

Session Abstract:

Whether you want to be the next great query tuning wizard, or you just need to learn how to start solving tough business problems at work, you need a solid understanding of not only what makes things fast, but also what makes them slow.

I work with consulting clients worldwide fixing complex SQL Server performance problems. I want to teach you how to do the same thing using the same troubleshooting tools and techniques I do.

I’m going to crack open my bag of tricks and show you exactly how I find which queries to tune, indexes to add, and changes to make. In this day long session, you’re going to learn about hardware, query rewrites that work, effective index design patterns, and more.

Before you get to the cutting edge, you need to have a good foundation. I’m going to teach you how to find and fix performance problems with confidence.

Event Details:

Get your tickets here for this event!


Updates To sp_QuickieStore, sp_HealthParser, and sp_PressureDetector

sp_QuickieStore


The Darling Data machine has been hard at work making the bits and bins and whatnots work better.

You can download all of the main SQL Server troubleshooting procedures I use in one convenient file.

Okay, so, like, maybe not all that hard at work, and I had some help with a funny bug stemming from a recent Pull Request. The Read Me file has been updated to reflect the newest parameters.

Let’s start with giving some credit, here, since I hate an overdue bill.

ReeceGoding reported and very capably fixed an issue with sp_QuickieStore. The problem only occurred when you ran the procedure with @get_all_databases and the new @escape_brackets parameter set to true. Each loop would add an escape character to the search string, which could end up looking like this: %\\\\\[AnyStringStartingAndEndingWithSquareBrackets]\\\\\]%'
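The shape of the bug is easy to sketch. This is hypothetical illustration code, not the actual sp_QuickieStore internals — the variable names and the five-pass loop are made up to show how escaping inside a per-database loop stacks up backslashes:

```sql
/*
   Hypothetical sketch of the bug, not real sp_QuickieStore code:
   the search string gets escaped inside the per-database loop
   instead of once, up front.
*/
DECLARE
    @search_string nvarchar(200) = N'%[AnyString]%',
    @database_loop integer = 1;

WHILE @database_loop <= 5
BEGIN
    /* Each pass escapes the opening bracket again, adding a backslash */
    SET @search_string = REPLACE(@search_string, N'[', N'\[');
    SET @database_loop += 1;
END;

SELECT escaped = @search_string; /* %\\\\\[AnyString]% */
```

The fix, of course, is to escape once before the loop starts.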

The good news is that it’s fixed now. A big round of applesauce for that. It’s the only substantial change to sp_QuickieStore.

Now, on to the other stuff!

sp_HealthParser


I had to make a bunch of adjustments in here. I was testing out some stuff and hoping that I could make it work, but the math just wasn’t there. Or the maths just weren’t there, if you’re into multiple maths.

For the wait stats sections, I’ve removed the “total wait time” column. It just didn’t make any sense, because the only information that I had was the total number of waits, the average number of waits, and the maximum wait time. I was hoping I could get sensible numbers by multiplying the total and average, and adding the max wait time to that. The problem is that the numbers were huge. Enormous. Senseless.

The other thing I had to do some work on was in the “waits by duration” section. It seemed like the results would just re-report on every collection cycle, even if no new waits had occurred. It would make things look like you had the same waits in every grouped block of time. Via the magic of windowing functions, I weed out absolute duplicates, and only show where the wait stats report something new. As part of this, I also filter out waits with a low average duration (less than 500ms by default). You can change this behavior by using the @wait_duration_ms parameter.
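The dedup trick looks roughly like this. A loose sketch only — #health_waits and its columns are stand-ins for the staged system_health data, not real sp_HealthParser objects:

```sql
/*
   Sketch: keep a collection row only when its numbers
   differ from the prior sample for the same wait type.
*/
WITH
    w AS
(
    SELECT
        hw.collection_time,
        hw.wait_type,
        hw.total_waits,
        prior_total_waits =
            LAG(hw.total_waits) OVER
            (
                PARTITION BY hw.wait_type
                ORDER BY hw.collection_time
            )
    FROM #health_waits AS hw
)
SELECT
    w.collection_time,
    w.wait_type,
    w.total_waits
FROM w
WHERE w.total_waits <> ISNULL(w.prior_total_waits, -1);
```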

Another thing that I set a threshold on was in the “cpu task details” section. In there, it would show sections with a warning logged to the system health extended event. The problem was mainly that a warning would be logged whenever there was even a single pending task (query waiting to get on a CPU). Having just one of those is not a very interesting sign of CPU pressure, so there’s a parameter called @pending_task_threshold that defaults to 10 which you can use to decide how many pending tasks matter to you.
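Putting both new knobs together looks something like this; the values shown are just the documented defaults, to make the parameters visible:

```sql
EXEC sp_HealthParser
    @wait_duration_ms = 500,      /* hide waits averaging under 500ms */
    @pending_task_threshold = 10; /* ignore blocks with <10 pending tasks */
```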

sp_PressureDetector


This is perhaps the biggest update! It hadn’t changed much in a while, and I wanted to get a little more out of what SQL Server has to offer.

The big changes are:

  • Added in some useful PerfMon counters to the results
  • Added the ability to sample server activity

When you run sp_PressureDetector the “normal” way, it’ll show you perfmon counter activity since startup. I only collect specific ones though, and only specific counter types. See, there are a variety of types of counters.

The math to figure out how much of something happened during a period of time is about the same for all of them, but looking through the various categories and specific counters, I found that only the ones that come from types 272696576 and 65792 were worth collecting, and only certain counters within those.
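If you want to poke at those counter types yourself, they’re visible in sys.dm_os_performance_counters. Roughly: 65792 is a point-in-time value, and 272696576 is a cumulative counter that you diff across two samples to get a rate. Something like:

```sql
SELECT
    dopc.object_name,
    dopc.counter_name,
    dopc.instance_name,
    dopc.cntr_value,
    dopc.cntr_type
FROM sys.dm_os_performance_counters AS dopc
WHERE dopc.cntr_type IN
      (
          65792,    /* PERF_COUNTER_LARGE_RAWCOUNT: point-in-time value */
          272696576 /* PERF_COUNTER_BULK_COUNT: cumulative, diff for rates */
      );
```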

The new section will look something like this:

(screenshot: the new perfmon counters result set in SSMS)

If you decide that you don’t care about perfmon counters, you can skip that section by using the @skip_perfmon parameter.
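For example (I’m assuming a bit parameter here, in keeping with the procedure’s other toggles):

```sql
EXEC sp_PressureDetector
    @skip_perfmon = 1;
```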

The sampling code was inspired by running into some client issues where hitting F5 was okay, but remembering what all the numbers were before, so I could compare them to after, was not. You can run the sampling code by doing this:

EXEC sp_PressureDetector
    @sample_seconds = 5;

Not every section supports this yet. I started with the ones that I thought would benefit the most:

  • Wait stats
  • File stats
  • Perfmon counters

I may add it to more later, but these are good enough for now. The easy way to think about it is that only sections with a “sample_seconds” column in them are sampled. The rest are current.
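The sampling pattern itself is nothing exotic: snapshot, wait, diff. A bare-bones sketch of the idea for wait stats (not the actual procedure code):

```sql
/* Sketch of the sample-and-diff idea, not sp_PressureDetector's code */
SELECT
    dows.wait_type,
    dows.wait_time_ms
INTO #waits_before
FROM sys.dm_os_wait_stats AS dows;

WAITFOR DELAY '00:00:05';

SELECT
    wa.wait_type,
    wait_time_ms_delta = wa.wait_time_ms - wb.wait_time_ms,
    sample_seconds = 5
FROM sys.dm_os_wait_stats AS wa
JOIN #waits_before AS wb
  ON wb.wait_type = wa.wait_type
WHERE wa.wait_time_ms > wb.wait_time_ms
ORDER BY wait_time_ms_delta DESC;
```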

Look for a video walkthrough of the latest round of changes.

Again, you can download all of the main SQL Server troubleshooting procedures I use in one convenient file.

Issues and Contributions


If you’d like to report an issue, request or contribute a feature, or ask a question about any of these procedures, please use my GitHub repo. Specifically, check out the contributing guide.

As happy as I am to get emails about things, it makes support difficult for the one-man party that I am. Personal support costs money. Public support is free. Take your pick, there!

Thanks for reading!


SQL Server Query Transformations With ROW_NUMBER And ANY Aggregates



Video Summary

In this video, I wanted to share a fascinating query plan transformation that I recently stumbled upon while working on some SQL Server queries. I decided to dive into it after noticing a difference in how two seemingly similar queries were executed by the optimizer. The first query used a row number function and referenced it in the outer select statement, resulting in a standard window aggregate filter operator in the plan. However, the second query did exactly the same thing but without referencing the row number function in the outer select, leading to an unexpected transformation where the optimizer grouped everything together using an any aggregate instead of the usual windowing functions. While this didn’t have a significant impact on performance (both queries finished within milliseconds), it was still quite surprising and demonstrated how SQL Server can optimize query plans in ways we might not expect. I also wanted to give credit to Paul White, who had already written about this phenomenon years ago, highlighting the specific conditions under which such transformations occur. This video is part of my ongoing series on writing efficient queries and understanding query execution plans better.

Full Transcript

Erik Darling with Darling Data. And I know that’s not really much in the grand scheme of things, but my YouTube channel recently hit (did not cross yet, but hit) the 3,000 subscriber mark. And so we’re going to have a 3,000 second moment of silence to commemorate. Just kidding. We’re not going to do that. We’re going to talk about a funny query plan transformation that I ran into the other day, and that I hadn’t actually come across before, at least that I’d noticed, that I thought was pretty cool. And there I was, feeling all like flying high, like, found this cool thing. Hey, check this out. And of course, Paul White wrote about it when I was like five years old. So I’m going to show you what it is, and then we’re going to go look at his post for a minute, and I’ll give credit where credit is due. So these queries are probably way more complicated than they need to be, but they’re just what I had that produced it. So that’s what you’re getting. I’m going to zoom into where these queries are different, just so you understand a little bit about why I thought this was cool, and why I was surprised by the query plan that I got. So this first query, in the inner part of the query, I have a row number function, aliased as n. That’s this whole thing here. And then in the first query, and this is the only part that’s going to be different from the second query, I reference the windowing function in the select list. In the second query, I do exactly the same thing right here.

Actually, I should show you one other thing, which is that at the very end of the query, both of these filter down to where n equals one, right? So in the second query, I don’t reference that column in the select list. You can see that’s where it’s different. I end that with user ID there, but I still do the filtering down here, right? So where y.n equals one. So the only difference between these two queries is that in one, I reference the row number function in the outer select, and in the other one, I don’t, but they both still filter down to where y.n equals one. So let’s look at the query plans for these. In the first one, I have what I consider to be a pretty normal looking query plan for a query that has a windowing function in it, at least for batch mode, right? Because I have my window aggregate operator here. If this were happening in row mode, there’d be a segment and a sequence project.
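The two query shapes being described are roughly this. A hypothetical reconstruction against the Stack Overflow Comments table, simplified to one partitioning column — not the actual demo queries from the video:

```sql
/* Query 1: n is referenced in the outer select,
   so the plan keeps the window aggregate and filter operators */
SELECT
    y.UserId,
    y.n
FROM
(
    SELECT
        c.UserId,
        n = ROW_NUMBER() OVER
            (
                PARTITION BY c.UserId
                ORDER BY c.CreationDate
            )
    FROM dbo.Comments AS c
) AS y
WHERE y.n = 1;

/* Query 2: identical, but n appears only in the where clause,
   which is what allows the ANY-aggregate transformation */
SELECT
    y.UserId
FROM
(
    SELECT
        c.UserId,
        n = ROW_NUMBER() OVER
            (
                PARTITION BY c.UserId
                ORDER BY c.CreationDate
            )
    FROM dbo.Comments AS c
) AS y
WHERE y.n = 1;
```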

And, you know, that would be what’s making the row number happen. Right? And if you look over here, I have a filter operator and that filter operator is going to be what brings the result set down to where the row number equals one. That expression 1010 in this context is the row number function. So filtering that down to one is what happens there. So we generate the row number, we filter it to one.

What happened in the second query that surprised me quite a bit, and this isn’t a performance thing necessarily, this is just a cool transformation thing. You see both queries finish in just, well, within like a millisecond of each other. 1.127 seconds and 1.128 seconds. So this isn’t a performance thing. This is just kind of a cool optimizer trick. So that’s that.

But if we look at the second query, notice that there’s no window aggregate operator and there’s no filter operator. Right? If we look at these two plans kind of side by side, there’s like a window aggregate filter up here. And down here, there’s only a hash match aggregate.

Hello, my name is Erik. I speak for a living. So if we zoom in on this plan a little bit, you’ll see that we don’t have any of that stuff. All we have is the hash match. And the details of the hash match are pretty cool.

Now, another thing that I want to point out, this is what really like freaked me out at first. I was looking at this, I was like, but how did it get down to the right number of rows? Because when you look at it, right, like this query up here at the filter, it gets down to 1092436.

The estimate’s close enough, you know, horseshoes, hand grenades, government. And down here, the hash match aggregate is what gets us down to 1092436. So this does produce the right number of rows. It just does it in a different way.

So if we zoom, if we look at the hash match aggregate, we’re not going to get anything too interesting from the tooltip. But if we go into the properties and we line this up in a way that does not make us look like a vicious idiot, we’re going to see a couple things. We have the hash key build and there’s nothing really interesting in here.

This is just showing unions and whatnot. But if we look in the defined values up here, we have this any aggregate, right? Scalar operator equals any.

Interesting stuff. So down here, again, where stuff is not all that interesting, where we just have these stupid union names, these union columns are what’s in the column expressions from the row number. So like the hash key is building up the columns in the partition by and order by that I have in my row number function.

And then this any aggregate is what is giving us matches or just giving us whatever comes out of the text column in the comments table. So I do have to go back to the query a little bit to explain what I’m doing, where I am trying to get, I’m trying to figure out a unique set of values for user ID, score, creation date, and post ID. But in the select, I also have this text column.

So what I was trying to get away from, this is going to be in a blog post of mine coming up about union and union all and part of my how to write queries correctly series. So what I was trying to get away from is just doing a union between two different queries that hit the comments table, because deduplicating the, like getting a distinct result set with the text column involved was slow or slower, right? It’s like, you’ll see in the post, but so that’s what I’m trying to get away from.

And so I figured, well, I don’t need to get rid of, I don’t need to like deduplicate the text column because like, like that’s, that’s like distinct enough on its own, right? Like if I find where a user, if I find the user ID score, creation date, and post ID, that gives me a unique enough combination to figure out if this is unique. So SQL Server does all this stuff, and it groups all this stuff, and then it uses that any aggregate to just get whatever exists in the text column.

So that was a performance thing. This particular example where one uses the row number and the filter to do it, and the other one just groups everything together and then uses the any aggregate on the text column. That is not really a performance thing.

It’s just a cool transformation, like I said. So if we go over to Paul’s post about it, there is a section. I appreciate that section heading. What magic is this?

It is, here are some rules. So this, this is a simplification rule. And there are some particular things that you need to qualify for in order for this particular transformation to kick in. The filter has to, has to restrict the row number to one.

If you do like greater than one or less than two or something like that, probably not going to get what you want. The expression added by the sequence project must not form part of the query result. So that, that is why it was different from one to the other.

Because when you select the row number, it does the whole window aggregate filter thing. But when you just have it in the where clause, SQL Server is like, oh, let me just give you any old thing. You have to partition by and order by the grouping columns, which I did, right?

That’s just by magic. And the non-grouping columns must be constrained, not null. I read that wrong at first, and I was like, go look at the comments table layout in Stack Overflow.

I read that wrong the first time, and I was like, but wait, score and user ID are nullable. But maybe it works because I’m restricting out the nulls. But then I realized it was the non-grouping column, which is the text column. And that is indeed constrained, not null.

So just sort of by some happy, happy happenstance, this happened to work out and get that transformation, which I thought was cool, which turns out to just be old news. So anyway, if you, dear watcher, dear viewer, I guess, I like watcher better. I like to think most of my audience is creeps.

If you, dear watcher, ever come across a query plan and you write your row number function or your windowing function and you don’t see the usual stuff that you would see when you have, when you write windowing functions, one thing that might be happening is you might be getting this cool transformation with the any aggregate. So just something to keep in mind out there. I don’t really have much more to say about it. I’ll link to Paul’s post in the notes for the video so you can go read much more detail what’s going on.

But I just thought it was something kind of neat and cool. And if it surprised me, I’d like to think that it would surprise a lot of other people to see. So kind of a neat thing. Not terribly informative, but just to me at least, terribly interesting. So anyway, usual spiel here. Thank you for watching. Hope you enjoyed yourselves. I hope you learned something. Probably not too much.

Maybe, just maybe, you learned something. That’s a little bit ambitious for this video. Maybe you just observed something, or are now aware of something, is probably about the limit that we can put on this one. Right. So good, good. Anyway, if you liked the video, give it the old thumbs up. If you like this sort of content, please subscribe to the channel so we can have a 4,000 second moment of silence at some point. And once again, thank you for watching. Goodbye.


Returning A Row When Your Query Has No Results

Canary


Empty results are often dissatisfying. Especially when stored procedures return multiple result sets, it can be easy to get lost in which result is empty.

Of course, there’s many a good argument to be made against actual production stored procedures returning multiple results (usually these should be multiple procedures), but I see it happen quite a lot.

Rather selfishly, I return canary rows like this in my stored procedures, for all the reasons in the first sentence. Especially when debugging stored procedures, you’ll want to know where things potentially went wrong.

In this post, I’m going to walk through a couple different ways that I use to do this. One when you’re storing intermediate results in a temporary object, and one when you’re just using a single query.

Intermediate


This is fairly obvious and straightforward. You do the insert, check the row count, and return a message if it’s zero:

DECLARE
    @t table
(
    database_id integer,
    database_name sysname
);

INSERT
    @t
(
    database_id,
    database_name
)
SELECT
    d.database_id,
    d.name
FROM sys.databases AS d
WHERE d.database_id > 32767;

IF @@ROWCOUNT > 0
BEGIN
    SELECT
        t.*
    FROM @t AS t;
END
ELSE
BEGIN
    SELECT
        msg = 'table @t is empty!';
END;

Nothing new under the sun in this one at all.

All In One


Let’s say you don’t want or need a temporary object. Your query is good enough, smart enough, and gosh darn it etc.

This is a particularly tricky one, because there’s no way to check the row count from within the query. In this case, you can use a common table expression in a rather handy way.

WITH
    d AS
(
    SELECT
        d.database_id,
        d.name
    FROM sys.databases AS d
    WHERE d.database_id > 32766
)
SELECT
    d.*
FROM d

UNION ALL

SELECT
    0,
    'query returned no rows!'
WHERE NOT EXISTS
(
    SELECT
        1/0
    FROM d AS d2
);

Of course, the usual caveats about common table expressions bear repeating here: The query within the common table expression will run twice:

  1. Once when we select from it outside the CTE
  2. Once when we check for the existence of rows in the CTE

I don’t recommend this approach for long running queries within common table expressions, since this is essentially double your displeasure, double your misery.
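If the inner query really is expensive, one way to keep the canary without running it twice is to materialize the results first, and then do the same existence check against the temp table. A sketch that merges the two approaches above:

```sql
/* Materialize once, so the canary check is cheap */
SELECT
    d.database_id,
    d.name
INTO #d
FROM sys.databases AS d
WHERE d.database_id > 32766;

SELECT
    d.*
FROM #d AS d

UNION ALL

SELECT
    0,
    'no rows found!'
WHERE NOT EXISTS
(
    SELECT
        1/0
    FROM #d AS d2
);
```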

Thanks for reading!
