I think subqueries in select lists are very neat things. You can use them to skip a lot of additional join logic, which can have all sorts of strange repercussions on query optimization, particularly if you have to use left joins to avoid eliminating results.
Subqueries do have their limitations:
They can only return one row
They can only return one column
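To make those limits concrete, here’s a sketch (mine, not from the post) using the same Stack Overflow-style tables as the queries below. Without TOP (1) or an aggregate, the subquery can return more than one row per user, and the query fails at runtime rather than at parse time:

```sql
/*
   A hedged sketch: this select list subquery can return
   many rows per user, so SQL Server rejects it at runtime.
*/
SELECT
    u.Id,
    AnyPostScore =
    (
        SELECT p.Score          /* one column is fine...         */
        FROM dbo.Posts AS p
        WHERE p.OwnerUserId = u.Id
    )                           /* ...but this may be many rows  */
FROM dbo.Users AS u;
/* Fails with Msg 512: Subquery returned more than 1 value. */
```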
But used in the right way, they can be an excellent method to retrieve results without worrying about what kind of join you’re doing, or how the optimizer might rearrange it.
Since subqueries live in the select list, and can’t eliminate rows from the results, you will most commonly see them incorporated late in query plans (much further to the left) as outer joins. The optimizer is generally smart enough to retrieve data for select list subqueries after as much other filtering as possible has been applied, so they’re evaluated for as few rows in the outer results as possible.
The most important thing you can do as a developer to make sure your select list subqueries are fast is to make sure you have good supporting indexes for them.
Well Done
Let’s start with this query:
SELECT
u.Id,
u.DisplayName,
u.Reputation,
TopQuestionScore =
(
SELECT TOP (1)
p.Score
FROM dbo.Posts AS p
WHERE p.PostTypeId = 1
AND p.OwnerUserId = u.Id
ORDER BY
p.Score DESC,
p.Id DESC
),
TopAnswerScore =
(
SELECT TOP (1)
p.Score
FROM dbo.Posts AS p
WHERE p.PostTypeId = 2
AND p.OwnerUserId = u.Id
ORDER BY
p.Score DESC,
p.Id DESC
),
TotalPosts =
(
SELECT
COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE p.OwnerUserId = u.Id
AND p.PostTypeId IN (1, 2)
)
FROM dbo.Users AS u
WHERE u.Reputation > 500000
ORDER BY
u.Reputation DESC;
The goal is to find every User with a Reputation over 500,000, and then find their:
Top scoring question (with a unique tie-breaker on most recent post id)
Top scoring answer (with a unique tie-breaker on most recent post id)
Total questions and answers
You might look at this query with a deep sense of impending dread, wondering why we should make three trips to the Posts table to get this information. I totally get that.
But let’s say we have these indexes in place:
CREATE INDEX
u
ON dbo.Users
(Reputation, Id)
INCLUDE
(DisplayName)
WITH
(SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
CREATE INDEX
p
ON dbo.Posts
(OwnerUserId, PostTypeId, Score)
WITH
(SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
They have everything we need to support quick, navigational lookups.
Query Planner
The query plan for this arrangement looks like this, finishing in 23 milliseconds.
If you write select list subqueries, and they’re terribly slow, there’s a very good chance that the indexes you have in place are not up to the job, particularly if you see Eager Index Spools in the query plan.
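If you want to go hunting for those, one rough approach (my sketch, not from the post) is to check the plan cache for cached plans that contain an Eager Index Spool:

```sql
/*
   A sketch for finding cached plans with Eager Index Spools;
   adjust TOP and the ORDER BY to suit your troubleshooting.
*/
WITH XMLNAMESPACES
(DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (10)
    qs.total_worker_time,
    qs.execution_count,
    qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE qp.query_plan.exist
      ('//RelOp[@PhysicalOp="Index Spool" and @LogicalOp="Eager Spool"]') = 1
ORDER BY qs.total_worker_time DESC;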
All of the time in the plan is spent in the final subquery, which counts the total number of questions and answers. But even that, at 23 milliseconds, is not worth heaving our chests over.
Three round trips are not at all a problem here, but let’s compare.
One Way Ticket
I’m not opposed to experimentation. After all, it’s a great way to learn, observe, and become enraged with the state of query optimization generally.
Here are two rewrites of the above query that make only one trip to the Posts table. The first one uses a joined derived table, and the second uses apply. They’re both written to use outer joins, to match the semantics of the three subqueries:
/*
Join
*/
SELECT
u.Id,
u.DisplayName,
u.Reputation,
p.TopQuestionScore,
p.TopAnswerScore,
p.TotalPosts
FROM dbo.Users AS u
LEFT JOIN
(
SELECT
p.OwnerUserId,
TopQuestionScore =
MAX(CASE WHEN p.PostTypeId = 1 THEN p.Score ELSE NULL END),
TopAnswerScore =
MAX(CASE WHEN p.PostTypeId = 2 THEN p.Score ELSE NULL END),
TotalPosts =
COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE p.PostTypeId IN (1, 2)
GROUP BY
p.OwnerUserId
) AS p
ON p.OwnerUserId = u.Id
WHERE u.Reputation > 500000
ORDER BY
u.Reputation DESC;
/*
Apply
*/
SELECT
u.Id,
u.DisplayName,
u.Reputation,
p.TopQuestionScore,
p.TopAnswerScore,
p.TotalPosts
FROM dbo.Users AS u
OUTER APPLY
(
SELECT
p.OwnerUserId,
TopQuestionScore =
MAX(CASE WHEN p.PostTypeId = 1 THEN p.Score ELSE NULL END),
TopAnswerScore =
MAX(CASE WHEN p.PostTypeId = 2 THEN p.Score ELSE NULL END),
TotalPosts =
COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE p.OwnerUserId = u.Id
AND p.PostTypeId IN (1, 2)
GROUP BY
p.OwnerUserId
) AS p
WHERE u.Reputation > 500000
ORDER BY
u.Reputation DESC;
A somewhat brief digression here: Query rewrites to use specific syntax arrangements are often not available in ORMs. Many times while working with clients, we’ll stumble across a bushel of quite painful application-generated queries. I’ll show them useful rewrites to improve things, and we’ll all marvel together at how much better things are. I’ll suggest implementing the rewrite as a stored procedure, and all hell will break loose.
Please don’t be one of those developers. Understand the limitations of the technology stack you’re working with. Not everything produced by code is good.
Compare and Contrast
In this case, both rewrite attempts result in identical query plans. The optimizer does a good job here, but the single-trip version is about 2x slower than the original.
In this case, that difference will look and feel microscopic. And it is, mostly because I provided indexes so good that you could write this query any which way and it would work out pretty well.
While one round trip certainly felt more efficient than three, each pass from the outer to the inner side of the nested loops had a bit more work to do, and that added up.
It’s nothing consequential here, but you may run into plenty of situations where it’s far worse (or to be fair, far better).
For me, the real advantage of writing out the three separate subqueries is to better understand which one(s) do the most work, and might need extra attention to make them fast.
When you do everything all at once, you have no idea which piece is responsible for slowdowns. We know from the very first query plan that getting the full count does the most work, but that wouldn’t be obvious to me, you, or anyone else looking at the two query plans in this section.
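One low-tech way to do that triage is to run each subquery’s logic on its own and compare timings. A sketch (the user id is hypothetical; swap in whichever piece you suspect):

```sql
/*
   A sketch, not from the post: isolate one select list subquery
   and measure it by itself with SET STATISTICS TIME.
*/
SET STATISTICS TIME, IO ON;

SELECT
    TotalPosts = COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656 /* hypothetical user id */
AND   p.PostTypeId IN (1, 2);

SET STATISTICS TIME, IO OFF;
```

Repeat for the two TOP (1) subqueries; whichever piece posts the ugly numbers is the one that needs index help.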
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.
Whether you want to be the next great query tuning wizard, or you just need to learn how to start solving tough business problems at work, you need a solid understanding of not only what makes things fast, but also what makes them slow.
I work with consulting clients worldwide fixing complex SQL Server performance problems. I want to teach you how to do the same thing using the same troubleshooting tools and techniques I do.
I’m going to crack open my bag of tricks and show you exactly how I find which queries to tune, indexes to add, and changes to make. In this day long session, you’re going to learn about hardware, query rewrites that work, effective index design patterns, and more.
Before you get to the cutting edge, you need to have a good foundation. I’m going to teach you how to find and fix performance problems with confidence.
In this video, I delve into the world of SQL Server agent jobs and their impact on database performance, alongside my co-host Joe Obbish, a renowned expert in the field. We explore common issues such as overlapping schedules that can lead to CPU spikes and how these jobs can be scheduled more efficiently. Joe shares his experiences from various client engagements where he has helped optimize agent job cadences to avoid unnecessary load on servers. The discussion also touches on the challenges of using Query Store for workload analysis, especially when dealing with frequent agent job runs, and suggests strategies like “fuzzy scheduling” to better manage these tasks. We emphasize the importance of flexibility in scheduling to meet business requirements without overwhelming server resources. By sharing practical insights and real-world examples, this video aims to provide actionable advice for database administrators looking to improve their SQL Server environments.
Full Transcript
Erik Darling: Hello and welcome to the second episode. It’s no longer a podcast. It is now officially a radio show. We’ve given up on the podcast idea. Apparently, it’s just a saturated market and radio is where it’s at anyway. But anyway, welcome to the second episode of Bit Obscene, the radio show. I’m Erik Darling. I do consulting things with SQL Server. And joining me today, again, because he keeps coming back, is Joe Obbish, world renowned SQL Server agent job expert extraordinaire. And if you hadn’t guessed by that introduction, the topic of today’s show is SQL Server agent jobs, and the SQL Server agent in general. So let’s get started with that. Joe, take it away. You apparently have some thoughts on agent jobs and I’d love to hear them. What? This is a radio show. Why did I bother digging this punching bag out of my closet? You know, you can’t see people on the radio. My second thought is we’re flipping the script a little, Joe. I thought we were doing so well that we had our first sponsor. You know, they rearranged. Send everything. You seem to be telling me that I might have misread your email, and they’re not sponsoring. This is bad.
They were technically anti-sponsoring us. They wanted you to not show their products in the background. It’s too late now. Yeah. We’re screwed. Here come the lawyers. You know, now our viewers might be thinking, well, what do punching bags have to do with databases? And I think there’s a bigger connection. You know, for example, when you’re in SSMS, you know, and you see the little database icon.
It’s shaped like a cylinder, right? The punching bag is also shaped like a cylinder. It is. Until you punch it enough. It gets a little. And speaking of punching, I’m sure you’ve met many developers who basically treat the database as their own personal punching bag.
You know, various metaphors would apply there, you know, punching bag, toilet. You know, there’s lots of things that lots of maltreatment of databases by developers out there. And for the most part, databases can’t even fight back. They can’t defend themselves.
You know, they just have to sit there and take whatever abuse is hurled at them. I think databases have their own way of, you know, sort of protesting developers various ways. I think there’s a marking up to it here from Microsoft.
You go to the cloud and you get a free punching bag because you’re, I mean, you’re certainly going to need it. Well, when you move to the cloud, you are the punching bag. Microsoft just continues to hang you upside down and punch money out of you.
When I’m looking at a database and I see, like, my 10th parameter sniffing issue of the day, you know, the punching bag really comes in handy. Do you get parameter sniffing from your agent jobs, Joe? Oh, okay. You want me to end the account of your team?
No, I want you. I didn’t know if you were looking for a segue or not. So agent jobs are, I guess, another example of how databases can be abused. It’s also an example of how Microsoft abuses us.
The other day I was looking into getting some, you know, like, they have agent jobs for Azure SQL database now. Just what it is. It’s like this type of thing you have to buy.
They charge, like, $23 a month just to run agent jobs. It’s, uh… $23? That seems cheap. It seems cheap, but, you know, it’s got to be pure profit for them.
Oh, yeah, totally. I mean, agent is existing code. Nickel and diming you to run agent jobs. Microsoft is turning into Oracle in all sorts of fun new ways.
They have subscriptions to everything. You want a plan cache? $10 a month. I thought we wanted to have, like, Microsoft guys on here one day.
You’re kind of, you know, kind of sinking. I, uh, I’m not entirely sure who from Microsoft would be brave enough to join us. Brave and foolish.
I can think of one man who’s brave and foolish, you know? Yeah, me too, but, you know, I don’t know if he has a working headset. It’s a big question.
I have somewhat of a narrow view when it comes to, you know, what problems people have with their databases. And by that I mean I’m not a consultant. You know, I don’t put my suit and tie on in the morning every day.
I haven’t seen that many production databases. Now, you, on the other hand, have a very wide view of the outside world. You’ve seen my databases while you’re doing your, I don’t know what’s called, Erik Darling Deep Dives?
Is that what you call it? Uh, you know, I don’t have an official name for anything. Oh, you’re very marketing.
Yeah, you know, it’s sort of an undefined relationship. It’s whatever you need me to do with SQL Server, we can work through it together, you know? So while you’ve been doing what people need, how often have you informed them that they have agent job problems or difficulties or, you know, there’s areas to improve with that aspect?
So, I mean, the majority of my work is, of course, SQL Server performance tuning. So if someone has an agent job that is doing something particularly heinous from a performance perspective, then, you know, tuning whatever is inside that agent job is certainly on the menu. Sometimes I’ll help people set up certain agent jobs to do maintenance and stuff if they need that sort of help.
A lot of the times when that is something that people require, that usually takes the form of me fixing their index maintenance to stop doing stupid, useless rebuilds and reorgs and just continue to update statistics. That’s probably the most common thing that’s, aside from, like, you know, performance tuning, whatever is encapsulated by the agent job. Yeah.
What kind of things have you seen, Joe? And now you’ve said that, I’m wondering if my experience is a bit atypical. You know, maybe it’ll be instructive anyway. So I think there’s a trap that can be quite easily fallen into depending on how much you use agent jobs.
Like, it’s natural to, you know, like, say you have a bunch of tests you want to run. Okay. I’ll run these tests every hour.
These tests every 15 minutes. These tests every five minutes. These every one minutes. It’s a lot of overlapping job schedules there, Joe. Yeah.
So, like, you know, like, it’s a very natural thing to think, you know, like, you’re basically thinking in terms of, like, like, how would a human do a chore, right? Like, oh, every hour, every 15 minutes. Yeah.
And as you pointed out, when you pick these, like, you know, human-friendly frequencies, they all overlap. In the big, in the grown-up world, we call them cadences, Joe. Cadence.
Cadence. I don’t think I’ve ever seen the word cadence in SQL Server. And what cadence would you like this job to run? Anyway, if you do the math or if you, you know, like, envision a schedule, like, every hour, you’re going to have every single agent job kick up. Yeah.
If you follow the, you know, every five or 15 or one hour. Yeah. It’s like, I have definitely seen servers that have huge CPU spikes every 15 minutes. Mm-hmm.
Just because of how the agent jobs are scheduled, you just get that huge burst of work. Yeah. Every 15 minutes. Yeah. I mean, it’s to the point where, you know, like, it was definitely, the server was noticeably slower during those times.
Like, you know, like, if you’re in the application, you can notice it. If I was doing query performance tuning, it’s like, oh, this query is taking longer than I thought. And I would look at the time and, oh, well, it’s, you know, 3:15 p.m.
So, of course, it’s taking longer. Also, wait a minute. On a schedule, CPU will shoot up, or you’re blocked or something. Yeah.
And, you know, like, it’s also, if you’re taking a holistic view of wait stats, it can really trip you up there, too. Like, oh, well, you know, over this four-hour time period, I’ve got 50% CPU average but a ton of SOS waits. You know, like, why do I have only 50% CPU but a ton of processes are spending their time waiting on CPU?
It’s a scheduling problem. But, you know, like, and there’s certainly inefficiencies in scheduling. But for the workload I was looking at, a lot of it was just agent jobs.
You know, you’d have these big bursts. And then when you look at a bigger window, they’ll throw everything off and make the wait stats analysis misleading. Well, yeah, you know, I think it’s especially perilous because, you know, Microsoft does not give you any particularly useful tooling at the server level to show you when exactly wait stats happened.
You know, you can get some database level stuff out of Query Store, but you might not have Query Store enabled for every database. Not every database needs that level of monitoring. And, you know, there’s a real disservice there because what may look like a real problematic workload generally is really just a problematic workload on the hours.
Right? Yeah. Can you change the Query Store, like, roll up interval?
I don’t remember. What, for wait stats? No, just in general. Because, I mean, by default, things are summarized by the hour. Yeah, you can change the interval, I think, to different cadences.
But that’s not something that I generally do, just because usually by the hour is okay enough. I would say it’s okay enough unless you have this problem or, you know, many other problems that we’re not going to talk about today. Yeah.
Yeah. And, you know, like, maybe there’s this argument for, well, I do wonder if some people think, oh, well, like, I have to run this code every hour for business purposes. You know, the business demands that this is my requirements.
I have to do it. Sure. So, I don’t have a choice. Maybe people are expecting some data set to refresh every hour. Yeah. And, like, I think that needs to be decoupled from the actual scheduling. I think I’m a big fan of what I’ll call fuzzy scheduling.
Like, you need to run every hour. You know, like, don’t run it every hour. Like, run it every 3,593 seconds, for example. I would even take a different approach than that.
I think that if your job is, or rather, if your agent job is focused on some sort of data refresh or data purging, you should run that much more frequently. Because if you think to yourself, it’s sort of like the, for me, it’s sort of like log backups, right? Like, people will set up log backups in some schedule that meets their RPO goals, right?
Like, let’s say the business is like, you can’t lose more than 15 minutes of data. So, you set up log backups every 15 minutes. But that’s kind of a stupid approach because you should be aiming to beat whatever goals you’re given so that, like, if anything ever happens, you’re safe, right?
Like, if you don’t want to lose 15 minutes of data, you should be taking log backups, like, every five minutes. Because that way, you’re protected if anything happens to one of those log backups. In the same sort of vein, I would say that if you need to do an hourly data refresh, if you count on doing that refresh every hour and you, that task starts taking longer and longer, you’re going to start missing that goal.
So, you should do that more often because that way, if you do it more often, you’re moving smaller chunks of data and you have a better chance at meeting that hourly goal. You no longer put all the strain at that 60-minute mark. I think the point you just made leads into what I was going to say, which is, you know, it’s not like anyone cares when a process starts.
They care when it finishes. So, to even say, well, you know, like, I’m thinking about, you know, okay, well, for whatever business reason, we can’t refresh any quicker than an hour. Or, you know, that’s what the customer says they want.
Maybe the data would be incomplete if they didn’t refresh or something like that. I don’t know. But, like, even then, like, starting at the hour, on the hour exactly isn’t meaningful because you have no idea how busy the server will be. You don’t know if you’ll be waiting for locks.
You don’t know if, you’re thinking, well, did that write fail? Right? You know, last time we talked about the many, many ways your perfect code can still fail. Absolutely.
Right? So, I think that it’s important to be flexible in how you schedule your agent jobs, if you, you know, have a problem like this where you have many agent jobs and there is a noticeable tax on your server CPU. And along the lines of what you’re saying, changing the cadence could be helpful in some cases.
I think even so, like, you still don’t want to schedule on those nice, even cadences if you can help it, because then you’re going to get the overlap. And, you know, if you’ve got your four core lone machine that’s chugging along and you kick off 50 agent jobs at once, you’re not doing anything useful other than generating wait stats. Or worse, just, I mean, well, I mean, it’s still a wait stat, but just, you know, kicking off 50 agent jobs on a four core server, you have a very, very strong chance of generating some thread pool waits, I think, or resource semaphore waits, depending on what exactly
Those queries are up to. Yeah. I mean, you’re not doing anything helpful or good or useful.
So, I mean, you’re basically launching a denial of service attack on yourself. So, I’m of the opinion that, you know, scheduling agent jobs every minute or five minutes on the dot, you just shouldn’t do it. You know, like, even just doing it, like, every 61 seconds or 59 seconds, and changing it per job, you know, that way you’re not getting that.
You know, it won’t overlap, you know, it’ll be offset. Um, the other thing I’ve done sometimes is just, you know, combining agent jobs.
Sure. Like, five things running every minute, you know, you could have, like, multiple steps per job or some other method, you know, things like that. Sure.
Um, I was able to get pretty far with those types of approaches, both combining agent jobs that had similar cadences into one. Yeah. And then changing the cadences to not be even minutes.
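The offset-cadence idea is a one-parameter change in msdb. A sketch of what that could look like (the job and schedule names are hypothetical):

```sql
/*
   A sketch of the "fuzzy scheduling" idea discussed above,
   using msdb's sp_add_jobschedule. The job name is made up.
*/
EXECUTE msdb.dbo.sp_add_jobschedule
    @job_name = N'Hypothetical Refresh Job',
    @name = N'Every 3593 seconds',
    @enabled = 1,
    @freq_type = 4,               /* daily                        */
    @freq_interval = 1,
    @freq_subday_type = 2,        /* unit: seconds                */
    @freq_subday_interval = 3593, /* just under an hour, so runs  */
                                  /* drift off the top of the hour */
    @active_start_time = 0;
```

Note that, as discussed later, SSMS’s schedule dialog may balk at displaying three-digit second intervals, even though the schedule itself works.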
Yeah. You play a lot of FizzBuzz game with them. One, one annoying thing is if you create, um, if you create an agent job with, with a frequency of many seconds, I think it’s over a hundred. Mm-hmm.
SSMS throws an error if you try to open it. Hmm. No, I’ve never tried to do that. I, I guess some developers assumed that the number of seconds in a frequency would always be two digits. So if you pick something like 301 seconds, you know, five minutes plus one.
Mm-hmm. And then you try to view the schedule. Yeah. And SSMS says that there was an error. Yeah.
Other than that, it still works. They were probably expecting 15 or 59 seconds, something like that. Yeah. Yeah. Because, you know, like, why would someone pick 301 seconds as opposed to five minutes? Yeah.
It’s just, well, why wasn’t that developer defensive enough to say, I can convert this number of seconds to a number of minutes and a remainder? I mean, I’m a math idiot and I know you can do that.
I’m sure it was done like 20 years ago and, you know. What? Before math. No one cares. Um.
So I’ve done tricks like that. You know, when looking at servers with many agent jobs and you have the CPU spikes, that’s one way of addressing it. Yep.
Yep. But sometimes even that isn’t enough. Yep. Um. For example, if you have many, like, tenant databases, you have one database per customer or whatever, and they all have the same agent jobs. You know, like, if you use scheduling tricks, okay, I’m not going to do it every five minutes.
I’m going to do it every five minutes plus one second. I’m going to combine things. If you have 10 databases and they’re all doing the same agent job work, then you’re still multiplying that work by 10 times. Mm.
So. Yeah. That’s something I’ve run into as a problem that had to be solved. Yeah. One of my least favorite things in the world is when I expand the agent job node and I’m greeted by a list of, like, a thousand GUIDs, because someone has SSRS reports all, you know, in there. And SSRS reports aren’t named anything helpful.
They’re named by job GUIDs. The job names are all GUIDs, rather. And when you start looking at how those are scheduled, they’re all very, very tightly coupled. And those reports have a really nasty way of walking all over each other, because, like, there’s some job step that, like, locks some other table and then starts running the job.
And the whole thing is a nightmare. So that’s, that’s one of my least favorite things to see when I, when I open up agent is like, you know, multiple agent jobs, you know, you can at least, figure out what’s going on by the name to some degree, but you see those GUIDs, man, you’re, you know, you’re in trouble.
You know, you’re in a bad, bad way. Sorry. I’ve never done that.
So, uh, actually I did that once and that was really bad. Yeah. Yeah. Yeah. Uh, I didn’t know. I was, I was like, I was paralyzed. They were like, what do you want to drink? And I was like, I, I, the, the alcohol you don’t have.
Last time I’d ever do that. Yeah. I mean, uh, it’s, it’s, it’s good.
They got rid of SQL Saturdays, because now that won’t happen anymore. Uh, well, I thought the fellow that was running SQL Saturday Madison moved or something. Bridge, bridge.
Uh, not yet. It, you know, it’s, uh, anyway, I thought he was moving to New York. It’s, it’s, uh, all in the past, you know, your, your, uh, traumatic experience. It’s all.
Well, I, I blame that more on the, the, the people who planned that meal more than, more than anything, because there was a really good opportunity to go to a restaurant that had like a wine list. And everyone wanted to go to someplace like 20 minutes outside of Madison that had like a soda, a soda, a soda list. Great barbecue.
I’m sure. But who could tell, because you can’t drink with it. How can you possibly enjoy a meal? So, speaking of not enjoying things. Yeah.
How do you, how do you solve that SSRS problem? I have not yet had the pleasure of working with SSRS. That’s a mix of things. You know, SSRS reports are typically written by absolute buffoons and just need some tender loving care to fix them up. And so, if you can make all the reports fast enough, generally the scheduling problem is a little bit less wooly.
Uh, then, then, uh, then like, then like with agent jobs that actually do like meaningful work. Do you think any of those buffoons watch our videos? God, I hope so.
So they hear that. Hmm. Good. Yeah. Well, if, if, if, if the view count goes down between episode two and three, then I’ll have some theories as, as to why that was. Yeah.
Well, that’s fine. You know, uh, I’m sure Everlast will be quite, quite pleased if, if our view counts go down. Um, Jimmy Everlast.
So, you know, as you said before, I mean, like, sometimes agent jobs need to be optimized. Sometimes they shouldn’t exist. Yeah. Sometimes they’re not needed. And, you know, like, before you optimize a problem, um, figuring out what actually needs to be there is definitely a good exercise.
Um, you know, in my limited experience, you know, oh, someone has a job that has, like, temp in the name and was created months ago. Do you still need it? You have a job that’s errored out every day for the past year.
Should it be erroring out? Like, wait a minute. You have a job that was mysteriously disabled months ago.
Yep. Should it be disabled? Should it be enabled? Yeah. Um, especially when it’s DBCC CHECKDB. I mean, who needs that, right?
I’m a big fan of, I don’t know, either source control or some source of truth where you know what agent jobs you should have. Because I have gone through the exercise of, you know.
Yeah. Hey, look at all these great agent jobs that are erroring out and are disabled and whatever else. Do we need that? Um, I’m guilty of this too, but you know, when you create agent jobs, there’s that little, like, that little box where you can, like, put notes, or, like, did they call it a description?
Yeah. You ever see that filled in? Uh, I do when I create them. Oh, wow. All right. Yeah.
Look at you. The model SQL Server user. Well, I mean, let’s be real here. Uh, I actually don’t recall exactly how many agent jobs I distribute in my GitHub repo, but I make darn sure everyone knows that that agent job came from Darling Data in the description, so that when they open it up, they go, ah, I see.
Okay. That’s where that came from. The little, do a little self-promotion for the 0% of people who open agent jobs and pay any attention to what the description is.
I, I do think though, if you’re creating a temporary job, you know, you’ve got, you know, your future self will thank you if you like write even like one sentence or maybe even a couple of words in that description saying like what this thing has.
Well, I mean, I, I think it would be very helpful if agent jobs had like a very easy expiration date to apply to them. Like this job will self-destruct in three months when.
Oh, we can use the word easy. I think easy is the key word. Oh yeah. Easy is definitely the key word there. There’s definitely hard ways to do that. It’s not, not a lot of easy ways to do that.
I, I think I, and I, I have that agent job open source thing I did. Yeah. I don’t think I have anything written in the description or the notes.
Oh, there you go. You are, you’re guilty of your own crimes too. It’s true. It’s crazy. Maybe I should go back.
Crazy to think about. Maybe I should sell that as advertising space, you know. There you go. Yeah. Just put, like, Everlast in there and see, see how well that goes. Wow.
Joe’s everlasting agent jobs. No explanation. I mean, a lot of agent jobs are everlasting. You know, they get created. No one knows why they’re there and they never get deleted. It’s true. Yeah.
They just run forever and ever. They error out forever and ever. Um. Speaking of agent job errors. One of my least favorite facets of SQL Server Agent is that it does not run with fancy SET options that match what you get from SQL Server Management Studio.
If you just crack a query editor window open. And that has led to a lot of grief for people who, uh, try to use things like, you know, filtered indexes or, uh, computed columns or indexed views, because all of a sudden agent jobs start failing with these mysterious errors that did not fail when you tested your code carefully in a SQL Server Management Studio tab.
And I think. That’s interesting. What? I actually didn’t know that. Um. Yeah. It’s crazy. Can you create a stored procedure that forces the right options?
Well, I mean, you just have to put the right SET options in, like, the job step or whatever. Like, if it’s calling a procedure, you can put it in the stored procedure. If you’re just calling, like, code in a job step.
Yeah. You can put the set options in the like pre code run, but yeah. I don’t want to do that. Oh no. It’s. No.
It’s not fun and sexy to mess with set options, I don’t think. Uh, I guess along the, that topic, how do you feel about, you know, agent job steps with, with a ton of T-SQL, just in that little unwieldy box?
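The workaround being talked around here, putting the right SET options at the top of the job step (or inside the procedure it calls), might look something like this sketch. The procedure name is made up, and the exact list of options that matters depends on which features (filtered indexes, indexed views, persisted computed columns) you rely on:

```sql
/* Agent job steps do not run with the same session defaults as an
   SSMS query window (notably QUOTED_IDENTIFIER), which breaks
   modifications against tables with filtered indexes, indexed views,
   or persisted computed columns. Setting the required options at the
   top of the step, or in the called procedure, avoids the mystery
   failures. */
SET QUOTED_IDENTIFIER ON;
SET ANSI_NULLS ON;
SET ANSI_PADDING ON;
SET ANSI_WARNINGS ON;
SET ARITHABORT ON;
SET CONCAT_NULL_YIELDS_NULL ON;
SET NUMERIC_ROUNDABORT OFF;

EXECUTE dbo.YourMaintenanceProcedure; /* hypothetical procedure name */
```

The nicer variant, as discussed, is calling a stored procedure, since the options get captured with the procedure and you don't have to repeat them in every job step.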
Dim. Very dim. Dim. Very dim. Yeah. I’m with you on that one. Um, I mean, I’m sure in some cases you, you can’t use a procedure because you’re doing like illegal things, but if, if it can all be helped, you know, I’d much rather see agent jobs calling a procedure, and that procedure has everything in it, that way you can—
You can’t. You can’t. Totally. It’s just, you know, um, that box is a perilous place. Um, the, the number of times when like, I I’ve tried like, like I’ve highlighted everything and like tried to do something and then like accidentally hit the space bar and had it all disappear on me.
It’s probably more often than I’d like to admit, but like, you know, trying to like edit code in there. It’s not fun. Uh, yeah, I, I, I really, I really dislike that. It’s not fun.
I mean, it’s, it’s just not, it’s not highlighted. It’s not formatted. There’s no, nothing friendly about it. Yeah. I don’t think that there’s a way to, to, to parse it. Right.
No, no. Yeah. There’s just, it’s just a blob. It’s like dynamic SQL in a, in a, in a box. It’s very annoying. I’ve seen people put dynamic, dynamic SQL in the box. Yeah.
Same. And most of the time it doesn’t work. Yeah. I like, I like when I opened that up and there’s like five or six cursors declared at the very beginning and you’re just like, Oh, where’s this going? You’re like, I don’t know.
I’m not following this here. It’s a billing hours opportunity. Huh? Yeah. Nothing says billable hours like editing cursor code in an SSMS prompt. Really, really valuable to people out there in the world.
Wow. If only someone would charge us an exorbitant amount of money to, to edit this code. I mean, don’t you mean a very reasonable, um, industry competitive amount of money? Yeah.
But you know, to, to, to a lot of people, those words don’t resonate, uh, to a lot of people. They’re like, I just want the most expensive thing. So I’m trying to position myself as the most expensive thing and worth it. Like worth every penny.
Like a Rolls Royce Bentley. I, I, I, I better be more careful with the billable hours I have left. Huh? Uh, I don’t think you have any left. That’s not true.
It is now. Don’t you realize recording this podcast goes towards your billable hours? This is, I think I’m doing this for free. I don’t think you want to make a video of yourself, like shaking down your customers.
Right. I mean, it’s not a, I don’t know, man. Uh, I, I, I’m not a business guy, but if you could care for my advice, it would be, uh, you know, do it in private when the camera’s not rolling.
Well, you know, uh, no one ever said it was a good businessman. Never been accused of that. You know what?
Another thing that really rankles me about agent jobs is, uh, way back when, I forget how, how far in the wayback machine this is. But, uh, I got this great idea that I would chain agent jobs together by having job steps that called, uh, like whatever, whatever the execute job stored procedure is.
And my, my thinking was it’s going to call a different agent job. That different agent job is going to have its own like erroring, alerting, uh, and whatever other stuff going on.
And then I don’t have to worry about having like 50 job steps that are like, if this doesn’t finish, uh, throw an error and maybe do this other thing. And, uh, it didn’t work as procedurally as I expected it to, uh, apparently a job step just saying execute this job is enough.
And so what happened is I ended up running DBCC CHECKDB on like 20 databases simultaneously rather than a single database at a time. And, uh, that was, that was a less than enjoyable, um, less than enjoyable time for me.
Uh, I didn’t expect that. And when I saw it in action, I was like, wow, like, I mean, that’s maybe a cool way to call like asynchronous code, like, and just have like parallelize some process.
But, um, yeah, uh, bunch of egg on my face for that one. I’m just understanding stuff. You had, uh, agent job.
Step that was just calling the procedure to execute or to like start another. What was the job, or the step? Uh, so there was the overall agent job and, uh, each agent job.
Well, the, the main agent job had like a bunch, like, I don’t know, like 10 steps in it. And each one of those steps was to call, uh, like whatever, sp_start_job. And like, rather than waiting for the job to like start and finish, it just went, well, I made that call.
It’s off doing something, onto the next one. And so, yeah, I ended up, I ended up parallelizing CHECKDB. Yeah.
I mean, to be fair, like, I think that’s, that’s the best way for it to work. Um, unless you’re counting on it to, to, to be synchronous. But well, I was, I was counting on it to be like, this job step, the job step completes. The next job step starts, calls the other job, the next job starts up.
But no, it just went, all of you at once. And I was like, ah, I’m just going to leave this out. If, if, you know, the various, you know, it’s hard to know what Microsoft envisioned with agent jobs. Maybe, maybe it’s just the wild west, support anything, but, you know, because I have that, that open source thing, and the thing it implemented was columnstore maintenance.
Yes. And, you know, like, I’m definitely letting them run many parallel jobs if they want. You know, I’m curious if, you know, if that’s an abuse of the framework or if Microsoft envisioned us doing it that way.
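For anyone who hasn't been bitten by this yet: `sp_start_job` is fire-and-forget, which is exactly what caused the parallel CHECKDB surprise described above. A minimal sketch, with hypothetical job names:

```sql
/* msdb.dbo.sp_start_job returns as soon as the job has been
   *requested* to start -- it does not wait for the job to finish.
   Calling it from consecutive job steps therefore kicks everything
   off nearly simultaneously instead of running one job at a time. */
EXECUTE msdb.dbo.sp_start_job
    @job_name = N'CheckDB - Database01'; /* hypothetical job name */

EXECUTE msdb.dbo.sp_start_job
    @job_name = N'CheckDB - Database02'; /* starts right away, too */
```

If you want synchronous chaining, you have to build the wait-and-poll logic yourself; the procedure offers no option for it.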
You know, I think when, when it comes to agent jobs, like there’s like, like there’s, there’s even the concept of like an agent job server where you can like have a bunch of centralized agent jobs and have them connect to other servers to do stuff. So I really do, I really do think that agent was designed for like a lot of flexibility and like a lot of like, who knows what crazy stuff anyone is going to do with any of this. And it wasn’t in like, I’ve never seen release notes where like there was an improvement or some reigning in of what agent does and can do.
Because I think that was, that was like Pandora’s box. And once they, once they let that out as it is, there was no way to bring it back. There was no way to like add any sanity to SQL Server agent.
It was just kind of like, well, it is what it is now. And just hope, like hopefully people will be responsible with it. Everyone heard Eric, you know, feel free to go wild with your agent jobs.
Go crazy. You’re not going to get reined in. Yeah. And then come pay my either reasonable industry competitive rates or my super exorbitant rates, depending on which, which you prefer. However you, however you mentally want to classify my rates so that you can, so that you want to pay that invoice.
That’s fine with me. But if you are looking to do crazy stuff, you know, you, you want a lot of parallelism. Yeah.
As opposed to Eric, you didn’t want parallelism. I think my code is still on GitHub and it works. Why wouldn’t it still be on GitHub? I don’t know. I mean, I got deleted. I mean, I haven’t, I haven’t checked on it. It’s not like people ask questions or anything.
Well, you also didn’t really do a great job of marketing it. Wow. Okay. You didn’t. I mean, you know, it’s, it’s not enough to just.
An agent job isn’t exactly some sexy thing you can market. Well, but, but, but sure, you’re right. Right. But what you were offering, which was the columnstore maintenance thing, is something that people fail miserably at. There are not a lot of good, not a lot of good alternatives.
Like all the scripts don’t handle columnstore very fluently. That Tiger Toolbox thing. I don’t, I don’t, I don’t even, I don’t remember.
I remember you talking about it at some point and you said it was okay, but not fantastic. You know, Nico had all those scripts in his CISL library, but he started working for Microsoft and those died. You know, so those are a little, those are a little shoebox coffin buried in the backyard.
And so like you had this, you had all this stuff that was really smart and they would do really good maintenance on columnstore. But then no one knew about it because you like. You should link in the description.
Sure. I’m sure. I’m absolutely do that. But you know, if you, if you want to call attention to something, you can’t just, you know, it’s not just a matter of putting it on the internet. People have to be able to find it and get to it. Know it exists and know it’s all, know what problem it solves.
Yeah. Yeah. Yeah. So I, so I talk about all my stupid scripts constantly. I want, I want people to, to get them and use them. Did you put tempdb in the video name?
No, the video name is called SQL Server agent jobs. Okay. Yeah. What else would I call it?
Hashtag tempdb. Do we, do we, do you want to talk about anything related to tempdb in this? No, I was just referring to your, your, your blog post. I have like a million hashtags at the bottom.
Oh yeah. Yeah. It’s a story for another day. Maybe you should tell them the cameras are.
Well, it doesn’t really matter much anymore. I do know that. There are people out there who use the columnstore maintenance thing, which is, you know, a professionally satisfying thing to know.
Yeah. I have no idea how many people there are. You know, it could be, yeah, I just have no idea. You can look at your GitHub repo and you can see what like the, the download stats and stuff are.
Oh, wait, wait. They have download stats. I thought there’s, oh, I didn’t know that. Yeah. You can see. I thought it was just like the fork count. No, you can, you can see like forks and stars at the top. And you can, if you go into like some special, like, like admin view, you can see how much traffic your repo gets.
And I believe there’s also a download, download count on there as well. Oh man. I wonder if I’d feel better or worse after checking that, you know? Uh, I don’t know.
It could go either way, right? I don’t know. It’s a big number. You feel good. Just walk in with low expectations. Don’t expect, don’t expect anything. You know, like these conversations, don’t expect anything.
That’s right. Don’t expect to learn anything or walk with anything useful. We’re still having fun. Yeah, exactly.
And, and punching things apparently. Should I get some sports equipment in my background? Uh, I thought you’re, you’re never willing to show your background. Oh, well, I mean, my background is a, is a green screen for other recording purposes.
So I could, I could, I could maybe, I could draw some sports equipment on it. Draw like a weight bench and stuff. If you want, we could like have a subscriber goal or something where if we get enough people, I could like punch the bag.
You, you, you, a lefty or a righty? Something like that. A righty. A righty. All right. I want to, I want to see you throw, I want to see you throw left-handed punches then. Okay.
I mean, you, you, you’ll have to subscribe to yourself then. You can start making some, some burner accounts and, you know, get those numbers up. You know, I, I do frequently subscribe to myself, Joe.
Don’t know. I don’t know who else would do it. I suspect there’s not really a big appetite for, you know, like I have some maintenance thing. I want to run.
I want to be like super parallel and efficient and run, you know, between 2 a.m. and 4 a.m. And that, you know, like people just don’t have those types of problems or they do what they’ll know about it. Or they do, but don’t, like there isn’t a big enough need to solve it.
Yeah. You know what I mean? Like, I, I, I do think you’re already reaching the edge of, you know, like what DBA is, or I guess, I know his developers, like, are even thinking about when it comes to, you know, solving a problem like that. Yes and no.
I do think that there is some demand in it, demand for it. So semi-recently, I couldn’t tell you exactly when, but, you know, Ola did add options to his scripts to run commands in parallel. I know it’s available for backup.
I want to say it’s also available for CHECKDB. I don’t know if it’s available for index maintenance, but you can run Ola’s scripts and use like the backup in parallel parameter to run multiple backups at the same time. I think for maintenance tasks, the bigger rarity would be finding people who were on boxes that could accommodate that sort of maintenance in parallel.
You know, especially backups would, you know, usually a single backup does enough to, you know, completely saturate that, you know, one gig iSCSI connection they have to their SAN, which is, you know, you know, in the cloud, it’s worse. God forbid. But, you know, there is, so there is some demand for it.
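If I'm remembering the parameter correctly, it's `@DatabasesInParallel` in Ola Hallengren's maintenance solution; check the current documentation before relying on this sketch, since availability varies by procedure and version. The backup directory here is a placeholder:

```sql
/* With @DatabasesInParallel = 'Y', two or more agent jobs running
   the same command coordinate through the solution's queue tables
   and split the database list between them, instead of each job
   backing up every database. */
EXECUTE dbo.DatabaseBackup
    @Databases = 'USER_DATABASES',
    @Directory = N'X:\Backup', /* hypothetical backup path */
    @BackupType = 'FULL',
    @DatabasesInParallel = 'Y';
```

As noted above, the box usually gives out before the scripts do: one backup stream can saturate a narrow SAN connection all by itself.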
And I’ve worked on at least a couple processes where the implementation, they wanted it to run in parallel. One of them was a dev refresh thing where I had to basically write, like, I used a table that would help generate dynamic BCP commands. And that would essentially like BCP out from a list of tables.
And that was designed to run in parallel. And then, like, there was an import side that was also designed to run in parallel so that they could refresh, like, a smaller subset of prod data to dev servers. And then there was another one that was a schema migration thing where they were moving from one big, long, flat table to, like, six or seven smaller, narrower tables.
And that was also designed to run in parallel. So, you know, there’s definitely a need for it in some circumstances. And I think there’s definitely a general desire for that sort of thing.
But, you know, really, I think the limit you hit is definitely server hardware and resources before, you know, like the capabilities of, you know, server being, like, you know, there’s a difference between, like, being able to do it and being able to do it well. So, one of them was multi-threaded via PowerShell. I remember that.
And the other one was multi-threaded via just setting up a bunch of agent jobs to run the tasks. So, no, this was before any such framework existed. Yeah.
This was very early in 2020. Not early in 2020. I’ll tell you that much.
If only I had been a few months sooner, you know. Yeah. Well, you know. Day late, dollar short, one parallel task shy of success. It’s all a crying shame. You know what’s not parallelizable that I’d really like to be?
What’s that? All right. So, this is more of a conceptual thing. Doing a reinitialization of replication. Oh, God.
At least, I don’t think it can be parallelized. I’ve looked. I’ve tried. I think I found, like, you know. Because the thing about replication is you go search for it and it’s like some guy talking about it, like, you know, like 15 years ago. Mm-hmm.
Right? Like, there’s ever anything, like, new or current or like that. Replication has largely fallen by the wayside, I think. Like, I see fewer and fewer people actually using it or interested in it or wanting to know more about it. And there are those of us who are stuck with it now forever.
We inherited it. Yeah. And, well, there was some, like, weird replacement from Microsoft that sounded really terrible. I don’t remember what it’s called. But maybe we can, you know, one day just read the documentation, like, make fun of how bad it is.
But, yeah. That’s a great way to make listener count go down, is talk about this feature no one remembers the name of. It maybe replaced replication.
That’s. It can be cathartic to hear experienced database professionals make fun of the software. Well, is it?
At least it is for me. Would it be fair to make fun of it without trying it? Yeah. Okay. I mean, what, I mean, like, surely you can make fun of big data clusters of that triangle. Well, they died.
So, I mean, yeah, yeah. Like, it died so funny. And think about how much material has to be there. You know, it made it. Couldn’t even make it to the next release. Well, I mean, I know that there were a few books about it. And, you know, there are a lot of people who told you if you didn’t get on board with big data clusters, you were, you might as well just dig your own grave.
I don’t know if that, I don’t know if that quite turned out. But, you know, the hype cycle moved to Synapse and now the hype cycle is on Fabric. And so we’ll just have to wait and see what the next hype cycle is so we can make fun of the Synapse and Fabric, right?
We can’t make fun of the current hype cycle. I feel like Microsoft has a community obligation to buy up all the books that those poor people wrote about features that they, like, immediately killed off. You know, because it’s got to feel so horrible to write a book about something and then Microsoft just, you know, kills it.
Yeah. I mean, so there’s two ways that Microsoft kills features, right? They either, like, outright kill them like they did with big data clusters or they kill them with negligence.
So, you know, stuff like, you know, I don’t know, what’s one that has been neglected for a long time? Like, let’s say service broker. I know that there was a lot of information for a minute and a lot of people who were like, yeah, service broker.
This is going to, like, revolutionize the modern DBA for the next 10 years. And then everyone was just like, I have to send XML messages. But, you know, I’m not doing that.
It’s not fun. I don’t know. I mean, Microsoft is, like, sort of silently killed, like, I don’t know, partitioning and indexed views because they won’t put an ounce of effort into them. So, you know, there’s the outright bang death and then there’s, like, the ghosting death.
Microsoft has a bad habit of ghosting features after it goes on these hot and heavy dates with them. All right, Microsoft. Well, if you’re listening, it appears that Eric is going to complain about the state of partitioning on every single one of these that we do.
Damn right. So, you know, if you want to stop, you’re going to have to do some piece of it. You know, I’ve got to put a little bit of effort in.
Just support min and max. So it’s a good start. Support min and max in partitions. It’s a great start. I think we should wrap back around what we’re supposed to be talking about.
Agent jobs? So, yeah. Okay. Like, you know, like, let’s say you have 10 databases. They all have the same agent jobs. So you’re getting that pileup.
Collision of cadences. So to speak. And you’ve got your CPU spikes every 15 minutes. I don’t know if this is the best way to solve it.
But the way I solved it was to, you know. Is there a easier way to do it? No, not really.
I like how you work that out in front of everyone. I changed the start and end times. Yes. By, like, a random factor.
So, like, it would essentially skip up to one execution around midnight. Yeah. And that random factor.
Well, I mean, like, it actually wasn’t random. It was based off, like, a checksum of the database name and the agent job name. So, like, in effect, you know, like, say you have some agent jobs that run every 301 seconds. For one database, they’d start running at, like, you know, at, like, one minute after midnight.
For another, it would be, like, one minute 35 seconds after midnight. And so on. Yeah.
So that was a way to, you know, to shift all those previously concurrent agent jobs. Yep. And now they’re just running throughout the day. Yeah.
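A rough sketch of the deterministic offset idea as described, with hypothetical names; the modulus just keeps the offset smaller than the job's interval:

```sql
/* Derive a stable, per-database, per-job offset by hashing the names.
   The same inputs always produce the same offset, so the schedule is
   reproducible, but different databases land at different times. */
DECLARE
    @database_name sysname = N'Database01',   /* hypothetical */
    @job_name sysname = N'Process queue';     /* hypothetical */

DECLARE @offset_seconds integer =
    ABS(CHECKSUM(@database_name + N'|' + @job_name)) % 300;

/* Shift a nominal 00:01:00 start time by the computed offset. */
SELECT
    start_time =
        DATEADD(SECOND, @offset_seconds, CONVERT(time(0), '00:01:00'));
```

The appeal of a checksum over `RAND()` is determinism: you can recompute the schedule any time and know exactly where every job should land.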
I admit it was a bit distasteful to, like, you know, to, like, lose an execution, possibly. But I justified it, because, well, you know, agent jobs can always fail.
Sure. Right. Or, like, the agent gets turned off. Sure. It goes down. Like, there isn’t really a universe where you can say, okay, I’m going to run this every five minutes. And it’s going to check the last five minutes of this queue table that I’m looking at.
Yep. Like, that just isn’t safe to do. Right. So, assuming all our agent jobs were already written correctly, which is, you know, not at all a bad assumption. Mm-hmm.
It’s perfectly fine to miss up to one around midnight. Sure. And that was the way that I, like, spread out that load. Yeah. I mean, I think, you know, the nice thing about Agent is that, you know, there is some built-in retry facility. Because if you put it on a schedule to run every X whatever, then even if one fails, the next job will hopefully just pick up whatever didn’t go right last time.
Or it will, you know, just either that or, like, something just real screwy happened and it’s just going to continue to fail until a human intervenes anyway. Right? Like for replication?
Yeah. Can you stop saying the R word on this radio show? Give me the willies. Oh. Oh.
And now I know what to spend my billable hours on. You will get very little useful content out of me if that’s how you choose to, that’s the path you choose to walk. But anyway, I do have to wrap things up here.
I have to get my kid to an after-school thing. So, why don’t we summarize neatly SQL Server Agent? So, in my case, I was able to solve the production problem.
You know, CPUs smoothed out. There weren’t any jumps. I didn’t have to look at the clock and be sad because it was on the half-hour mark. So, if you’ve got a lot of agent jobs, you might have problems where you have CPU spikes during the day because the schedules are going to line up.
Some ways to solve it are combine agent jobs together, you know, run them in steps. Don’t run things every five minutes or every hour. You know, schedule them like a computer would.
Think like a computer. Run it every 31 seconds, every 61 seconds. That way you’re getting that offset. And if need be, if you’ve got many databases all running the same jobs, you can even offset the start time so it’s not overlapping.
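As a sketch of that odd-interval trick, `sp_add_jobschedule` accepts sub-day frequencies in units of seconds; the job name here is hypothetical:

```sql
/* Schedule a job every 61 seconds instead of every minute, and start
   it at an offset time (00:01:35). Because 61 doesn't divide evenly
   into common intervals, recurring jobs drift relative to each other
   instead of all firing on the minute. */
EXECUTE msdb.dbo.sp_add_jobschedule
    @job_name = N'Process queue',        /* hypothetical job name */
    @name = N'Every 61 seconds',
    @freq_type = 4,             /* daily */
    @freq_interval = 1,
    @freq_subday_type = 2,      /* sub-day units of seconds */
    @freq_subday_interval = 61,
    @active_start_time = 000135; /* HHMMSS: 00:01:35 */
```

Combine this with the per-database start-time offset above and previously colliding jobs end up smeared across the whole day.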
And, you know, like, just as a general sanity thing, have some source of truth for what agent jobs should be there. Either source control or something else. You know, get rid of agent jobs you don’t need.
Fix ones that are erroring out. Things along those lines. Anything you want to add? Yeah, I mean, you know, for me, the general caution with agent jobs is, you know, aside from everything you mentioned, is that they don’t run with the ANSI set options you might expect. And that if you attempt to chain agent jobs together, you might be in for a very parallel surprise.
So, those are my two points of wisdom about SQL Server Agent. And that little text box in SSMS. Don’t use that.
Yeah, don’t use that. And also write descriptions for your agent jobs so people know what the heck they’re doing. Documentation is important. Something like that. All right.
Well, thank you, everyone, for joining us for the second episode of the Bit Obscene radio show. It’s a little tardier than we expected. First, Joe was sick.
And then my family had some sort of, like, hyper-advanced, extended-play RSV remix going around. And last week, I was a disaster area. So, I’m just getting back to normal now.
And hopefully, we will have a brand spanking new episode for you on a more timely cadence for your listening pleasure. So, thank you for listening and or watching. Joe, would you like to say goodbye?
Or is goodbye too hard for you? This is why I don’t make promises about when the next episode will be. So, I feel very vindicated.
And, you know, my lack of commitment. All right. Well, your lack of commitment is obvious. Thank you, Joe. Thank you, Joe.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.
SQL Server Index Design: Getting Key Column Order Right
Video Summary
In this video, I delve into how data is stored within rowstore B-tree indexes in SQL Server, focusing on the mental visualization technique that can help you understand index structures more intuitively. While Microsoft has made some features only available in Enterprise Edition, which can cause performance issues when moving to Standard Edition, I use the `GENERATE_SERIES` function from SQL Server 2022 to demonstrate how this function can be a valuable tool for visualizing and optimizing indexes. By cross-joining the function multiple times, I illustrate how data is logically ordered within an index, emphasizing why it’s crucial to consider key column order when designing indexes for optimal query performance.
Full Transcript
Erik Darling here with Darling Data. Feeling hyped up, amped up, ready to go, ready to plow through this day. In this video, I want to talk about how you and your mental brain should visualize, it’s not like the literal storage format, but it’s a good visualization practice, for how data is stored in rowstore indexes. This is not a video about columnstore indexes. Columnstore indexes are a completely different thing. No information that I give you here is to be used for the mental visualization of columnstore index storage. Just rowstore B-tree indexes, alright? Nothing else. Not those crazy in-memory hash and range indexes or whatever. Just regular rowstore B-tree on-disk indexes, alright? So, I want to start this video off, not surprisingly, lodging a complaint against Microsoft. And the reason why this is a complaint is because they’re usually smarter than this. Usually, like, take Developer Edition of SQL Server, for example. It’s Enterprise Edition.
All the Enterprise Edition features, and there’s no way to turn Enterprise Edition features off to make it behave like Standard Edition. So, they get all you suckers hooked on Enterprise Edition stuff, and then when you’re like, haha, well, I’m going to get all this cool stuff for two grand a core, and you go to restore your developer database over to Standard Edition, and Standard Edition’s like, whoa, whoa, whoa. Can’t do that here. Or even worse. You unwittingly, unknowingly, were benefiting from Enterprise Edition optimizer features that you stop getting in Standard Edition, and all of a sudden, performance sucks. The only way for you to fix it quick: you get Enterprise Edition. Usually, Microsoft is real smart about dangling that candy in front of you, waving that lollipop in your face. But they’ve gotten, apparently, they’ve just gotten stupid over the years. Because watch this. So, GENERATE_SERIES is a new function for SQL Server 2022. And like a lot of new functions, like STRING_AGG and like other stuff like that, STRING_SPLIT is another one, they hid it.
They’ve tucked it away behind a compatibility level. But like for a lot of stuff, you can tell SQL Server what compatibility level you want your query to run in. So, I’m going to tell SQL Server, I want you to run in compat level 160, and I want you to use this GENERATE_SERIES function. But it tells me it’s an invalid object name. It’s invalid, they say in France. I don’t think it means the same thing. I think it’s different. And this is stupid. If you want to get people hooked on something, you’ve got to make it easy for them to get.
That’s why drug dealers are very successful. And Microsoft could be as successful as drug dealers. They would just stop being stupid about these things.
So, let’s get rid of a couple things here. Let’s move on. Our demonstration for the day. Let’s put our database into the 160 compat level. Let’s get rid of this foolish thing. Which does us no good.
No good whatsoever. And let’s frame up this query nicely. Now, to make things short and tidy, what I’m doing is I’m cross-joining the GENERATE_SERIES function to itself four times.
And I’m only generating the numbers one through five. This is just to keep the result set small and manageable, to help give you some drugs. By drugs, I mean an easy way to visualize index contents in your rowstore, B-tree, on-disk index.
So, stick with me here. I’m going to run this query and the results are going to look like this. And if I can just grab that thing and buy ourselves a little bit more real estate up here, if we were to create an index across these four columns, this is about how the data would be ordered in that index.
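For readers following along at home rather than watching, the demo query is approximately this (a reconstruction, not a verbatim copy from the video):

```sql
/* Cross-joining GENERATE_SERIES(1, 5) to itself four times yields
   5^4 = 625 rows, sorted the way a rowstore B-tree index keyed on
   these four columns would logically order its data. Requires
   SQL Server 2022 and database compatibility level 160. */
SELECT
    first_key_column = g1.value,
    second_key_column = g2.value,
    third_key_column = g3.value,
    fourth_key_column = g4.value
FROM GENERATE_SERIES(1, 5) AS g1
CROSS JOIN GENERATE_SERIES(1, 5) AS g2
CROSS JOIN GENERATE_SERIES(1, 5) AS g3
CROSS JOIN GENERATE_SERIES(1, 5) AS g4
ORDER BY
    g1.value,
    g2.value,
    g3.value,
    g4.value;
```

With 625 rows, the first column holds each value for 125 consecutive rows, which is why the flips described below happen around every 125th row.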
Maybe not perfectly physically ordered, it’s something more like a logical ordering, but just to give you a nice mental grasp on things, the primary sorting of the index is by the first key column.
And if we scroll down through first key column a little bit, eventually, first key column, around the 125th row or so, is going to flip to the number two.
Right? And if we scroll down another 125 rows, we’re going to flip to the number three. Right? You can see that flip right there. And if we just, you know, real quick go down, we’ll see four and we’ll see five, and five will be the end.
So this is why people will tell you things like, this is the column that’s in like a lot of where clauses. This is like the primary thing that users are going to be looking to find when they search for data.
It’s a good idea to have that be the first key column and to have other, perhaps less common, less commonly searched columns be later in the index, because this index will define the primary sorting, sorry, this column will define the primary sorting of the index.
And guess what? Data is real easy to find when it’s sorted. That’s S-O-R-T-E-D, not S-O-R-D-I-D. Sorted.
Sortied. I think it’s French for exited. Maybe. I don’t know. So like, if you wanted to find values in your first key column, where they equal two, well, it’s really easy to seek to this chunk of twos.
Right? And once you’re in this chunk of twos, it is very easy to navigate to other key columns, through other key columns in the index.
So once you have, once you have that range of twos, it’s real easy to say, find all the ones in the second key column, because they’re all in order. Right?
And if you wanted to extend your where clause and say, I also want to find all the ones in my third key column, those would be very easy to find, because you would have all the ones in order here, and then all the ones in order here.
And if you wanted to extend that to a fourth predicate and say, I only want to find the ones in my fourth key column, that would be very easy to do. You could find two, one, one, one very easily. Right?
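To make the seek-versus-scan point concrete, here is a hypothetical sketch; the table and index don't exist in the demo, they're stand-ins for any index keyed on these four columns:

```sql
/* Hypothetical index for illustration: keyed on all four columns. */
CREATE INDEX four_keys
ON dbo.SomeTable
    (first_key_column, second_key_column, third_key_column, fourth_key_column);

/* Predicates that start at the leading key column can seek directly
   to the matching range, then narrow within it: */
SELECT t.*
FROM dbo.SomeTable AS t
WHERE t.first_key_column = 2
AND   t.second_key_column = 1
AND   t.third_key_column = 1
AND   t.fourth_key_column = 1;

/* Skip the leading key column and the remaining key values are
   scattered across the whole index -- this generally has to read
   everything: */
SELECT t.*
FROM dbo.SomeTable AS t
WHERE t.second_key_column = 5;
```

The second query is the "fives in the second key column" situation walked through next: without a predicate on the first key column, there is no single sorted range to seek to.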
There’s all sorts of things about index stores that make these sort of searches nice and easy. But it’s also why if your queries aren’t using the first key column, all the other key columns are a bit of a jumble, and it takes a lot more work to go find values in them.
Like, let’s say we wanted to find all of the fives from the second key column. All right? We would have to go through all the ones, twos, threes, fours, and fives in the first key column.
Right? And then we would have to find where, right? Because we can’t, we’re not searching on this. So we can’t seek to anything in here and then go find anything here.
We need to find everything in here that’s a five. Right? And we have fives throughout all of the different rows in the first key column.
So searching on this, just this, is not going to be as efficient as searching on this and this. Okay? Now, let’s play that again with the third key column.
You can see that the fives get even more spread out. Right? Because now we have to go through all the ones, twos, threes, fours, and fives here.
We have to go through all the ones, twos, threes, fours, and fives here. And then we finally get to where the fives are in this third key column. But you can see how spread out they are after the other key columns are sorted for the index.
And if we take that a step further and we go and we look for where the fourth column equals five, the values are even more spread out because we have all the ones, twos, threes, fours, and fives here.
All the ones, twos, threes, fours, and fives here. I should go back to kindergarten. All the ones, twos, threes, fours, and fives here. And then we finally get down to where there are just fives here.
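To make the picture above concrete, here’s a sketch in T-SQL. The table, index, and column names are hypothetical; they’re just stand-ins for the four key columns in the visualization:

```sql
-- A hypothetical four-key index, mirroring the visualization above.
CREATE INDEX ix_four_keys
    ON dbo.SomeTable (Col1, Col2, Col3, Col4);

-- Can seek: the predicate covers the leading key columns, so all
-- matching rows sit together in index order.
SELECT COUNT_BIG(*)
FROM dbo.SomeTable
WHERE Col1 = 2
AND   Col2 = 1;

-- Cannot seek usefully: Col4 values are only ordered within each
-- (Col1, Col2, Col3) combination, so the fives are scattered and
-- SQL Server has to read through the whole index to find them.
SELECT COUNT_BIG(*)
FROM dbo.SomeTable
WHERE Col4 = 5;
```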
So this is why I record a lot of videos about indexing, about index key column order, and about why it’s really, really important for you to think about this stuff when you’re tuning queries.
Not necessarily at the initial index design phase because when you’re initially designing indexes, you might have very little idea how your application will actually be used. You might have a few ideas about how it will get used, right?
You might have a few modules that do specific things, but as soon as you let users free in your database, it’s very, very difficult to predict what kind of queries they’re going to be running, how they’re going to use your application to do things, right?
Things that might seem totally logical, straightforward, and obvious to you as the application developer may not be totally obvious, straightforward, or anything to end users once they start poking around in there.
You might think, well, everyone’s going to want to search by this first key column. It’s going to be great. Look how fast this stuff is. Then users get in and they’re like, no, I really want to find stuff in the third key column.
And all of a sudden, their query performance is not as good as your query performance, especially as that database gets bigger and bigger and things grow. Queries have to work through more and more data in order to find the data people are interested in.
So, this is a good way to visualize how indexes store data. Again, it’s not the exact physical implementation.
It’s just a good visualization technique for you. It’s also a good idea to have the columns in your filter definition be at least included columns, so that SQL Server has them all available in your index to use.
So, yeah, when you’re designing indexes, you’re thinking about index key column order, there are a lot of things to think about, but it becomes easier to think about all of those things when you have a good mental picture of how your query can navigate through your index to find the data that it’s looking for.
This is why pretty common advice is to, you know, go for equality predicates first because to some degree, equality predicates will generally be more selective than inequality predicates. This is less true depending on the data type of the equality predicate, right?
Like a bit column, potentially not as selective as, say, an identity column, right? An integer, big int or something. Range predicates like, you know, start dates, end dates, things like that.
You know, different things to think about. Like, this search might be selective if we’re only looking for a week, but this search we’re looking for 10 years, not very selective. So there’s all sorts of things that come into play when you’re designing indexes, but keeping index key column order in mind so that you can narrow down what queries are searching for and figure out what a good leading key column index is, this makes life a lot easier.
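The “equality keys first” advice from above might look something like this in practice. The table, column names, and values here are hypothetical:

```sql
-- With the equality column leading, the seek lands on one customer's
-- slice of the index, and the date range only has to be scanned
-- within that slice.
CREATE INDEX ix_customer_date
    ON dbo.Orders (CustomerId, OrderDate);

SELECT
    o.OrderId,
    o.OrderDate
FROM dbo.Orders AS o
WHERE o.CustomerId = 38            /* equality: the gatekeeper      */
AND   o.OrderDate >= '20240101'    /* range: scanned only within    */
AND   o.OrderDate <  '20240108';   /* that one customer's rows      */
```

If the key order were reversed, (OrderDate, CustomerId), the seek would have to read the whole date range and apply the customer filter as a residual predicate.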
You can think of each column in the key of an index as sort of like a gatekeeper to the next column, right? If you are not searching on the first column, data in the second column is not going to be in good order.
If you are not searching on the first and second columns, data in that third key column is not going to be in a good searchable order, and so on. So hopefully this was helpful to you.
Hopefully this will aid you in your quest to design the perfect index for your query or queries. I don’t know.
I have a call starting soon. I should probably go prepare for that inevitability. But if you enjoyed this video, thumbs up is always appreciated.
If you like this kind of content, a little subscribe is always appreciated. It’s free. It costs you nothing to get a notification that I recorded a video.
And as always, I hope you enjoyed yourselves. I hope you learned something. And I hope to see you in the next video, whenever that may be.
So, thank you for watching.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.
How To Evaluate Index Effectiveness While Tuning SQL Server Queries
Video Summary
In this video, I discuss a practical approach to evaluating the effectiveness of your indexes when tuning queries, using specific examples from the Stack Overflow database. I walk through running several queries that progressively narrow down the result set based on score and date filters, demonstrating how index seeks can sometimes be less efficient than expected. By leveraging trace flag 9130, we explore how residual predicates are applied later in the query execution plan, highlighting where most of the filtering actually occurs. This insight is crucial for making informed decisions about indexing strategies, especially when dealing with range predicates and varying data selectivity.
Full Transcript
Erik Darling here with Darling Data. One thing that I’ve noticed recording these videos, I don’t know if it’s the professional grade studio lighting that I use here, but I’m getting a lot more gray in the mustache, particularly right here, but over here too. This I was used to a lot of gray in, but the other stuff is becoming new to me. I don’t think that I’m quite ready to start investing in various dyes, other ablutions, but there is a sign, isn’t there? Anyway, today’s video, long awaited, much anticipated video, I want to talk about how you, the professional query tuner that you are, can you can sort of start to figure out how effective your indexes are for the query that you’re tuning.
Now, there are all sorts of other query elements that might inform your indexing decisions. You know, like, equality predicates, joins, presentation order by, top with order by, offset fetch, windowing functions with partition by and order by; we’ve talked a bunch about those sorts of things recently. All sorts of things that you might consider when creating indexes.
But one thing that a lot of folks don’t consider when creating indexes is just how effective those indexes are for helping them, for helping their queries find the data that they’re looking for. So, what we’re going to do here is run a few queries.
And now, these are not slow queries, first off. These queries are not slow. This is just a simple exercise to help you determine how effective the filtering your queries do inside your indexes actually is.
Now, a couple of things about the predicates in here. And these queries are very specifically engineered to show you sort of like a worst case scenario for indexing choices. So, this last activity date column is a little funny.
So, when the Stack Overflow databases originally got chopped down to different years, like the Stack Overflow 2010 database, the Stack Overflow 2013 databases. I don’t know how many other versions of the database exist out there in the world.
I have a columnstore version of the database. I don’t think anyone has ever used, but it’s out there. I pay like 25 bucks a month to host a torrent for it.
So, that’s fun. But, so, when the databases got chopped down, we really just cut off the creation dates at a certain point. There are other date columns that live in the 2010, 2013, and probably other versions of the database, that have a much wider range of dates in them.
So, like the last activity date on some of the posts in the 2013 database is way beyond 2013, which is kind of a funny data anomaly. But, you know, what can you do?
I suppose you could run some date math update to flip stuff back by like five years or something, but I don’t really see a lot of value in that. But, this predicate here is designed to be very wide open.
So, this is just going to grab everything in the table. And, this second predicate on score, this, I’m going to keep increasing this value to return fewer and fewer rows. All right.
So, if we run these three queries, that one’s looking for a score greater than or equal to 5,000. This one’s looking for a score greater than or equal to 15,000. And, this one’s looking for a score greater than or equal to 25,000. If you run all these, what you’ll probably notice in the results is that, just like I promised, fewer and fewer rows come back as we increase that score filter.
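The demo script isn’t pasted into the transcript, but the shape of the three queries is roughly this (a reconstruction, so the exact dates and column list may differ from what’s on screen):

```sql
-- The date predicate is deliberately wide open, so it grabs nearly
-- everything in the table; only the score threshold changes
-- between the three runs.
SELECT p.Id, p.Score, p.LastActivityDate
FROM dbo.Posts AS p
WHERE p.LastActivityDate >= '20070101'
AND   p.LastActivityDate <  '20210101'
AND   p.Score >= 5000;   -- then >= 15000, then >= 25000
```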
Now, if we look at the three queries that ran, let’s just tighten this up a little bit. Nice, tight screenshot there. We’ll see that, well, sort of weird that even as we return fewer and fewer rows, the index seek takes longer and longer.
Right? So, this one is 234 milliseconds. Actually, this one went down a little bit.
Hmm, 226 milliseconds. This one’s 235 milliseconds. Well, actually, you know what’s funny is, like, prior runs of this demo, like, the seek time would actually go up by about 10 milliseconds each time. I don’t know what’s different this time.
I guess I’m just lucky because I’m on camera. Apparently, the camera does not add 10 milliseconds to your query plans. Just makes your cheeks look extra rosy. But anyway, what the seeks look like for all of these is this.
So, we have seek predicates down here that find some of the data, and then we have these residual predicates up here. Right? So, we’re able to find the start and end date.
We’re able to find at least the start of, or rather, we’re able to filter down to some of the 5,000. But we do the residual filtering later. We can tell we do the residual filtering later.
If we use this very special trace flag called, well, I mean called 9130, numbered 9130, I guess. So, if we run these three queries, what the trace flag will do is take that residual predicate on score and apply it later in a filter. So, if we look at these and look at the index seek now, we don’t see any references to score down here, and we don’t have that other residual predicate up here.
What we have in the query plan is a filter, a secondary filter. That’s a good word, good term. And that is where the score predicate gets applied.
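If you want to try this yourself, the trace flag goes on as a query-level hint. It’s undocumented and needs elevated permissions, so keep it on a dev box:

```sql
-- Trace flag 9130 pulls residual predicates out into a separate
-- Filter operator, so you can see where filtering really happens.
SELECT p.Id, p.Score
FROM dbo.Posts AS p
WHERE p.LastActivityDate >= '20070101'
AND   p.Score >= 5000
OPTION (QUERYTRACEON 9130);
```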
Now, what I want you to notice is that for these three of, like, let’s tighten this up. Let’s kind of tighten these way up, way tight on these. I don’t know why my mouse isn’t cooperating so much today.
A bit frustrating. But if we look at all three of these seeks, we can see that these seeks don’t really narrow down the rows that we’re hitting at all. We basically seek through the index to every value in the table.
Where the filtering actually happens, where we actually reduce the result set, is when we apply that score predicate outside of that range predicate on the dates. So, bringing it down to 5,000 gets us to 73 rows. Bringing it to 15,000 gets us to five rows.
I don’t know why I didn’t get my dot there. Give me a dot. Come on, baby. There we go. And bringing it to 25,000 gets us down to one single row. So, using that information, we might decide that the index we’ve designed for this query is not holding up its end of the bargain.
We’re not getting good filtering from that, from that index with this query. So, let’s give this index a shot. So, I created two indexes at the start of this thing.
One of them called P, and that’s the one you’re looking at now. And I probably should have mentioned this earlier. This one leads with last activity date, and has score as the second key column. Important information.
For you now, for you later, for your life in general. And now, if we start using this index on score, and then last activity date, we’re going to see these queries change a whole bunch, in very meaningful, profound ways.
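For reference, here are the two indexes as I understand them from the description. Only the first one’s name (p) is mentioned, so the second name is a guess:

```sql
-- The index the queries used first: date leads, score second.
CREATE INDEX p
    ON dbo.Posts (LastActivityDate, Score);

-- The better fit for these queries: score leads, date second.
CREATE INDEX p2
    ON dbo.Posts (Score, LastActivityDate);
```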
Now, we are, when we do our filtering, we are able to very quickly and efficiently locate those score rows, and then apply the predicate on the date. So, we narrow these things down very, very early on, right? And if we look at the way these predicates are set up, we are able to find where the score column is greater than or equal to 5,000.
And we are able to find the start range of the dates that we’re looking for here, 2007-01-01. And then up in the residual predicates, this is where we apply the full range of filtering for the dates, right? So, all three of these, well, I mean, they went from, like, kind of like, what was it, 200 and something milliseconds to zero milliseconds.
This is a much, much better indexing proposition for this query. Now, if we go back and we look at the same type of query again, using trace flag 9130, we’ll see some still rather agreeable results. But it is helpful to use this sort of thing to figure out where you’re doing most of your filtering.
Where is most of your filtering actually getting applied? Is it in, like, the actual seek predicate, or is it in residual predicates? So, if we look at these three, let’s tighten this up.
Now it’s easy, I think because I yelled at it, scared it. I scared it into submission. Now it’s back to acting up. So, if we run these three queries, when we look at what happens here, we can still see that the majority of the filtering is happening way early on, right? And that the later filter operator doesn’t actually get rid of anything.
73 in, 73 out, 5 in, 5 out, 1 in, 1 out. So, that secondary filter on the date range was not really all that helpful to us, at least from an indexing perspective, at least from a data filtering perspective. Now, you might run into situations like this in real life when you are tuning queries, especially if you have stored procedures that allow for these sorts of range predicates.
So, sort of like, if you have, like, where some amount is greater than or equal to something, and some column is between some date range, you might find that as you expand and contract certain ranges, you make some filters more selective, and you make other filters less selective. You might find that the indexes you have become less efficient for those wider or narrower ranges. Figuring out how to handle that can, of course, be rather complicated.
Sometimes a recompile hint is good enough, especially if you have multiple indexes available for the query to use. Like, if we had both of these indexes available, the second one that leads on score and has last activity date as the second column, and the first one that leads on last activity date and has score as the second column. If these were both available, and we had a recompile hint on there, then SQL Server would be able to sort of figure out via cardinality estimation which index might be better to use in certain circumstances.
It might also make mistakes there, but who knows, right? All you can do is set SQL Server up for success, give it the car keys, and see what happens. If that doesn’t work, then you might have to take it a step further.
You might use dynamic SQL, and you might start evaluating what values get passed in, and you might start hinting different indexes to use depending on the selectivity of the ranges that you have. That people are passing in for their queries. So, like, if you have a very, very wide date range, and a very, very selective score filter, you might say, oh, use the index that starts with score.
If you have a very unselective score predicate, like if you said, like, where score is greater than zero or something, or a score is greater than, like, negative a million, and you had, like, a date range for, like, the last month of the whatever you’re searching, you might want to lead with the narrow date range predicate and have the secondary residual predicate on score that’s not going to screen a whole lot out. So, you might want to use that second column in the index. So, there are all sorts of ways to deal with it.
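Here’s a hedged sketch of that dynamic SQL idea. The procedure, index names, and thresholds are all hypothetical; this is the shape of the technique, not a drop-in fix:

```sql
CREATE OR ALTER PROCEDURE dbo.GetPostsByScoreAndDate
    @MinScore integer,
    @StartDate datetime,
    @EndDate datetime
AS
BEGIN
    DECLARE @sql nvarchar(MAX) = N'
SELECT p.Id, p.Score, p.LastActivityDate
FROM dbo.Posts AS p ';

    /* Pick an index hint based on which predicate looks selective. */
    IF @MinScore >= 5000
        SET @sql += N'WITH (INDEX (p2)) ';  /* score-leading index */
    ELSE IF DATEDIFF(DAY, @StartDate, @EndDate) <= 31
        SET @sql += N'WITH (INDEX (p)) ';   /* date-leading index  */

    SET @sql += N'
WHERE p.Score >= @MinScore
AND   p.LastActivityDate >= @StartDate
AND   p.LastActivityDate <  @EndDate;';

    EXECUTE sys.sp_executesql
        @sql,
        N'@MinScore integer, @StartDate datetime, @EndDate datetime',
        @MinScore, @StartDate, @EndDate;
END;
```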
It really depends on how deep you need to get into fixing these kinds of queries. A lot of the time, it is fairly easy to just allow SQL Server to sort of make choices based on, you know, what it thinks is best at the time. But, you know, plan reuse, parameter sniffing can also be, make that very difficult proposition.
As usual, I do not recommend optimize for unknown. I do not recommend taking your stored procedure parameters and setting them equal to local variable values. I do not recommend any of these things, because they are unwise decisions, most of the time.
So, anyway, that about wraps it up for this video. I have one other video I think I am going to record today. I am feeling ambitious.
But, if you like this sort of SQL Server performance tuning training content, feel free to give my video the little thumbaroonie. If you want to get notified when I record more of these, especially when I record multiple videos in a day and you are just like, screw it, long lunch. I am drinking for this one.
Then, please subscribe to my channel. We are nearing ever, ever closer to the 3000 subscriber mark. And I do not know what I am going to do when I hit 3000 subscribers. Maybe, I do not know, maybe I will throw a party and live stream and watch me get drunk alone and write demos.
The majority of my evenings, anyway. So, yeah. Thank you for watching.
I hope you enjoyed yourselves. I hope you learned something. I hope that you will watch more of this prolific in-depth SQL Server training content. Anyway, thank you for watching.
How To Tell If Lazy Table Spools Are Slowing Your SQL Server Queries Down, And Fix Them
Video Summary
In this video, I dive into the world of lazy table spools, specifically focusing on how they appear in query plans and whether their removal can be beneficial. I explore these concepts by analyzing execution plans from queries that use cross-apply, a scenario where nested loops joins are often used due to SQL Server’s limitations with certain join types. Through detailed examination of the plans, I demonstrate how to assess the effectiveness of spools using rebinds and rewinds—key metrics that can help determine if a spool is truly adding value or just cluttering up your query plan. By walking through practical examples and offering tips on when to consider removing spools, this video aims to provide actionable insights for database administrators looking to optimize their queries.
Full Transcript
Oh, it’s Erik Darling with Darling Data. And, boy, what a weird week it’s been. I kind of wish I had a do-over on this one. But anyway, in this video, stunning masterpiece of a video, we’re going to talk about how to tell if spools are effective, specifically lazy table spools, because those are otherwise known as performance spools. And they have some rather interesting characteristics. And there are some, there are some times when you’re going to want to handle them, or rather handle their removal differently. Or at least testing to see if their removal is a good idea or not. So, first, we’ve talked a lot about eager index spools. Table spools, in select queries, anyway, are quite similar in their reasoning for appearing in a query plan. They appear on the inner side of a nested loops join, which we’ll look at in a moment. And they’re there because SQL Server doesn’t want to do a whole lot of repetitive work on the inner side of a nested loops join. And by repetitive work, let’s mosey on over to this window. I’ll show you what I mean. Let’s say that on the outer side of a nested loops join, you only have the numbers one through 10. That means that SQL Server, if it uses a nested loops join, will only have to take each number, go do work with it, and do that work one time.
All right, exactly once. That’s pretty easy. You most likely won’t see any spooling. Well, you might see an eager index spool if you don’t have any good indexes. But you likely won’t see a lazy table spool on the inner side of a nested loops join when you have a distinct list of numbers. A pretty distinct list of numbers. If you have some duplicates in there, SQL Server will probably forgive you a bit. But where things get far less forgiving is, let’s say our list looks more like this. All right, so I’ve just kind of cross-joined this generate series function to itself. And we have this list now, where we have 10 number ones, that was close, 10 number twos, 10 number threes, and so on.
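Roughly what that duplicate-heavy list looks like, using GENERATE_SERIES, which is available in SQL Server 2022 and up (on older versions you’d use a numbers table or a values list instead):

```sql
-- Cross-joining a 1-through-10 series to itself yields 100 rows:
-- ten 1s, ten 2s, ten 3s, and so on.
SELECT
    n = gs1.value
FROM GENERATE_SERIES(1, 10) AS gs1
CROSS JOIN GENERATE_SERIES(1, 10) AS gs2
ORDER BY n;
```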
In this case, SQL Server would have 100 loops to do. But for every, for like, you know, 90 of those loops, we would just be reworking things that are already done, right? We already, we already figured out what to do with what the data for this is. We don’t need to rerun a whole query to do that nine more times. So that’s when eager index spool, or sorry, when lazy table spools typically come into play, right? When SQL Server’s like, wait a minute, you’re telling me, I just figured out all this stuff about the number one, and now we got to figure out stuff about number one nine more times? No way.
I’m going to cache this data, and I’m going to reread this data, rather than go do that work over and over again. All right, so let’s look at some plans, right? Good times, good times, these plans, these execution plans, just unlimited mirth and joy comes from looking at query plans. I’m telling you.
So you may ask yourself, what’s the point of these spools at all? Why, if SQL Server thinks that a nested loops join is just going to be that much work, why, why on earth wouldn’t it just use a different type of join? And the answer is that not all types of joins are supported by all constructs in SQL Server.
If you use cross apply, a lot of times even outer apply, or if you cross-join two tables together, only nested loops joins can be used for that. And the same goes if you don’t have a join with an equality predicate. So, like, all you clever dim date calendar table developers out there: when you join on that table with, like, where the date in some table is greater than or equal to one date in your calendar table, and also less than or equal to another date in your calendar table, or something like that, there’s no equality predicate.
There’s no column equals column in that join. So SQL Server can only use nested loops for that. And because of these scenarios, SQL Server can’t just always pick a hash or a merge join or something like that.
So we end up with nested loops because there’s nothing else. If I had to make like two wishes for SQL Server, one would be no more parallel merge joins, and the other would be batch mode nested loops.
I think those would be very nifty additions to SQL Server. Well, one addition and one removal, I guess. So let’s look at some plans that have table spools in them, and let’s see if we can figure out if those are effective spools or not.
Now, what I did was I loaded up a couple temp tables with a million rows, and the contents of those temp tables are a little, well, let’s say different. So this is the top million users where the owner user ID in here has 10 or more posts associated with it.
This is ordered by score descending. And then this temp table down here is loaded up with only owner user IDs who have nine or fewer posts. So just sort of, either you do a lot of stuff or you do a little stuff.
But we get a million rows of each of those fun people in each of those temp tables. Good times so far. And then we’re going to run a couple queries that are almost guaranteed to end up with a lazy table spool in them, because we’re using cross apply, and like I said before in other probably recent videos, cross apply is just a really neat shorthand to say, give me nested loops, because once again, I find the inner side of nested loops joins quite fascinating.
Every time I see one, I’m like, oh, it’s going to be nice. We’re going to find something good out here. So let’s look at the query plans for both of these.
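The demo script isn’t reproduced in the transcript, but the shape of the queries is roughly this. The temp table name and the applied query are reconstructions:

```sql
-- Cross apply forces a nested loops join, and the duplicate
-- OwnerUserIds in the temp table make a lazy table spool attractive:
-- SQL Server caches the inner result for each distinct id and
-- replays it for the duplicates.
SELECT
    u.OwnerUserId,
    c.records
FROM #active_users AS u
CROSS APPLY
(
    SELECT COUNT_BIG(*) AS records
    FROM dbo.Comments AS c
    WHERE c.UserId = u.OwnerUserId
) AS c;
```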
Now, you’re not going to be happy with the runtime of the first one, right? Because the runtime of the first one is about a minute. And this is why I pre-ran all these queries because, you know, again, I like to respect your time and your wishes.
And who knows when you’re watching this? Who knows what I’m taking you away from while you watch these videos? I’m sure it’s something very important, something very social.
Moments in your life that you would miss and never get back. Don’t want that. But let’s look on, again, inner-sided nested loops. That’s where the fun stuff is, right?
We know how long the query runs for. But we spend a whole lot of time in this branch, right? We can see that a million rows come out of here. And SQL Server does something helpful.
And it sorts all those rows and puts them in order so that it can almost guarantee for itself that every time it hits something that it needs to populate this table spool with down here, we can reuse that table spool for any other duplicates that come along immediately. So we don’t have to worry about, like, going back to, like, you know, when I was showing you, like, the numbers in a row.
It was, like, ten 1s, ten 2s, ten 3s, stuff like that. If those were all jumbled up, using a spool would be far, far less effective, because we would be like, oh, well, go fill up the spool for ID 1. Crap, it’s ID 2.
Okay, go get that one. Oh, we’re back to ID1. Oh, go do that again. It just wouldn’t make sense, right? It would be all out of whack. Wouldn’t be a good time. It would be like, might as well just not even have a spool there at all. And then I would have nothing to talk about.
What moments in life am I missing, right? Oh, this is giving me a crisis here. You know, tender my resignation at Beer Gut Magazine and start spending more time with my family out in nature.
Enjoying sunsets or something. But let’s look a little bit more closely at the spool itself. So a million rows come over here.
They get sorted. And then SQL Server starts using this spool. And I don’t actually know what the lowest number in the spool is. Let’s just say it’s number one.
So SQL Server is like, boom, nested loops join. Pass that ID number one down to the spool. The spool is like, I got no number ones. I don’t know if I know nothing about number one.
Let me go get that for you. So the spool runs this part of the query plan, fills itself up with number one. And then for any additional number ones that show up here, we just reread from what’s in the spool. And if we look at what’s going on in the details of the spool, and let me make this a little bit more sensible.
From a readability perspective, there we go. We can see that this spool actually was pretty effective. I know that’s a slow query, but it’s an effective spool.
We can tell it’s an effective spool by looking at this. I think it was Paul White who came up with a very good way of phrasing the rebinds and rewinds thing. And you can think of a rebind as like a cache miss where the spool has to go get data.
You can think of a rewind as a cache hit where the query can reuse data in the spool. So we had to go execute that other part of the branch 151,570 times. But we reused what was in the spool 848,430 times.
So that’s pretty effective. And I know it doesn’t seem that way because the query still ran for a minute. But wait until I show you what it looks like without the spool.
That’s where things get weird. Now, down here in the table where we have far fewer, the people in the second temp table have far less activity. They have far fewer posts and probably far fewer comments.
And they’re just not as active site users. This query runs for under a second. So if you have a query with a table spool in it and it’s already fast, don’t really worry too much about fixing the spool because it’s not really doing much of anything.
You’re getting off easy. Move on to something that takes longer. Move on to something more meaningful.
Look for queries that take almost a minute to run. That’s where you should be focusing your time. Don’t spend a lot of time with this stuff, because no one’s going to celebrate your career if you spend a lot of time fixing fast queries and ignoring slow ones. Focus on the stuff that matters. But this table spool, if you were to look at it objectively, is not a very effective spool.
Sorry about that. There we go. So this table spool is kind of iffy, right? But it’s still fast.
We have 639,558 cache misses or times we had to go repopulate the spool. You have 360,442 cache hits or times we could reuse stuff in the spool. So we spent, they had way more activity going and running that query.
But it was still fast, right? Again, still fast. So don’t always judge a spool by the rebind and rewind numbers. Because there’s a lot that they leave out of the story.
That’s exactly how much work had to be done in order to go get data to fill that spool up with. So how can you tell if a spool is effective? Well, what you can do is not put the wrong, but don’t put things in the wrong order.
That’s a good start. Like that, too. That’s also in the wrong order. What you can do is, depending on what you prefer, you can use QUERYTRACEON with trace flag 8691. So I should actually go back a second here.
Trace flag 8691 means put a spool in there. Use a spool. Like, spool away, sir. You need some spooling. All right.
Play some Led Zeppelin or something. And so I specifically use this here because I wanted SQL Server to use a spool no matter what it thought. I think when I was originally writing this, hitting the table with very few useful people in it was not spooling.
So I was like, no, screw it. We’re just going to force the issue because I want to show people what to look for in these plans. So 8691 says, use the spool.
8690 says, don’t use the spool. There’s a slightly more accessible query hint, since using trace flags requires, you know, elevated permissions and stuff.
There’s a regular option hint called NO_PERFORMANCE_SPOOL that will get rid of table spools for you. Actually, it will get rid of table spools and lazy index spools. It will not get rid of eager index spools, for some strange reason.
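So the experiment looks something like this, with all three options shown against the same reconstructed query shape. The trace flags are undocumented and need elevated permissions; the hint is documented and available from SQL Server 2016 on:

```sql
SELECT
    u.OwnerUserId,
    c.records
FROM #active_users AS u
CROSS APPLY
(
    SELECT COUNT_BIG(*) AS records
    FROM dbo.Comments AS c
    WHERE c.UserId = u.OwnerUserId
) AS c
OPTION (QUERYTRACEON 8691);        -- force the lazy table spool

-- OPTION (QUERYTRACEON 8690);     -- suppress spools instead
-- OPTION (NO_PERFORMANCE_SPOOL);  -- documented hint; skips lazy
--                                 -- table spools and lazy index
--                                 -- spools, but not eager ones
```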
But when we run both of these queries without the benefit of a spool in them, right, you can see there’s no spool in here. Well, this query goes from taking a minute to taking three minutes, right? So that spool, even though that query was running for a long time, that query ran for almost a full minute, that spool saved us two minutes of time, right?
So clearly that was a good choice for this first query. The second query, if we get rid of the spool, again, we slow down, but only by like 400 or so milliseconds, 300, 400 milliseconds. So it’s not like a big critical thing that will, again, these are not the types of queries you should be focusing on.
You should be focusing on the stuff that takes time. If you’re sitting there trying to experiment and micromanage a query that’s already fast, you’re not being a very useful engineer, right? Focus elsewhere.
So we don’t really have any great details to look at in here. We can just see that this thing took a long time. And this branch is a full two minutes spent just seeking into this thing over and over and over again, very repetitively going to get a bunch of repetitive data. And no one’s happy, right?
No one’s going to say, hey, good job, man. You got rid of that spool. The query’s three times slower now, but that spool’s gone. It just doesn’t happen.
Maybe, I don’t know, maybe you’re luckier than I am, but that doesn’t happen for me. So one potential way of getting rid of spools is to change the query a bit, like this. And this is a trick.
I forget if I picked this up from Adam Machanic or Paul White. I can’t recall which one illuminated me to this first. But this can be a generally good way to get rid of a spool in a query plan, right?
By just saying, rather than having SQL Server have to protect itself from a bunch of duplicative values, we can just say select a distinct list of values from the table and then go cross-apply that distinct list of values.
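A minimal sketch of that rewrite, assuming a `#t` temp table of OwnerUserId values like the one in the demo (the table and column names are my stand-ins, not copied from the demo script):

```sql
/* Instead of applying per row of #t (which invites a spool to protect
   against duplicate OwnerUserId values), apply per distinct value. */
SELECT
    d.OwnerUserId,
    pa.Score
FROM
(
    SELECT DISTINCT
        t.OwnerUserId
    FROM #t AS t
) AS d
CROSS APPLY
(
    SELECT TOP (1)
        p.Score
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = d.OwnerUserId
    ORDER BY p.Score DESC
) AS pa;
```

Because the inner side of the nested loops join now only runs once per distinct value, there’s nothing repetitive for a spool to cache, and the optimizer has no reason to add one.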
And when we do that, both query, well, this query, you know, is actually just about equally as fast as it was the first time around, but maybe like 100 milliseconds faster. But this one up here really does well, right?
Rather than take three minutes with no spool or one minute with a spool, this takes 1.6 seconds just on its own, which is a pretty good improvement. The reason why this gets tricky: an alternative here might have been, rather than populate this temp table with a million-whatever user IDs, to say SELECT TOP (1000000) DISTINCT OwnerUserId INTO the temp table.
And that would give us about the same thing. But one thing that gets tricky with that setup is if you’re selecting more than one column, right? Obviously, you could end up with a bunch of duplicative owner user IDs.
Because even if you say SELECT DISTINCT, you could have unique combinations of other columns that still result in duplicative user IDs. Now you could, if you really felt like experimenting with stuff, do the SELECT DISTINCT from the table to get this part done, and then join back to the temp table to get other columns out in whatever arrangement you need them in.
That’s one, that’s one potential solution. But, you know, if you’re the big, if you’re dealing with a lot of big tables that can, you know, might not be the greatest setup.
But if it gets rid of a spool that’s really slowing your plan down, then it’s probably worth it. Now, another way of getting rid of spools is to write the query in a way where different join algorithms are possible, right?
So rather than writing those as a cross apply, if we just write those as a derived join like this, and rather than do our join, or rather than do our correlation inside the apply, we just do our join outside the apply, we can get far different execution plans.
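Here’s a hedged sketch of what I mean by moving the correlation out of the apply and into a regular join, so other join algorithms (hash, merge) and batch mode become possible. The `#t` temp table and the SUM are stand-ins for the demo’s actual columns:

```sql
/* Correlated version (cross apply): only nested loops is possible.
   Decorrelated version: aggregate first, join after, and the
   optimizer is free to pick hash or merge join. */
SELECT
    t.OwnerUserId,
    a.TotalScore
FROM #t AS t
JOIN
(
    SELECT
        p.OwnerUserId,
        TotalScore = SUM(p.Score)
    FROM dbo.Posts AS p
    GROUP BY p.OwnerUserId
) AS a
  ON a.OwnerUserId = t.OwnerUserId;
```

The derived table computes each group exactly once, so there’s no repeated inner-side work for a spool to cache in the first place.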
You know, these, I mean, these are a little bit trickier to see how long they ran for, because this says zero seconds, but like almost all of this plan is in batch mode. And so like all these operators are just kind of keeping track of their own time.
They’re not keeping track of like the full plan time, like you’ll see in row mode plans. But we do have a row mode operator down here. And we do have, well, it doesn’t matter if this is row or batch mode, because it kept track of the three seconds of time it spent scanning the index.
And there’s about another like not even 100 milliseconds or so of time in the stream aggregate. So that’s another potential way to get rid of spools in a way that makes the query faster. I want to be very clear about the fact that, like coming back to these two plans here, just getting rid of the spool isn’t enough.
You need to get rid of the spool in a way where it speeds the query up, right? Just getting rid of the spool isn’t always enough for that, right? All right, cool.
So we covered that. Anyway, scroll down to the bottom here. Because I have all these queries pasted in here so I can bring the demo around and show people spoolie stuff. So scroll past a few things.
So if your spool’s rebinds are way higher than the rewinds, or if you’re just generally suspicious of a spool in your plan, a couple of ways you can test the query to see if it’s better off without the spool are to use either OPTION (NO_PERFORMANCE_SPOOL) or OPTION (QUERYTRACEON 8690).
What sometimes works is what I was showing you with the queries that do the select distinct list from the temp table. Again, that does get tricky if there’s multiple columns in there that you need to return because now you’re looking at two trips to the temp table.
And, you know, that can also have its own downsides, let’s say. Just a quick example of that: if we select the distinct top 50 from here, but we’re getting owner user ID, post type ID, and score, we grab this data and, well, the query plan doesn’t matter, but if we look at the results, we have a bunch of number twos in here.
That didn’t go well. We’ve got a bunch of number twos in here, and then you can see where that switches to threes right below it. And then we even get a whole bunch of fours in here. So that just might not be enough.
It might be a good start to reduce the result set by some amount, but it’s probably not going to be enough to get rid of the spool, because we still end up with big, long lists of duplicate values. Anyway, that’s about enough about spools today.
If you enjoyed this video, I would appreciate you to the back teeth if you liked the video, gave it the old thumbs up, the old internet yeah. If you want to get notified when I post videos like this and try to animatedly and excitedly explain SQL Server performance issues to you, give my channel the old internet bing.
The old subscribe. And then you’ll get those notifications, and then you and me can be best friends forever. Yeah, sound good. Anyway, hope you enjoyed yourselves.
I hope you learned something. Thank you for watching. And I will see you in another video another time. I have two more windows open here, so who knows when I might just have to record something.
Anyway, cool. Thank you for watching.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.
Whether you want to be the next great query tuning wizard, or you just need to learn how to start solving tough business problems at work, you need a solid understanding of not only what makes things fast, but also what makes them slow.
I work with consulting clients worldwide fixing complex SQL Server performance problems. I want to teach you how to do the same thing using the same troubleshooting tools and techniques I do.
I’m going to crack open my bag of tricks and show you exactly how I find which queries to tune, indexes to add, and changes to make. In this day long session, you’re going to learn about hardware, query rewrites that work, effective index design patterns, and more.
Before you get to the cutting edge, you need to have a good foundation. I’m going to teach you how to find and fix performance problems with confidence.
Why Multiple Seeks Are Slower For Some SQL Server Queries
Video Summary
In this video, I delve into the fascinating world of SQL Server query optimization by revisiting an intriguing blog post from Paul White that has stood the test of time for over a decade. I share my unique take on his example, illustrating how subtle changes in query syntax and indexing can dramatically affect performance. Through detailed analysis and practical demonstrations, I explore why certain index seeks might not be as efficient as expected, leading to surprising execution times. This video is packed with insights that will challenge your understanding of SQL Server’s optimizer and encourage you to rethink how you phrase your WHERE clauses for better query performance.
Full Transcript
Erik Darling here with Darling Data. Still sound like I’m dragging myself out of a cave a little bit, but that’s alright. I’m never going to have to hear my own voice anyway. One of the joys of being on this side of the camera, the microphone: you never have to listen to me. Plug my ears. Today we are going to talk about something exciting, titillating. And it relates back to this blog post by Mr. Paul White, who very, very kindly uses the pastel crayon (still not sure what to call it; it’s not exactly a drawing, it’s not exactly a painting, I guess it’s a crayoning) that I did of him as all his social media avatars. So that’s quite nice. I never, never, never thought that I would be a famous artist. But, I mean, kind of funny that this came up when it did, because this post is turning, what, 13 this week? 13 years old. Paul’s been smarter than all of us for 13 years. Can you imagine how heavy a burden that is? Being smarter than everyone?
Like, like, 15 years? Same. But the main thing that we’re talking about here is: when is a seek not a seek? And, well, I would love to just read the article to you, but I don’t think that that would be a very nice thing to do. The words are Paul’s copyrighted material. So we’re going to try to avoid any litigiousness in this romantic week that we’re all suffering through. All right. I’m going to get right down to the end here, because the end is where the answer is. And the answer is, when is a seek not a seek? That is, when it is 63 seeks.
Well, that is correct. And what we’re going to do is, well, Paul’s post has an example in it where, if you run these queries a lot, the loop that does this thing down here is a lot faster than the loop that does this big IN clause here. I’m going to show you a little bit more of a profound single query example. No loops required here. And just so you know what the setup here is: I have one index on the Posts table on post type ID that includes score.
And this is really just to get some faster stuff going on the outer side of the cross apply that I’m doing. And then on the votes table, I have two indexes. I have one that’s just on post ID. And then I have one that is on post ID. Well, I mean, just keyed on post ID.
But it includes user ID, bounty amount, vote type ID, and creation date. And then we have another index down here that is keyed across post ID, vote type ID, and creation date. And descending, that’s why that’s called pvcd there, because its creation date is descending.
That’s probably not the best notation for telling someone that a column is in descending order. But we’re just going to deal with it, because that’s what I felt like doing. But this one also includes user ID and bounty amount.
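Roughly, the indexes described above would look something like this. The names and exact column lists are my reconstruction from the description, not copied from the demo script:

```sql
/* Posts: keyed on PostTypeId, includes Score. */
CREATE INDEX p_pt_s
ON dbo.Posts (PostTypeId)
INCLUDE (Score);

/* Votes: keyed on PostId only, with the other columns as includes. */
CREATE INDEX v_p
ON dbo.Votes (PostId)
INCLUDE (UserId, BountyAmount, VoteTypeId, CreationDate);

/* Votes: keyed on PostId, VoteTypeId, CreationDate descending
   (hence "pvcd"), also including UserId and BountyAmount. */
CREATE INDEX v_pvcd
ON dbo.Votes (PostId, VoteTypeId, CreationDate DESC)
INCLUDE (UserId, BountyAmount);
```

The difference that matters for the demo is which columns are in the key: the first Votes index can locate rows by PostId but has no useful ordering on VoteTypeId or CreationDate, while the second one does.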
I forget why. I think the query changed a little bit, but I didn’t change the indexes much. So just deal with it for a minute. But what I want to do is run these two queries first.
And while these two queries run, I’m going to talk a little bit about what’s going on with them, because that seems like a nice thing to do for the folks at home who may not just be able to look at a query and figure out exactly what it’s supposed to do.
So I’m using cross apply. Again, the reason that I use cross apply for a lot of things is because a lot of interesting things happen on the inner side of nested loops joins.
And cross apply is a great way to get the optimizer to say: yep, nested loops join, that’s it for me. Just a little bit of a demo-writing shortcut there for all the folks at home. And so what we’re doing is we’re selecting some stuff from Posts, and then we’re cross applying out to the Votes table.
The votes table, I mean, they scrub all the user ID stuff out of there. So, like, I mean, there are probably other interesting queries that could be written, except, like, the user ID column in the votes table is largely null, except the thing for, like, bounty stuff.
So unless you want, like, a null join demo, it’s just not a very interesting user experience. That’s why I’m going from Posts out to Votes, because that makes sense.
Every, well, not every post, but many posts have many votes. A good, strong correlation in there. And what we’re looking for in the votes table for this first query, I mean, they’re both the same query.
They really just have different index hints in there, because that’s what I want to show you primarily. And so what we’re doing is we are looking to rank posts by which vote type ID was newest.
I mean, that’s kind of dumb, but we’re going to go with it. And then we are filtering down to where vote type ID is in 1, 2, 3, and where v.CreationDate is greater than 2008-01-01.
These are very unselective predicates, right? Like, vote type ID 1 is accepted answers. Like, when the person who asks a question recognizes an answer as being the answer, and they put a little green checkmark on it.
Vote type 2 is upvotes, and vote type 3 is downvotes. Most of the votes in Stack Overflow are either, yes, that’s the answer, yes, that’s a good answer, yes, that’s a good question, no, that’s a bad answer, no, that’s a bad question.
So vote type 1, 2, and 3 really just are the majority of the table. And we’re using the row number function partitioned by vote type ID, which is not in the key of our index, so we’re going to have to sort to deal with that, and then ordered by creationDate, which is also not in the key of our index.
So we would need an index if we didn’t want to sort that, but, you know, we’re using the index up here that’s just on post ID. None of those other columns are in the key of the index; it only includes them.
So we’re going to have to sort there. And now in the second query, we are doing the exact same thing, except we’re using the index with all of those key columns in there, right? And I think what’s very interesting about these two execution plans is that they end up taking just about the same amount of time.
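For reference, the query shape being described is roughly this. This is my reconstruction, with hypothetical index names; the two demo queries differ only in which Votes index they hint:

```sql
SELECT
    p.Id,
    v.VoteTypeId,
    v.n
FROM dbo.Posts AS p
CROSS APPLY
(
    SELECT
        v.VoteTypeId,
        n = ROW_NUMBER() OVER
            (
                PARTITION BY v.VoteTypeId
                ORDER BY v.CreationDate DESC
            )
    /* First query hints the PostId-only index (v_p here);
       the second hints the wider keyed index (v_pvcd). */
    FROM dbo.Votes AS v WITH (INDEX (v_p))
    WHERE v.PostId = p.Id
    AND   v.VoteTypeId IN (1, 2, 3)
    AND   v.CreationDate >= '20080101'
) AS v;
```

With the PostId-only index, the ROW_NUMBER needs a Sort operator; with the wider index, the key order supports the windowing function, so there’s no sort at all.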
This one takes about 15 seconds. This one takes about 16 seconds. That’s interesting, right?
Because we would think that with a great index that allows us to not only seek to all the data that we care about in our index and not have to sort the data that we need sorted for our windowing function, that we would be in great shape, but we’re not.
Not in great shape. Now let’s dig a little bit more into these plans, right? Let’s look more closely at these things. So the index seek over here takes about eight and a half seconds. And then the sort here adds, oh, let’s see, what’s eight and a half minus, well, 13 and a half minus eight and a half.
We can forget the halves. Then that’s 9, 10, 11, 12, 13. So we spend five seconds in the sort there, right? So 8.6 seconds seeking into the votes table, additional five seconds in the sort, mostly because the sort spilled, right?
Spilling sort, oh, boy, watch out. Tempdb, ha! But it’s there. Don’t be afraid of it. And then what’s super interesting, at least to me anyway, is when we look at the second query plan that took about 16 and a half seconds: we don’t have to sort in here anymore, because the index fully supports the sort order that we needed for our windowing function.
So we don’t sort and spill, but we just spend a lot of time in this index seek, right? If we compare the two index seeks, sort of, I mean, not exactly side by side, but at least visible on the same screen together, there’s one index seek, there’s two index seeks, we lose just about all the efficiency of not having to sort here, seeking into this index.
Question is why? The answer is, when is a seek not a seek? Well, in this case, when it is three seeks. So let’s bring this way over here so it’s visible on the screen and my big body is not blocking things.
So if we look at this, we can see that under that one seek predicate, we have three seeks.
One, two, three, right there, right? All three of those are seek operations. Three individual seek operations. Right?
Crazy. And if we reframe this a little bit better, you can even see that we’re doing three separate seeks for vote type ID one, two, and three. Crazy, right?
Insane. How is that possible? Well, let’s go a little bit further and let’s run a copy of this query that looks a little bit more, like, syntactically, like what that multi-seek is doing, right?
So let’s execute this. See what happens. What craziness will ensue here?
What baffling things will happen next? We don’t know. We just don’t know. It’s going to take about 15 seconds. Not to spoil it for anyone at home.
All right. There we go. 15 seconds are up. And here is a more sort of accurate visual representation of what that multi-seek looks like.
It’s actually three separate seeks like this. Right? So the way that this query is written is with three separate select queries, union all together, because we’re not going to get any, like, duplicates in here, because we’re looking for vote type ID one, vote type ID two, and vote type ID three.
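Spelled out in SQL, the “three seeks” version is roughly this. The select list is a simplified stand-in; the shape of the predicates is the point:

```sql
/* Each branch seeks on exactly one VoteTypeId value, which is what
   the single "multi-seek" operator is secretly doing three times. */
SELECT v.PostId, v.CreationDate
FROM dbo.Votes AS v
WHERE v.VoteTypeId = 1
AND   v.CreationDate >= '20080101'

UNION ALL

SELECT v.PostId, v.CreationDate
FROM dbo.Votes AS v
WHERE v.VoteTypeId = 2
AND   v.CreationDate >= '20080101'

UNION ALL

SELECT v.PostId, v.CreationDate
FROM dbo.Votes AS v
WHERE v.VoteTypeId = 3
AND   v.CreationDate >= '20080101';
```

UNION ALL is safe here because a row can only have one vote type, so the three branches can never produce duplicates of each other.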
So we’re getting three separate results. But this is a more accurate visual representation of what a multi-seek query is doing, because we have the three separate seek operators. And we can see, you know, these ones take about, oh, that’s four and a half seconds.
That’s four seconds. This one’s about five seconds. So, you know, we end up, we can more accurately see how the three seeks don’t exactly work out from, like, a timing perspective, because, again, very non-selective predicates, you know, even for, like, you know, each separate post ID, vote type ID one, two, three, and creation date greater than 2008-01-01.
That’s basically, like, the dawn of time for stack overflow. So we don’t, like, that’s, like, everything in the table anyway, right? So, like, we’re getting everything. So these are three big seeks, and they each take between four and five seconds apiece.
So that’s where the 15 seconds goes in the seek and the other plan. Now, coming back to a video that I recorded the other day, I want to show you something magnificent and interesting.
I recorded a video about how a SQL Server is not always very smart about integers. And guess what? We can play a very similar trick on the optimizer with this query, as we did in the other query.
Ready? You ready for this? So we’re going to say, hey, SQL Server, I don’t want the vote type IDs in one, two, and three. I want vote type IDs greater than zero, and I want vote type IDs less than four.
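The predicate rewrite looks like this as a sketch (same simplified select list as before):

```sql
SELECT v.PostId, v.VoteTypeId, v.CreationDate
FROM dbo.Votes AS v
WHERE v.VoteTypeId > 0        /* replaces VoteTypeId IN (1, 2, 3)      */
AND   v.VoteTypeId < 4        /* one contiguous range: a single seek   */
AND   v.CreationDate >= '20080101';
```

Because VoteTypeId is an integer, `> 0 AND < 4` covers exactly the values 1, 2, and 3, but it hands the optimizer one range instead of three equality points, so it gets one seek instead of three.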
And guess what? When we do this, well, you’re going to notice that this finishes a little bit faster than 15 seconds.
This actually finishes in about eight seconds, which is about how fast the original query would have been if we didn’t have the sort operator in there.
We no longer spend like 15 seconds in this index seek. We have a very fast seek here, don’t we? That’s much better. Then we no longer have the multi-seek issue that we had in the other query plan where we said in.
So when we are tuning queries, you want to pay very special close attention to how we are phrasing our where clauses. We may want to try phrasing our where clauses involving integers in several different ways.
Don’t think that IN, or BETWEEN, or less-than/greater-than-or-equal-to, or anything like that, is necessarily the best way to phrase your question.
Always try different things because you may find that you get much faster, better query results when you play along with the optimizer’s stupid games. Right?
Anyway, I got a phone call I got to get on. So I’m going to stop this recording. I’m going to say thank you for watching. If you enjoy this sort of hard-hitting, edgy, in-depth SQL Server content, feel free to give this video the old thumbs up.
If you want to get notified when I post these things, you can always subscribe to the channel. And I hope you enjoyed yourselves. I hope you learned something.
And yeah, I don’t know. That’s it. Thank you for watching. Have a good day. I mean, I’m going to have to record something else today because I’ve got a lot going on. But anyway.
Goodbye.
The sort of wonderful thing about SQL is that it has many directives which are fairly easy to decipher and use appropriately.
The sort of awful thing about SQL is that it has just as many rules that are somewhat selectively applied as the English language itself.
I have my gripes and grievances with some of the choices, of course, and so will you as you delve further into the language. A petty example is that I sort of wish that SQL used GET instead of SELECT for retrieving data.
Very few people go to the store to select milk, eggs, steak, butter, salt, pepper, and scotch. Most of us just go get it. But enough about breakfast.
Let’s talk about two of the most overlooked and undervalued facilities in the SQL language: EXISTS and NOT EXISTS. Perhaps they’d get more traction if they were called THERE and NOT THERE, but it would be perilously easy to start confusing your WHERE and THERE clauses.
Often besmirched by someone who read a blog post about MySQL in 1998 as “subqueries, which are slower than joins”, they’re two of the most useful things you can grasp and use regularly.
Though they are a bit like subqueries, the columns that you select in an EXISTS or NOT EXISTS subquery can’t be used in the outer query. You can put whatever you want in the select list, from * to 1 to COUNT to 1/0 to the entire contents of the King James Bible, and it will never end up making even the dimmest difference in the world. Likewise, adding DISTINCT, TOP, or any other row-limiting device will do absolutely nothing to change the query plan or performance.
Get over yourself.
Both EXISTS and NOT EXISTS already set a row goal of 1, because all either one has to do is determine if a single row is there or not, just with the logic reversed for each.
Your First Mistakes
Let’s say someone asks you to gin up a list of Users who have Posted anything at all, but whose Reputation still floats at the dreaded 1.
Your first instinct would likely be to write a query that looks like this.
SELECT
u.Id,
u.DisplayName
FROM dbo.Users AS u
JOIN dbo.Posts AS p
ON u.Id = p.OwnerUserId
WHERE u.Reputation = 1
ORDER BY
u.Id;
But you’d quickly find yourself confronted by many duplicate row values, because that’s what one-to-many joins produce. Duplicates.
Your next move, tongue hanging out, sweating profusely, knuckles creaking, nearly paralyzed by the uncertainty of your continued human functioning, would be to do something like this:
SELECT DISTINCT
u.Id,
u.DisplayName
FROM dbo.Users AS u
JOIN dbo.Posts AS p
ON u.Id = p.OwnerUserId
WHERE u.Reputation = 1
ORDER BY
u.Id;
And, sure, with a small enough data set, this is an okay choice. You passed the pop quiz, hot shot. Your instinct to use DISTINCT was not wholly incorrect, but there’s a better way.
But as you start dealing with larger and more imposing sets of data, DISTINCT will no longer cut it.
What EXISTS Does Different
While EXISTS will still use a join to match rows between tables, the semantics are quite a bit different. It can move on once it has determined that a row is either there or not there.
You don’t need to add DISTINCT, grouping, or anything else to get the results you wanted in the first place.
SELECT
u.Id,
u.DisplayName
FROM dbo.Users AS u
WHERE u.Reputation = 1
AND EXISTS
(
SELECT
1/0
FROM dbo.Posts AS p
WHERE p.OwnerUserId = u.Id
)
ORDER BY
u.Id;
Once EXISTS locates a match, it moves on to the next value from the outer side (in this case the Users table), and attempts to find a match. If no match is found, the row is discarded, which is common to inner joins.
Where a lot of developers get hung up at first is in assuming that EXISTS and NOT EXISTS work like IN or NOT IN, and they omit the inner WHERE clause that tells the database which rows should match.
I’ve seen a lot of EXISTS queries written, quite incorrectly, like this:
SELECT
u.Id,
u.DisplayName
FROM dbo.Users AS u
WHERE u.Reputation = 1
AND EXISTS
(
SELECT
p.OwnerUserId
FROM dbo.Posts AS p
)
ORDER BY
u.Id;
Which will, of course, return absolutely everything. Don’t do this.
The column you select inside of the EXISTS subquery does not infer any sort of matching logic.
Like I said before, it’s essentially discarded by the optimizer.
Your Second Mistakes
No half-assed SQL tutorial is complete without showing you the wrong way to find non-matching rows between two tables.
It will undoubtedly look something like this:
SELECT
records =
COUNT_BIG(u.Id)
FROM dbo.Users AS u
LEFT JOIN dbo.Posts AS p
ON u.Id = p.OwnerUserId
WHERE p.Id IS NULL;
It’s not that this pattern is never better, it’s just that it shouldn’t be your go-to for each and every query with this goal in mind.
You take two tables, you join them together, and you add a predicate to your where clause to find rows where an ordinarily not-NULL column returns NULLs.
The problem is that SQL Server’s query optimizer doesn’t contain any logic to turn this into the type of query plan that you’d get using NOT EXISTS instead.
You end up needing to fully join any tables involved together, and then later on use a filter to remove rows where no match was found. This can be incredibly inefficient, especially on large data sets.
One may even be dealing with “big data” when the follies of this paradigm become quite clear.
A generally better approach to writing this type of query is to tell the database you’re using exactly what you’re after and exactly what you expect:
SELECT
records =
COUNT_BIG(u.Id)
FROM dbo.Users AS u
WHERE NOT EXISTS
(
SELECT
1/0
FROM dbo.Posts AS p
WHERE p.OwnerUserId = u.Id
);
Your developer-life will be a whole lot less confusing and tiresome if you arm yourself with options and alternatives, which means you’ll have lots of mental energy left over to, like, learn 17 new frameworks and really impress your friends.
Think of the frameworks.
Gear Up
You should make good use of the EXISTS and NOT EXISTS patterns in your queries when you don’t require any rows from another table, and you only need to validate if something is there or not.
In cases where you need to get information from another table, joins are likely the most direct path to getting back the data you need.
But this all brings up an interesting question: what if you want to get back information in the select list without adding in join clauses, worrying about inner, outer, full, or cross, and wondering silently if one day things might go pear-shaped?
We’ll talk about that in the next post, when we go over correlated subqueries.
Thanks for reading!
Sort of odd, I have no idea how this idea ended up in the category that it’s in. I was viewing suggestions for SQL Server/SSMS, but when I went to submit this one, it ended up in “Microsoft Entra”.
Maybe I missed something. Maybe someone from Microsoft will be nice enough to move it to the right place. Maybe not. Right now, it lives where it lives.
I use SSMS, because my primary job is SQL Server analysis, performance tuning, and general server fixin’. It’s far and away the most competent tool for the job (sort of like me).
Also sort of like me, there’s not a lot of stiff competition out there 😘
One problem I run into regularly is when I’m tuning a query, and I want to keep one of the resulting execution plans available, so I can:
Compare it after I make some other changes and run the query again
Do some additional analysis without worrying about accidentally losing the plan
The only way to do that is to save the plan, stick the XML in another tool, or keep opening new query tabs to run things in, where I won’t lose the plan.
I think it would be a reasonable and helpful extension of the current set of right-click menu options to be able to open a query plan in a new tab.