Five Reasons Why Your SQL Server Is Slow Right Now


Thanks for watching!

Video Summary

In this video, I delve into the reasons why your SQL Server might be experiencing performance issues right now. I start by setting up a clear environment using the remote DAC (Dedicated Administrative Connection) to ensure that you can run analysis scripts without interference from ongoing problems. I emphasize the importance of having useful scripts like `sp_whoisactive` and `sp_PressureDetector` at your disposal, as they provide critical insights into what’s happening on your server. The video then explores common issues such as CPU and memory pressure, long-running maintenance tasks, blocking, and parameter sniffing, offering practical advice on how to identify and address these problems before a restart clears out all the evidence. By sharing this knowledge, I aim to help you gather valuable information that can prevent future issues and improve your SQL Server’s performance.

Full Transcript

Erik Darling here with Darling Data. And in today’s video, we’re going to talk about why your SQL Server is slow right now. I’m a little blurry. Hang on a second. I gotta fix myself up a little bit. There we go. Now we’re nice and crisp. Recently voted by BeerGut Magazine as the most crisp SQL Server consultancy in the world. So it’s always nice to get an accolade from BeerGut Magazine. It’s a nice change of pace from when they’re threatening to sue or assassinate me. So, you know, ups and downs in relationships, restraining orders, stuff like that. So the things you’re going to need when your SQL Server is slow are typically the remote DAC. That is the dedicated administrative connection. And you’re going to want that because you need a way to connect to the SQL Server and run your analysis queries in a way that is not impacted by whatever awful explosion of poop stuff is going on all around you. Right? You want to be able to run your analysis scripts unfettered by the catastrophe unraveling on your SQL Server. You also need some useful analysis scripts like sp_WhoIsActive and sp_PressureDetector. Those are very good things to use in order to figure out exactly what your SQL Server is so busy doing.
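If the remote DAC isn’t already enabled, here’s a minimal sketch of setting it up ahead of time. The `remote admin connections` option is what allows DAC connections from another machine, and the `admin:` prefix is how you ask SSMS for a DAC session.

```sql
/* Enable the remote DAC before you need it;
   by default, only local DAC connections are allowed. */
EXEC sys.sp_configure 'remote admin connections', 1;
RECONFIGURE;

/* When things go sideways, connect a single query window
   (not Object Explorer) to: admin:YourServerName
   where YourServerName is a placeholder for your instance. */
```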

So, you know, the SQL Server is unhappy at the moment. Right? So, connect to the DAC. Use the scripts. Third-party monitoring tools are cool, but under sufficient pressure, they can’t collect data either. You don’t want your third-party monitoring tools connecting via the remote DAC to get stuff, because then it won’t be available to you if you need to do anything. It’s tempting, right? Because you’re like, oh, if I just have them connect to the DAC, they’ll never miss out on anything, it’ll be great. Unfortunately, you might want that connection for yourself. So please don’t do that. But, yeah, if SQL Server really hits the skids at some point, you just might see big gaps in your monitoring tool dashboard where it wasn’t able to collect data either. So, at best, it can confirm when things hit the fan, but not which things hit the fan. And the reason why I’m doing this video is to sort of put a few different things all together in one place, because I still run into this a lot, and there’s nothing more frustrating as an outside SQL Server observer than when someone says, hey, we had a problem three days ago and restarted SQL Server. Can you tell us what happened?

Because the answer is no. Once you restart SQL Server, you clear out everything useful that I could have looked at. Granted, I get it. Sometimes you’ve got to restart SQL Server to resume, you know, those business critical tasks like making money. All right. If you don’t make money, what do you do? All right. And I’m looking at you, CrowdStrike. Stink. Stink bombs you are. So, you know, restarting SQL Server, well, it might get things back on track. It removes, you know, just about anyone’s ability to, like, tell you what exactly happened at that time. It clears out all the useful stuff, you know, and it might also prevent some useful things from being logged.

Now, there may be some evidence of what happened in other places, like the error log. Query store maybe sometimes catches things, assuming it’s query related and not something different. But even then, getting meaningful evidence out of query store after a tragic event is kind of, I mean, it’s a stretch, right? So, what I want you to be able to do is log in to SQL Server, use the remote DAC, run useful analysis scripts, maybe even take some screenshots, maybe even save some query plans, maybe, you know, get some stuff out of SQL Server before you reboot it and get business back on track.

You know, usually if you take an extra 30 seconds or a minute to get some additional information, it’s okay, right? Like, people might sweat it out a little bit more, but, you know, if you say, you know, this will help us avoid it in the future, you know, most people are going to be okay with you doing some, like, you know, forensic digging before the reboot. We’re going to leave aside some things that should be happening, you know, in a more controlled way.

So, things like hardware changes, index changes, and settings changes, those aren’t things that you should be doing while people are trying to make money. Those are things that you should be doing in, you know, low impact times, not business critical times, right? You should avoid those things then because you want to do those things when, like, you know, you have time to, like, make the change, look at SQL Server for a bit, kind of look around, get your palms a little sweaty, and figure out if that’s a change that is okay to stay or you need to roll back.

We’re also going to leave, like, the unexpected failover off this because, you know, the unexpected failover is sort of… Surprise failovers are almost like auto stats updates and they’re, like, sudden, like, kaboom! Why is everything awful? Ah, that’s why.

And let me tell you something. I am never going to troubleshoot a surprise failover. That’s not my deal. I know people who will do it, so if you need that, I can point you in the right direction. But if you want to know why your AG failed over, this is not the face you want on camera.

Surprise stats updates are tough, too, because there are so many of them that could happen. And there are so many different things that they could affect that it’s like, you know, well, was it one stats update that bonked everything? Or did, like, 15 different stats updates cause 15 different problems?

It’s very, very difficult to track that stuff down. And it’s well-nigh impossible to do it if you’ve restarted SQL Server. Now, for settings changes, if something happens out of band, right, you know, I’m not saying who did it.

Neither will sp_LogHunter. But sp_LogHunter will look through SQL Server’s error log, and it will track down what settings changed when, but not necessarily who changed them, right? You might be able to figure out who did it, because someone might suddenly, like, set all their Slack away messages and be like, Nope, out for a jog.

Couldn’t possibly have been me changing MAXDOP and cost threshold for parallelism in the middle of the day. Wouldn’t be me. Nope. This video is more about figuring out kind of what’s currently happening and the best ways to focus in on different aspects of what might be happening. We’re going to cover CPU pressure, memory pressure, out-of-band or long-running maintenance, blocking, and parameter sniffing.

So we’re going to cover probably the five most common things I see when I’m troubleshooting SQL Server issues. There have, of course, been some really interesting problems that I’ve seen over the years. They’re just not common enough to put into a video, because if they’re outlandish, wild, once-to-half-a-dozen-times-in-a-consultant’s-lifetime things, they’re probably not stuff that you’re going to run into often enough to get any value out of.

So let’s get started before I mess something up. And the first thing that we’re going to do is focus in on CPU pressure, right? So this is what this server looks like when everything is okay.

If we sort of focus in on this section here, we have 704 threads. There’s some background stuff running that’s used 82. So we have 622 available, nothing reserved, no threads waiting for CPU, no requests waiting for threads, and we have no current THREADPOOL waits.

All good signs. Another good sign is there are no queries running down here. This section is completely empty. Grand for us.
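As a quick aside, those thread numbers come straight out of the DMVs, so if you ever need to eyeball them without any stored procedures installed, a rough sketch like this works. Runnable tasks and work queues stacking up on visible online schedulers are the bad sign.

```sql
/* A bare DMV look at worker thread pressure. Non-zero
   queued_tasks means tasks are waiting for a worker thread. */
SELECT
    total_schedulers = COUNT_BIG(*),
    runnable_tasks = SUM(s.runnable_tasks_count),
    queued_tasks = SUM(s.work_queue_count)
FROM sys.dm_os_schedulers AS s
WHERE s.status = 'VISIBLE ONLINE';
```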

So let’s stick this stored procedure into… Oh, I already did that, and I’ve already got a thousand threads lined up for this thing. And I’m just going to kick this off, and I’m going to run sp_PressureDetector a few times, just until we start seeing THREADPOOL waits show up in here, because that’s when we know things have gotten rather interesting.

And they should start piling up. There we go. All right. Now we’re cooking with gasolines. All right.

So I’m going to kill that just because we don’t need to make things any worse. So here’s what we need to… Here’s what you want to look for. Basically, remember when this was a positive number and not a negative number, and these were all zeros?

Those were much better times, weren’t they? Everything on our server was nice and hunky-dory, and we were having picanics together, and we were eating crepes.

We were Lady and the Tramping Crepes-Suzettes, and it was lovely. It was truly a magical time in our lives. That time has passed. And what happened was we had a bunch of queries all come in and beat the crap out of our server, and this was not a very good time.

For us, no. This is times of war, famine, and pestilence, plague. But enough about 2020.

So we had all these queries running, and we exhausted our available pool of worker threads. We went negative on the worker threads, and SQL Server was lining up queries to execute. Now, one thing that SQL Server will do when it runs low on worker threads is start downgrading queries to DOP 1.

Now, you’ll notice that there is one query in here still running at DOP 16 that has 80 parallel workers assigned to it. Wowee!

80? This must be some query. This thing has been, you know, this is just one query that’s running in parallel. The rest of these have all been downgraded to DOP1. Or actually, I mean, there’s a few other 16s in there that are sucking up resources.

So SQL Server does sort of have a protective mechanism in here. The funny thing is that you will still see a parallel plan for all the DOP 1 queries. It’s just a parallel query running on a single thread.

So have fun with that. So this is some of the stuff that you’ll see when things get bad, along with all these null session IDs just waiting on thread pool, right?

So this is the kind of stuff you’ll see when you’re under CPU pressure. Now, let’s look at what happens when SQL Server comes under memory pressure, because that’s an important thing to understand as well.

By default, SQL Server is willing to give any one query 25% of your server’s max server memory setting. And it’s willing to give any group of queries up to 75% of your server’s max server memory setting to run and do memory granty things.
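Those percentages live in the Resource Governor DMVs, even when Resource Governor isn’t formally enabled, because everything runs in the default workload group. A hedged way to peek at them:

```sql
/* request_max_memory_grant_percent of 25 means any one query
   can ask for 25% of max server memory for its grant. */
SELECT
    wg.name,
    wg.request_max_memory_grant_percent
FROM sys.dm_resource_governor_workload_groups AS wg;
```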

Memory grants most often come from sorting and hashing. So I’m going to run, well, we don’t need a thousand of these. That’s just mean. We’re just going to do 10 of these.

And if we focus in on memory here, we’re going to see exactly kind of what happens when SQL Server runs out of available memory to give to queries. So here is where you can see what your memory grant settings are for SQL Server.

These come from resource governor. Whether resource governor is enabled or not, this is where the settings sort of come from. If you’re on enterprise edition, resource governor can be a great way to turn that 25 number down to a much lower number so that you don’t have to deal with queries asking for insane memory grants and not using them.

SQL Server is very aggressive when it comes to memory grants. But what you’ll see under memory pressure is probably something like this MEMORYCLERK_SQLRESERVATIONS memory clerk number getting real big, right?

That’s 50 gigs of memory grants. And you’ll see some numbers down here start to change too. So I’m going to rerun this after I talk about this stuff so you can see that these numbers do tank down when you’re under memory pressure.

So you’ll see we have three queries that have been granted memory, seven that are waiting for memory. We’ll see the 50.21 gigs that we saw before, right, of granted memory going out the door. And these numbers in here will change, right?

So these numbers in here will get lower, especially the available memory column. And if I give sp_PressureDetector a run with nothing going on, you’ll see that these total and available memory columns will float right back to 60 gigs, right?
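For the DIY crowd, a similar granted-versus-waiting picture is available from a couple of DMVs. This is a sketch, not a replacement for the full script:

```sql
/* Who has a grant, who's waiting in line,
   and how much memory is out the door. */
SELECT
    waiting_queries = SUM(CASE WHEN mg.grant_time IS NULL THEN 1 ELSE 0 END),
    granted_queries = SUM(CASE WHEN mg.grant_time IS NOT NULL THEN 1 ELSE 0 END),
    granted_memory_gb = SUM(mg.granted_memory_kb) / 1024. / 1024.
FROM sys.dm_exec_query_memory_grants AS mg;

/* Semaphore totals, similar to the total and available
   memory columns shown in the demo. */
SELECT
    rs.total_memory_kb,
    rs.available_memory_kb,
    rs.grantee_count,
    rs.waiter_count
FROM sys.dm_exec_query_resource_semaphores AS rs;
```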

So one thing that I want to say real quickly, though, is if you don’t know what’s happening on the server, you don’t have to worry about running sp_PressureDetector with any parameters. If you run it with nothing, it’ll give you all the results, and you’ll be able to pick out from there what’s going wonky, right?

So just a bare naked run of sp_PressureDetector will give you wait stats, disk stats, what do you call them, perfmon counters, tempdb info, memory info, like all the stuff that we’ve seen with the CPU and memory.

Like, it just returns everything. So you get all that stuff back at once, and you can pick and choose from what’s happening. So under CPU pressure, you’ll hit a lot of THREADPOOL waits. Under memory pressure, you’ll most likely hit RESOURCE_SEMAPHORE waits, and maybe even RESOURCE_SEMAPHORE_QUERY_COMPILE waits, depending on how that memory is getting booted out the door.
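In other words, the lazy-but-effective starting point is just:

```sql
/* No parameters: everything comes back, and you pick through it. */
EXEC dbo.sp_PressureDetector;
```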

So that’s a couple of signs of physical pressure on the server, physical hardware pressure on the server. Another thing that you might run into is runaway maintenance. So I’m going to focus on DBCC CHECKDB.

But you know, it’s kind of funny. I sure do wish that when I saw runaway maintenance, it was DBCC CHECKDB. Usually when I see runaway maintenance, it’s some buffoon rebuilding and reorging indexes at 5% and 30% because they read it somewhere in 2003 on a blog post that doesn’t even have working code formatting, or that has a bunch of broken picture links.

So as much as I wish that folks would listen to me and stop doing useless index maintenance, it still happens a lot and I still have to talk people out of it. And some people will cargo cult around this useless facility until eventually someone pulls them kicking and screaming away from SQL Server.

You just can’t help some people. It’s something that I’ve learned after about 10 years of consulting: you just can’t help some people.

They will just do whatever they want to do. They find their comfort zone and they stick to it. Nothing can get them out of it. So I ran DBCC CHECKDB before I started recording this so that we would have a log of how long it took, which is 24 seconds.

I promise, we’ll get back under my armpit there. 24 seconds for DBCC CHECKDB. Now, you know, you can think what you want about the 24 second number. Doesn’t make much difference to me.

Now, if I ran this query workload without DBCC CHECKDB also running, I want to say it took like 15, 20 seconds. But if I kick off DBCC CHECKDB here, and I start running this here and go boop, both of these will suffer.

Right? DBCC CHECKDB is going to take longer, and this query workload is going to take longer, because they’re happening at the same time.

Now, again, a very simple and easy way to see if this is what’s affecting your server: run sp_WhoIsActive, because you’ll see if DBCC CHECKDB or any other maintenance task is running. Things that I would absolutely check for: CHECKDB, index maintenance, and full and differential backups.
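If sp_WhoIsActive isn’t installed on the box in question, a bare-bones check against sys.dm_exec_requests will at least show maintenance-flavored commands in flight. The LIKE filters here are a loose net, not an exhaustive list:

```sql
/* CHECKDB and backups even report percent_complete. */
SELECT
    r.session_id,
    r.command,
    r.start_time,
    r.percent_complete
FROM sys.dm_exec_requests AS r
WHERE r.command LIKE 'DBCC%'
OR    r.command LIKE 'BACKUP%'
OR    r.command LIKE 'ALTER INDEX%';
```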

I would expect to see log backups happening pretty often during the day, because if it’s important business data, well, guess what? Log backups are a good idea. Huh?

Yeah, they’re great. Log backups. Who would have thunk it? Amazing things. You see those running during the day? Okay. It’s fine. CHECKDB, full backups, differential backups, and of course, any form of index maintenance could be a pretty tough sell there.

I would even say that unless you specifically schedule statistics maintenance to happen during the day, I would be also very surprised to see stats maintenance happening during the day.

So, just to qualify what I said about useless index maintenance, statistics maintenance is not useless. Do that. It is good for you. It is a part of a balanced breakfast. But now, coming back to what I was saying, with the query workload running and CHECKDB also running, well, CHECKDB took an extra 10 seconds down here, right?

Look at that. 35 seconds. Not anything outlandish, because, you know, this does have to sort of fit into a video that someone will watch.

And the query workload over here took 18 seconds. Okay. So, this didn’t change too badly, but DBCC CHECKDB took longer. That is about, I think, five seconds worse than it was without it.

But anyway, one thing that I would absolutely look for is runaway maintenance tasks, because, you know, DBCC CHECKDB is not a blocking problem. DBCC CHECKDB is a resource usage problem.

You know, index maintenance very much so could cause lots of blocking problems, right? Even with online equals on, you can still see blocking from that stuff. So, be very careful there.

Moving right along, another very common reason why SQL Server might be slow right now would be blocking. So, let’s come over here, and let’s begin a transaction, and let’s do a very small update just to take out enough of a lock to cause a problem.

And let’s copy this, and let’s stick this in here. And we don’t really need to, you know, give this a lot of threads in order to show you what’s happening.

But we do need to run sp_WhoIsActive in a very specific way. So, we are going to run sp_WhoIsActive, and we’re going to use @find_block_leaders = 1. My finger disappears here, apparently.

So that we get the blocked_session_count column in our output. We’re also going to use @get_plans, because getting execution plans for these things can often help us figure out why blocking was going on for so long.

Might be a very inefficient modification query. Worth looking at. You might also be able to figure out if someone was running some sort of modification query when they shouldn’t have been.

Maybe they did the old forgot-a-WHERE-clause trick. I don’t know. But anyway, this is still kicking. Excellent. So let’s run this.

And what you’ll see is very useful material. I’m just going to roll this back now so I don’t forget about doing it later. But if we look at sp_WhoIsActive, we’ll see a bunch of queries experiencing lock waits. And we’ll see over here, we’ll see these blocking session ID columns populated with spids.

They will often correlate with session IDs you see over here, which is a wonderful thing, because then you know who’s doing what. And the real magical part of the way that I ran sp_WhoIsActive is that the results, instead of getting ordered by the duration column, the dd hh:mm:ss whatever column all the way to the left, were ordered by blocked_session_count, so we knew who was at the very top of the blocking chain and who was suffering underneath them.
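For reference, the invocation from this demo looks roughly like this, with parameter names from sp_WhoIsActive’s documented surface area:

```sql
/* Lead blockers bubble to the top of the results. */
EXEC dbo.sp_WhoIsActive
    @find_block_leaders = 1,
    @get_plans = 1,
    @sort_order = '[blocked_session_count] DESC';
```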

But it is a little bit misleading to see this select query right here as a blocker for the other select queries. That’s just sort of the way it looks in the DMVs. And, you know, that’s not really anything that’s going to change your life.

You just need to make sure you understand it. The reason this query has nine blocked sessions behind it is that this query was blocking this query, right?

So this select query isn’t really at fault. It’s this update query. And it just kind of looks like this one was doing something that the rest of them needed to do. So blocking, very, very common reason why you might see a server having problems. Depending on what the lead blocker is, you know, you might be able to just kill the lead blocker and, you know, let everything else move on.

You do have to be careful about rollbacks, though, because if you don’t have accelerated database recovery enabled, rollbacks can be awfully long-running, single-threaded, log-reading tasks that you don’t want to deal with, because they can just make your blocking problems go on longer.

So just be very careful if you decide to start killing lead blockers, what ends up happening to them, right? If the lead blocker is, you know, not a query, if it’s some other weird thing, again, just be careful with what you’re doing there.

Another thing that can happen quite a bit, and that I see quite a lot in my consulting, is of course the old parameter sensitivity. And I’m going to create a couple indexes here.

And once these finish, I’m going to run this stored procedure a whole bunch of times in here. And for this one, I’m just going to do 100 threads. And even with 100 threads, this finishes incredibly fast, right?

Even if I do 1,000 threads, this will finish very fast, right? This stored procedure runs very quickly most of the time. Now, if I dial this back to, let’s just say, 10 threads so that we don’t overwhelm ourselves, and I run this stored procedure with a different parameter, all of a sudden things will look slow.

And we’re going to run sp_WhoIsActive with a different set of parameters than before. We’re going to use @get_avg_time = 1. And what this will do is it’ll look for prior executions of this stored procedure in the plan cache and say, well, how long do you usually run for?

Oh, that long. Well, that’s abnormal, isn’t it? And so what you’ll see is a couple columns in here. This one is how long things are actually running for.

And this one is how long things usually run for. And that’s where you need to be careful. So, anyway, let’s wrap this up. These are common reasons why SQL Server might be slow.
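For reference, that last sp_WhoIsActive call is about this simple:

```sql
/* Adds columns comparing current runtime to the cached average,
   which is what makes a sniffed plan stand out. */
EXEC dbo.sp_WhoIsActive
    @get_avg_time = 1;
```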

Thank you for watching. Hope you learned something. Hope you enjoyed yourselves. If you like this video, give it a thumbs up or a comment. If you like this video a lot, subscribe to my channel.

I love you. Thank you. Also, hire me to figure this stuff out for you because I’m pretty good at it. Bye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

A Little About Out Of Date Statistics In SQL Server


Thanks for watching!

Video Summary

In this video, I dive into an often-overlooked issue in SQL Server: outdated statistics and their impact on cardinality estimation. Specifically, I explore how the default and legacy cardinality estimators can lead to significant performance issues when statistics are not updated frequently enough or rely solely on auto-update stats. By using a modified version of the Stack Overflow database, I demonstrate how adding new data without updating statistics can result in severely inaccurate execution plans, especially with inequality predicates. The video highlights the differences between the legacy and default cardinality estimators, showing how the latter provides slightly better estimates but still struggles when faced with ascending key columns that aren’t marked as such by SQL Server. Through practical examples, I explain the math behind density vector estimation and provide insights into why frequent statistics updates are crucial for maintaining optimal query performance.

Full Transcript

Erik Darling here with Darling Data, voted by BeerGut Magazine to be the producer of the freshest, hottest SQL Server content on the internet. So if you’re into that sort of thing, you’re in the right place. Welcome aboard! In this video, we’re going to talk about a somewhat overlooked problem that you can run into with outdated statistics. Especially, well, I mean, I guess it could really apply to either cardinality estimation model: either what Microsoft quite smugly refers to as the default cardinality estimator, which is the one that you use if you’re in compat level 120 or higher, which showed up around SQL Server 2014, or the legacy cardinality estimator. Again, very smug nomenclature there. That’s the cardinality estimator that you use if you’re in compat level 110 or lower, or you have the database scoped configuration option to use the legacy cardinality estimator enabled. This thing is looking a little funny on me. It’s a little too happy down there.

One thing that a lot of people overlook. So when we think about outdated statistics, what a lot of people focus on is that you have this 200-step histogram, and perhaps the values in the histogram don’t reflect the actual counts of things that are in the table for existing data, which is totally a reasonable thing to be worried about. So this can happen because, let’s think of some reasons here: you haven’t updated stats in a while, and a bunch of data got added, so the old histogram says you have like 500 rows for this thing, but now you’ve got like 500,000 rows for this thing. Granted, that’s a little far-fetched if you have a big enough table to see that sort of difference, because auto stats will kick in eventually, and I know how wild you crazy kids are about rebuilding indexes, which will also update stats.

But let’s just say for a minute that you had some just wildly outdated stats that hadn’t been updated in a while because you have a huge table, you haven’t met the auto update stats threshold yet, and you just had wildly inaccurate numbers. That could also happen if you don’t use a high enough sampling rate when you do a stats update. If you just use the default sampling, you can miss a whole lot of really interesting data points.

Apparently I smoke too many cigarettes. Or apparently, since I don’t do that, we’re a family-friendly, addiction-free channel here. We are only addicted to making SQL Server go faster. You know, we don’t smoke cigarettes.

Though we do miss smoking cigarettes. But really, one of the big things that I run into is that you’ve added new data to a table that just isn’t reflected in the statistics histogram at all. A lot of people will call that something like the ascending key problem, which is a valid name for it.

You know, there’s all sorts of writing about it. And it is something that can actually cause pretty severe performance issues, because depending on which cardinality estimator you’re using, you could get some wildly different guesses depending on, well, how many rows are in the table.

And, you know, how many values match a certain predicate, and all this other stuff. And you can end up with some really bad execution plans when you start joining those tables off to other tables, and the predicate estimates, like rows coming out of the table, don’t really reflect reality.

And so the join that you get is, again, not in line with reality. And, you know, you get like a nested loops join when you really shouldn’t have. So, we’re not going to delve that far into it in this video.

I’m just going to kind of show you what it’ll look like when that sort of thing happens. So in my copy of the Stack Overflow database, what I’ve done is I’ve created a table called Votes_Beater. And we’re going to beat this table up a little bit.

I didn’t want to do it to my actual copy of the Votes table, because I don’t like messing with them. Life is hard enough without ever having to restore a database. If I do mess something up like that, I keep a version of the Stack Overflow database that I do nothing in, called Stack Overflow Clean.

So if I ever really mess something up in a table, I can just copy it from this other database. I’m a cheater like that. Because I don’t like restoring databases.

It’s not my thing. So I’ve created an alternate table to use. And I’ve inserted all the data from the actual Votes table into it. And I can see I did all my homework there and created an index on it, on the post ID column.

Because that’s the column that we’re going to focus on here. Now, at current, I actually know, because I already did all this stuff, that what this query returns is just some statistical information about what’s in there.

Right now I have 229,561 modifications against this table, and we’re talking about what it takes for auto update stats to kick in. The old auto update stats threshold was 20% of the table plus 500 rows, which is, you know, let’s just face it, 20% of the table.

And it kills me when people have to tack on the plus 500 rows. Because yeah, it’ll still be close enough to 20%. If it’s a table big enough for you to care about, 20% is 20%. Adding 500 isn’t meaningful.
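If you want to see both thresholds side by side, here’s a small sketch. The newer formula, SQRT(1000 × rows), applies at compat level 130 and up, or with trace flag 2371 before that. The row count below is back-computed from the 230,062 threshold in this demo, so treat it as illustrative:

```sql
DECLARE @rows bigint = 1147810; /* roughly the full-size table here */

SELECT
    old_threshold = (@rows * 0.20) + 500, /* = 230,062 */
    new_threshold = SQRT(1000.0 * @rows); /* roughly 33,879,
                                             much lower on big tables */
```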

And the current range high key in the statistics histogram is this. But there’s a whole lot of data within these modifications that has a post ID higher than that range high key, right?

So that’s the important thing here. So, what I did in order to facilitate this nightmare is I created a temp table, did some work to validate the post IDs that were going to be deleted, and deleted the top slice of them.

And now, the stats modification counter that I just showed you here, that was about 500 rows higher. Plus 500, right? 500 rows, who cares?

Well, now we’re going to talk about how much of a difference 500 rows makes. So, when the table was at full size, right, before I deleted rows out of it, the auto update stats threshold was 230,062.

So I just subtracted 500 rows out of there. And so, this number minus 501. And I deleted those rows out, and on the delete command, I outputted those columns into the votes temp table.

Then one thing that I did that’s really important before anything else was I re-updated statistics, right? The stats update here is to prevent me going over that modification counter. Because remember, deletes count as modifications.

So I wanted to update stats to zero that out. And then I inserted all the rows from the temp table back into Votes_Beater. And I did some work to find the highest post ID with more than one hit to it.

Or rather, the post ID over that old range high key, for the statistics object, that had the most rows in there. Right?

So I want to show you how the guess is crappy. With the legacy cardinality estimator, the problem that you run into is that whether you have an inequality predicate or an equality predicate, SQL Server only ever guesses that one row is going to come out of there.

And let me just scooch these dummies together a little bit. So for the inequality predicate, we get 229,877 rows back, but SQL Server only thinks that one row is going to come out of there.

For the index seek, where we find 300 rows, SQL Server only guesses that one row is going to come out of there. This is a thing with the legacy CE that Microsoft has done some work to improve upon, and has tweaked and mangled and done all sorts of things to.

It’s still better, but it’s the constant tweaking. Like, who can keep up? Right?

It’s like every CU, there’s just some thing that’s slightly different by like a percent. You’re like, nothing works anymore. Everything’s broken. With the new cardinality estimator, what Microsoft so pompously calls the default CE, things look different.

And what I’m going to show you here: I have this use hint, this ENABLE_HIST_AMENDMENT_FOR_ASC_KEYS hint, on this. And the reason I have it here is because SQL Server has not marked the post ID column as being ascending.

I tried it with a few different columns in the table, including the creation date column. And SQL Server did not think that any of them were ascending. So we’re just going to stick with post ID, and we’re going to throw this hint on there to show you the difference.

So if I throw this hint on this query and we search for where post IDs are greater than what’s in the current histogram, we get a guess, or rather an estimate, that just about matches the modification counter in the table, right?

So 229,561. If we had 229,562, auto stats would have kicked in. So when we tell SQL Server that this is an ascending key and SQL Server runs this query, the cardinality estimation process looks at how many modifications there are and says, okay, well, maybe there’s that many rows.

This is for an inequality predicate. If we don’t add that hint, SQL Server guesses about 30% of the table will be in there.

So this number changes from being the modification counter to about 30%, sorry, about 30% of the modifications, right? So 68,868. I’ll show you the math on that when we get through these demos.

So you can be happy that you watched this. And when I say, I hope you learned something, you can say, oh yeah, I did learn something. I learned some math.

And then you’re going to say, oh, I learned math from a high school dropout. And then you’re going to say, maybe I shouldn’t learn math from a high school dropout. We’ll, we’ll see. I don’t know.

So anyway, if we look at the equality predicate with the ascending key hint in there, we will get a guess, or rather an estimate, of 55 rows, right? Out of the 294.

So remember, with the legacy CE, those were both guesses of one. So with the new cardinality estimator, you get slightly different guesses. And if SQL Server has already marked the index or statistics object as being an ascending key, you wouldn’t have to use these hints.

But in my case, I’m using them to sort of fudge things a little bit. So, let’s look at the stats properties again, which will help explain those numbers a little bit.

So for the ascending key inequality predicate, we got basically the modification counter as a cardinality estimate. For the default CE without that hint on there, in order to get 68,868, which is this one, we basically look at the modification counter times 0.3.
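The arithmetic, using the demo’s numbers:

```sql
SELECT
    ascending_key_estimate = 229561,        /* the modification counter */
    thirty_percent_estimate = 229561 * 0.3; /* = 68,868.3, which lines up
                                               with the 68,868 estimate  */
```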

So about 30% of the modifications. For the default CE’s equality estimate, we actually get a density vector estimate here.

Now, if you’re unfamiliar with the density vector estimate, that’s something that comes from looking at the statistics object, which we’re going to go do, right? We’re going to go look at this, and we’re going to go to the properties.

And these two numbers here: if you’ve ever watched my videos or read my posts, or read anyone’s posts about local variables, you’ll know that when you have an equality predicate with a local variable, absent a recompile hint, cardinality estimation changes from a parameter sniffing thing or a literal value estimation thing to using some magic numbers.

Those magic numbers come from the statistics object. If we go into the details here, let’s just make this full screen to make it easy. If we look at the density of the post ID column, which is this, and the number of rows in the table, which is this, and we multiply this by this, we get that density vector guess.

The all density is just sort of the assumed uniqueness of this column. So SQL Server does some math in there to figure out how unique it thinks the post ID column is, which is, again, a total guess based on the distinct number of rows.

So SQL Server comes up with this number and multiplies that density number by the rows number. And that’s where we get that guess of 55 from. So that explains pretty much, you know, old versus new when it comes to outdated statistics, especially when the range high key of those statistics does not match the current high value in the table.
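You can pull the density vector and redo that math yourself. A hedged sketch, where the index name is a placeholder for whatever the demo index is actually called:

```sql
/* All density lives in the density vector output. */
DBCC SHOW_STATISTICS ('dbo.Votes_Beater', 'ix_PostId')
WITH DENSITY_VECTOR;

/* estimate = all_density * rows in the table.
   Illustrative inputs, not the demo's exact values:
   0.0000048 * 11,500,000 comes out to about 55. */
```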

You know, maybe it’s a little bit easier to think of in terms of a date column or a date time column, where every row that gets added, it’s almost like an identity column, where every row that gets added is new, right?

It’s a higher value than what was there before. It’s sort of an append-only type of insert scenario. You know, it’s like an order date column, not a ship date column, where ship date comes in null and gets updated.

Order dates are always going to be new. Every order is going to be newer than the order before it. So that’s like an ascending key. Same as an identity column in that regard. So really, one of the biggest dangers, or rather one of the biggest performance landmines, that I see a lot of people run into is when they don’t update stats frequently enough, or they just depend on auto stats to do their thing. Then they end up with these histograms that don’t represent the newest data, right?

It’s not really about the data that’s already in there. Most of the time, most of the time, the data that’s already in there is probably close enough to reality.

You know, there are obvious exceptions to that. Of course, nothing is perfect, but it’s usually the data that lives beyond whatever the current range high key is that messes things all up. Now, of course, if we ran a stats update on the Votes_Beater table, we would have representation for those higher values, and we wouldn’t have this problem anymore.

So it’s something you have to be really careful of, you know, especially if you’re using the legacy cardinality estimator, which I often think is the much better cardinality estimator. Getting that one row guess can be an absolute disaster.

So if you get that one row guess and SQL Server comes up with a, you know, crappy nested loops plan where, you know, you have lots and lots of rows and, but SQL Server’s like, I think there’s only one, you know, you could run into some real issues.

The newer cardinality estimator does do better with this. You know, it at least gives you a guess of higher than one. It might not always be perfectly accurate, but often the fact that it’s a bit closer to reality does buy you out of some potential performance landmines or pitfalls. Potential performance pitfalls.

Pleasant, pleasant, pleasant, pleasant alliteration. Anyway, so you learned some math from a high school dropout. Hope you enjoyed yourselves.

I hope you learned something other than math from a high school dropout. If you like this video, you can, you know, if you want to create a bunch of YouTube accounts to give extra thumbs up, you can do that.

I’m not responsible for that voting ring, though. I would never encourage that behavior. I don’t even thumbs up my own videos, which, I don’t know, probably tells you something about what an ethical, honest person I am.

If you like this sort of SQL Server content, please subscribe. Subscribers are awesome. I finally cracked 4,000, which in the grand scheme of things is probably not a lot.

But if one of you lovely 4,000 or so people sees these videos and thinks, hey, we could use the kind of help that Erik Darling offers for SQL Server, you can hire me, too. I am a consultant.

I consult, and I find these problems, analyze these problems, and fix these problems.

So it’s the FAF method. The patented Darling Data find it, analyze it, fix it. The FAF of SQL Server performance tuning. Don’t steal that from me.

I’ll come find you with my Delta miles. And I think that’s probably about good for today. I should probably eat something, I feel like. Perhaps my blood sugar is cratering a little bit.

I think I’ll go eat a beer. Or something. Maybe some chicken. Chicken’s good, right?

Chicken’s healthy for you. But I do apologize if any of the data darlings out there are vegans. You can pretend I said something else. Tofurky.

Some sort of bean. But, anyway. I’m gonna go do that. And apparently I have some drivers to install, because Intel keeps yelling at me. So, I’m gonna do all sorts of fun things with beer and chicken and driver updates.

And I’m sure you’re gonna be very jealous. I know that if I were not me, I would be very jealous. So, as always, thank you for watching.

And I will see you next time in another video. Maybe, hopefully, there will be no math. Because that’s a promise I’d like to keep. All right.

Thank you.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Database Design Choices For Multi-Tenant Databases In SQL Server

Rule Of Nerds


When you’re designing the back end of a database, people will have all sorts of ideas.

Normalization, partitioning, referential integrity, and, usually, trying to figure out what to do when you have more than one client.

If your application is user-focused (like Stack Overflow), you don’t have to struggle too much with the idea of isolation. But when your application is geared more towards supporting multiple entities that have one or more users, things change. Sort of like how Stack Overflow manages all the other Stack Exchange network sites.

Were you to ask me which model I prefer, it would be every tenant getting their own database. Your other options are:

  • Everyone all mixed in together like gen-pop
  • Using separate schemas inside a single database

It’s hard to see any real upside to those propositions unless you’ve poorly chosen a cloud solution with limitations placed on number of databases. Yes, I’m looking at you.

The problems you run into with everyone mixed in together are just far too grand, especially if the concept of permissions comes into play within a tenant.

If a single user can belong to multiple tenants, forget it. You couldn’t get screwed harder if you bought a time share in Atlantis.

Embrace The Suck


Since we’ve discarded the lunchroom in Oz approach, let’s talk a little bit about why the schema-per-tenant approach doesn’t work.

It doesn’t buy you anything from a maintenance perspective without dumping a whole lot of awful complexity in your lap. Worse, there’s no good tooling available to analyze anything in your database schema-by-schema.

Is your client in schema1138 unhappy? Okay, try to find out what’s going on with them in the plan cache or Query Store. I’ll wait. Look at how their indexes are used. I’ll be here. Wait stats? Blocking? Deadlocks? Hey, I’m comfy. Take your time.

Every client I have who started with this pattern has inevitably built a tool to move bigger tenants out to their own database to isolate them and figure out their problems, or move them to their own SQL Server instance. You will, too, if you choose… poorly.

While I hate SQL Server’s plan cache for finding performance problems, we are totally stuck with it as part of the query execution process. As much as I wish Microsoft would replace it with Query Store, they’re too busy trying to find ways to squeeze cloud customers for all they’re worth without the benefit of a lubricant to work on anything practically useful to SQL Server users.

So you end up with the same problem as you do with the database-per-tenant approach. Your plan cache has limitations on size and number of plans. When the same query executes from a different schema or database, a new entry in the plan cache is made.

This leads to a lot of plan duplication in there, and a lot of plans getting flushed out when those limitations are hit. The difference here is that it’s a whole lot easier to figure out what went raunchy for a specific database using Query Store than it is to figure out what’s gone sour in a specific schema in a specific database using… anything. Anything at all.
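One hedged way to watch the duplication happen: identical query shapes share a query_hash, even when they get separate plan cache entries per schema or per database.

```sql
/* Query shapes with more than one cached plan. */
SELECT
    qs.query_hash,
    plan_count = COUNT_BIG(DISTINCT qs.plan_handle),
    total_executions = SUM(qs.execution_count)
FROM sys.dm_exec_query_stats AS qs
GROUP BY qs.query_hash
HAVING COUNT_BIG(DISTINCT qs.plan_handle) > 1
ORDER BY plan_count DESC;
```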

Perhaps thoughts and prayers are in order.

Rebalancing


I’ve run into far too many clients struggling to keep the lights on with the all-in-one approach, and the schema-per-tenant approach. Especially with the all-in-one approach, you end up with something like a TenantId column in every table, and in every index, sometimes even filtered down to specific problem-tenants. It’s kind of awful to watch.

This struggle shows up across many different areas: scalability, security and compliance, performance, RPO and RTO, and of course costs.

Some readers may call out things like log backups, and Availability Groups. Sure, I get that! It might be hard to get a log backup job to run across a bunch of databases every RPO-minutes. It’s also trivial to put groups of databases into multiple log backup jobs so that you don’t have to worry about that.

For Availability Groups, I’ve seen more than a few cases where they were falling behind because there were hundreds of schemas in a database all making lots of tiny changes. Keeping lots of databases synchronized is certainly more responsibility, but no less in danger of things falling dismally behind.

Let’s say you have 400 databases. That may take way more worker threads to keep up with, but at least they each get worker threads to do that business. A single database with 400 schemas in it doesn’t magically get a bunch of additional worker threads to keep data moving, and it’s foolish to think otherwise.

When you start with the more sensible approach — database-per-tenant — you give yourself many more options. You can move databases to a new server much more easily, and you can have different databases on different tiers of service way more easily. Think Standard vs. Enterprise Edition, SLAs, RPO, RTO, HA and DR, etc.

And of course, you can charge extra for those additional services. Don’t worry, after a decade in the cloud, everyone is used to getting their pockets shaken for every additional nicety.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Join Me At Data Saturday Dallas Sept 6-7

Spring Training


This September, I’ll be presenting my full day training session The Foundations Of SQL Server Performance Tuning for Data Saturday Dallas.

All attendees will get free access for life to my SQL Server performance tuning training. That’s about 25 hours of streaming on-demand content.

Get your tickets here for my precon, taking place Friday, September 6th, 2024, at Microsoft Corporation, 7000 State Highway 161, Irving, TX 75039.

Here’s what I’ll be presenting:

The Foundations Of SQL Server Performance Tuning

Session Abstract:

Whether you want to be the next great query tuning wizard, or you just need to learn how to start solving tough business problems at work, you need a solid understanding of not only what makes things fast, but also what makes them slow.

I work with consulting clients worldwide fixing complex SQL Server performance problems. I want to teach you how to do the same thing using the same troubleshooting tools and techniques I do.

I’m going to crack open my bag of tricks and show you exactly how I find which queries to tune, indexes to add, and changes to make. In this day long session, you’re going to learn about hardware, query rewrites that work, effective index design patterns, and more.

Before you get to the cutting edge, you need to have a good foundation. I’m going to teach you how to find and fix performance problems with confidence.

Event Details:

Get your tickets here for my precon!

Register for Data Saturday, on September 7th here!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

A Little About Filtering On Dates In SQL Server


Thanks for watching!

Video Summary

In this video, I delve into some quirky issues related to working with dates in SQL Server that have popped up during recent client engagements. I start by discussing a peculiar situation where a client was applying the `AT TIME ZONE` function incorrectly on a column instead of using it properly within a parameter or variable. This led to significant performance degradation, as demonstrated through query execution plans and performance tests. I then move on to another interesting scenario involving date-time columns, where over-communicating with SQL Server’s optimizer by filtering for non-null values in conjunction with a date parameter resulted in suboptimal query plans. By stripping away unnecessary predicates and using the correct data types, we were able to achieve much more efficient execution plans that significantly reduced read operations and improved overall performance.

Full Transcript

Erik Darling here with, well, you probably, you may have guessed by now, with Darling Data. And in this video, I’m going to talk about a couple kind of funny, quirky things that I’ve seen happen with clients lately. And, you know, clients, you know, the nice people who pay me money. If you’re interested in hiring me as a consultant, you can do that from my website. You can hire me. You can say, Erik Darling, please do your shamanic dances. Shamanic? Shamanic? Shamanic? Please come make our SQL Server faster, which I will do in exchange for money. Anyway, a couple funny type things that I’ve seen happen at clients recently had to do with working with dates. And, of course, I hear especially for the current generation of kids, dating is weird. Apparently, dating is also weird in databases. All sorts of strange things can happen. And we’re going to look at a couple strange things that I’ve seen recently. So, let’s do that. Alright. So, I’ve got an index, as I usually do, as I am wont to have, and I come packing an index on the creation date column. And what I found the client doing was something backwards. Now, newer versions of SQL Server, and I say newer, but it’s been around for like 8,000 years at this point, have this fun AT TIME ZONE thing. And what these crazy kids were doing is exactly what you see here, where they were using the AT TIME ZONE function on a column, rather than on a parameter or variable.

In their case, it was a parameter. But for the purposes of this demo, I’m going to use a local variable, because it’s a little bit more convenient for me. But I’m going to throw a recompile hint on these so that the local variable effect is taken out of the picture. Alright. So, if we run these two queries, we will see a rather profound difference in performance between them. Alright.

Alright. And running, and running, and running, and running, and running. And in the query where we call AT TIME ZONE on the creation date column in the comments table, we end up taking almost six seconds to scan the index, apply the AT TIME ZONE function, and compare it to the variable here. Right. Which is not good from a performance point of view.

We have an index. We came prepared. We came fully indexed to support this query, but we still had to scan that whole column. And whatever AT TIME ZONE does behind the scenes is filthy foul. Because, like, you know, a query that should take a fraction of a second taking almost six seconds to do this stuff? Bonkers.

Right? So, please, if you’re going to be doing time zone stuff in your queries, do not apply AT TIME ZONE to a column in your JOIN or WHERE clause. Because it’ll be bad for you.

And the query where we apply AT TIME ZONE to, in this case, the local variable, used in conjunction with a recompile hint to make the local variable not awful? This thing finishes in 80 milliseconds. Right?

And we efficiently use our index. So, we did a good here. Right? We did a just fine here. And, of course, you know, the results match. We get the same number of rows back from both of these.

We didn’t screw anything up too hard. And, again, you know, like I keep saying, in this case, anything you do that makes your job easier, like this, makes SQL Server’s job harder, like this. Right?
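Here’s a hedged reconstruction of the two query shapes, assuming the Stack Overflow Comments table with CreationDate stored as UTC. The point is where the conversion happens, not the exact predicate:

```sql
DECLARE @since datetime2(7) = '2013-12-01';

/* Bad: AT TIME ZONE wraps the column, gets evaluated per row,
   and the whole index gets scanned. */
SELECT records = COUNT_BIG(*)
FROM dbo.Comments AS c
WHERE c.CreationDate AT TIME ZONE 'UTC'
      AT TIME ZONE 'Eastern Standard Time' >= @since AT TIME ZONE 'Eastern Standard Time'
OPTION (RECOMPILE);

/* Good: do the time zone math once, on the variable,
   and let the index seek happen. */
DECLARE @since_utc datetime2(7) =
    CONVERT(datetime2(7),
        SWITCHOFFSET(@since AT TIME ZONE 'Eastern Standard Time', '+00:00'));

SELECT records = COUNT_BIG(*)
FROM dbo.Comments AS c
WHERE c.CreationDate >= @since_utc
OPTION (RECOMPILE);
```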

No good. No fun here. And then, you know, so there was that. And then another funny thing that I saw: when you work with date and time information in SQL Server, it can often be challenging to get filtering right. And there are a lot of circumstances, when writing queries and tuning queries and making sure that they go as fast as they can, where…

…over-communicating to SQL Server’s query optimizer can be a good thing. Right? Making sure that it has all of the relevant information available to it, in order for it to make the best possible choices and come up with the best possible plan.

It’s a noble effort. Right? Right.

But what I ran into recently, and, you know, again, stepping back a little bit: when you’re working with date time columns, the time portion is populated, but you don’t necessarily care about it too much. Like, you know, you’re usually looking for a whole day, or a whole month, or something like that. The time is often irrelevant.

There are really two groups of people here. There are people who will cast a date time to a VARCHAR(10), which will get them the year, month, day, you know, four digits, and two digits, and two digits with a little doodad between them. It’ll get you a 10 character, 10 byte string with a reasonably formatted date.

Those are some of the worst people you’ll ever meet. Another thing that some people will do, and this is the slightly better group of people, is they will just use a date parameter or variable or something, because that chops off the time portion, which you might not care about for every single predicate. So what I ran into was a query that looked a little bit like this, where there was a perfectly good index on a date time column.

And people were using the date data type to cut the time off a search predicate. And they were also over-communicating by saying: we also only want rows where the date is not null. We don’t care about nulls.

Which, again, is not always the worst idea in the world. There are definitely some times, especially with joins, where I’ve found this very useful. But in this case, something strange was happening.
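To make the two camps concrete, here’s a rough sketch of both patterns, with table and column names assumed from the Stack Overflow database and placeholder dates:

/* The VARCHAR(10) crowd: string-chopping the datetime. Non-sargable. */
SELECT COUNT_BIG(*) AS records
FROM dbo.Posts AS p
WHERE CONVERT(varchar(10), p.LastEditDate, 23) >= '2013-12-01';

/* The slightly better crowd: a date-typed variable chops off the time,
   plus the well-meaning (but redundant) IS NOT NULL. */
DECLARE @edited date = '20131201';

SELECT COUNT_BIG(*) AS records
FROM dbo.Posts AS p
WHERE p.LastEditDate >= @edited
AND   p.LastEditDate IS NOT NULL;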

I’m going to show you what that something strange is via the query plan. So, I’m going to run just this one. And we’re going to look at the execution plan.

And, you know, this doesn’t run for a terribly long time. This is by no stretch of the imagination the worst, most inefficient query you’ll ever see. It was a lot worse for the client I was working with, because it was a much bigger table, right?

The Posts table in the Stack Overflow database is about 17 or so million rows, so we can at least work our way through that data with an okay index pretty quickly. The real-life example I was looking at was a couple billion rows.

And things were not good. We were reading through a lot more data, which was taking a lot longer because the data didn’t always fit in memory. And things were just kind of icky.

We were doing a lot of physical reads. So, I’m going to show you what happens in here. Rather than just finding all the rows that qualify, where the last edit date is greater than the date parameter, we have this funny seek predicate down here.

And that’s looking for the IS NOT NULL part, right? That’s what this is. We’re seeking to where the column is not null first. And then we have a residual predicate, after we’ve eliminated the nulls, to filter down to the dates that we care about.

And we end up reading, what’s that? 6,599,963. So, almost 6.6 million rows, when we only produce almost 173,000 rows.

Right? So, that’s not really a good read-to-row ratio. We’re doing a lot more work than we need to.

And with the client example again, because those were physical reads and not just logical reads, that was the real problem we were having. So what I did first was say: we don’t really need this predicate. If we just filter on the date, we’ll find everything we need.

It’s not like SQL Server needs to figure out what’s null or not first, because nulls are never going to qualify for this comparison. And it’s one of those weird things where developers who come from backgrounds other than purely working with databases have a much harder time conceptualizing how to work with nulls in a database.

You’ll see a lot of unnecessary use of ISNULL and COALESCE, right? Just bad idea jeans, throw ISNULL on everything, because they’re used to nulls creating errors. There was a recent developer issue with nulls, apparently, over at CrowdStrike.

At least I think that’s what it was. I keep reading differing takes on exactly what happened. A lot of people say it was a null pointer.

Other people say no, it was this other thing. I don’t know. I’m not good enough at all that stuff to figure out exactly who’s right or wrong. Everyone seems to have their own hot take on the stack dumps. But the first thing I did was think: it kind of doesn’t make sense for our seek predicate to find all the not-null stuff and then make a residual predicate do all the other work.

So I just took the IS NOT NULL portion out of there. And if we run this now, we go from 600-something milliseconds with a seek plus a residual predicate to about 17 milliseconds, with a seek predicate on just the rows that we care about. If we look at this now, our seek predicate is alone.

It’s just comparing the last edit date. So we do a regular, plain seek to the dates that we care about here. Now, that’s a much better plan, generally.
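In sketch form, the before and after look like this; the same caveats about assumed names apply:

DECLARE @edited date = '20131201';

/* Before: the seek predicate lands on IS NOT NULL, and the date range
   gets applied as a residual predicate afterward. */
SELECT COUNT_BIG(*) AS records
FROM dbo.Posts AS p
WHERE p.LastEditDate >= @edited
AND   p.LastEditDate IS NOT NULL;

/* After: a greater-than comparison can never be true for NULLs anyway,
   so dropping the redundant check lets the seek cover the date range. */
SELECT COUNT_BIG(*) AS records
FROM dbo.Posts AS p
WHERE p.LastEditDate >= @edited;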

And the date thing in here doesn’t really cause any performance issues. This should be the execution plan with the thing that I want to show you in it. SQL Server’s optimizer has these cool built-in rules for dealing with temporal stuff that, with other data types, would usually be a CONVERT_IMPLICIT and lead to the bad kind of scans.

So, built in to the optimizer, because we have a datetime column that has a time portion on it, but our parameter in this case is only a date, we get this GetRangeWithMismatchedTypes thing. And this is just something the optimizer has built in to say: okay, I can still very easily compare the date portion of a datetime column to a date, right? I don’t need to evaluate the binary representation of the time.

I can ignore that. So you have this GetRangeWithMismatchedTypes thing, which is very useful in this case, because we can take a date, compare it to a datetime, and just compare the date portion of it. So we have that ability.

It’s pretty nice. There are a couple of other optimizer rules you’ll see around temporal stuff like that if you go digging in the execution plan XML. You won’t see it in any of the tooltips or the properties; it’s something that’s only in the XML.

So the other thing that you could do, if you wanted to keep not dealing with the nullness of things: you declare a datetime, like I’m doing down here. You can use it without a time portion affixed to it, and SQL Server will just zero the time out. So you get effectively the same thing.

But you could also just pass in the right data type here. And even if you keep the greater-than-datetime thing and the IS NOT NULL thing, if you use the right data type, these two will flip. In the very first query plan we looked at, the seek predicate was IS NOT NULL and the residual predicate was the date range.

But for this one, the seek predicate is now the date range, and the residual predicate is IS NOT NULL. So you could keep the IS NOT NULL if you want, use the right data type, and get effectively the same thing. With or without the IS NOT NULL, I generally like this query plan a little bit better, because there’s a little bit less weirdness to it.
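A sketch of that version, assuming the column is a datetime:

/* Declared as datetime to match the column. With no time portion
   supplied, SQL Server zeroes it out. The date range is now the seek
   predicate, and IS NOT NULL becomes the cheap residual predicate. */
DECLARE @edited datetime = '20131201';

SELECT COUNT_BIG(*) AS records
FROM dbo.Posts AS p
WHERE p.LastEditDate >= @edited
AND   p.LastEditDate IS NOT NULL;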

So we just do a plain old index seek here, and we produce our results in about 19 milliseconds. If we go back to this query plan real quick, I don’t know if you remember too much about it, but we had a little bit of extra stuff in the query plan, where SQL Server had this Constant Scan and Compute Scalar stuff in it. Again, it’s not the end of the world. These two queries finish within a couple of milliseconds of each other if you just keep running them over and over again.

Sometimes they’re a little bit apart, sometimes they’re exactly the same. So it’s not really a big deal here. It’s just one of those things where, in other cases where you see query patterns like this, especially if those constant scans produce a lot of rows, it matters. You’ll see this a lot in things like joins with OR clauses: query patterns with constant scans and merge intervals and distinct sorts and then a loop join.

And when you’re producing a lot of rows from these, it can be really awful. When you’re actually just producing one row from this stuff, it’s not that big a deal; you can just do a seek. But getting stuck in that nested loops join when operators like this produce a lot of rows can be hellish for performance. So we want to avoid that. Anyway, that’s about all I had to say here today.

Some quirky stuff when working with dates in SQL Server, right? Of course. And we’re going to go through our usual spiel again, aren’t we? We’re going to say, or I’m going to say, not we. I have nothing else.

I do not contain multitudes. What you see is what you get. Thank you for watching. Especially you. Hope you enjoyed yourselves. Hope you learned something. If you like SQL Server videos like this, you can like this video by way of a thumbs up or a comment.

The comment does not have to be positive. But I will fight you over excessively negative comments in person. Physically. I will physically altercation you. If you like this video, again: thumbs up, comments. If you like this sort of content, you can subscribe to the channel and join over 4,000 other data darlings out there in the dataverse.

This is getting awful. This is getting awful. Subscribe to the channel so you can get notified when I do these things and make a complete buffoon of myself in front of you.

So yeah, and you can also hire me to do this personally. I’ll give you a private dance of me doing this. It looks wonderful. It’s a whole thing that people just love, right? I’m coming up on 700 clients since I started this business, and I’m pretty psyched about that.

So if you want to be lucky number 700, if you want to join the 700 Club, not the channel thing, but, you know, having been someone who worked with me, that’d be cool, too. Anyway, I’m gonna go do actual work now.

Thank you for watching and I will see you in the next video.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

A Little About Filter Operators In SQL Server Query Plans

A Little About Filter Operators In SQL Server Query Plans



Thanks for watching!

Video Summary

In this video, I delve into the importance of filter operators in query plans and why you shouldn’t ignore them. Erik Darling from Darling Data shares insights on when these operators are expected to appear and when they might indicate performance issues. I cover various scenarios where filters can be seen as normal, such as with HAVING clauses or windowing functions, but emphasize that a high number of filter operators generally suggests potential problems. The video also highlights examples like non-sargable predicates and max data types, explaining how these can lead to unnecessary row filtering late in the query plan, which is inefficient. Throughout, I provide practical advice on how to identify and address performance issues related to filter operators, including testing different query permutations and considering alternative join methods for better optimization.

Full Transcript

Erik Darling here with Darling Data, and today we’re going to talk about why you shouldn’t ignore filter operators in query plans. Now, I’ve done a few videos where I sort of touched on filter operators. I mean, not like actually touched the filter operators. Guy could go to jail for that. But I’ve talked a little bit about them: sometimes why they show up, sometimes when they’re bad. So I wanted to do one specifically about filter operators: why you might see them, when you might see them, when you should pay attention to them, and also how to fix queries where you see them and the queries are slow. Now, in general, filter operators are not a good sign. We’re going to talk about when they’re an expected sign and when they’re a bad sign. But in general, when I see query plans with lots and lots of filters in them, I get a little nervous. Filter operators are SQL Server removing rows from a result set at some point in a query plan, right? But really, you don’t want to see rows get removed from a query plan way over here, on the left. You want to see rows get removed as close to when you touch an index as possible. Ideally, as you’re actually touching the index, right? When you seek into an index, when you scan an index, that’s when you want to see rows get removed from a query plan.

SQL Server filtering out the rows. You don’t want those rows carried through other operations and then removed later, right? You want to work with as minimal a set of rows as possible. Grammatically, that seems daft. That was a bad sentence. You want SQL Server to deal with as few rows as possible, as early as possible, right? You want to remove as many rows as possible as early in the query plan, as far to the right in the query plan, as you can. As soon as you start removing rows far to the left in a query plan, you have started to fail from a performance perspective.

So, here are a few examples of when you would expect to see filters in query plans. One is when you use a HAVING clause. And since HAVING clauses work on aggregates, you have to figure out exactly which rows meet that requirement. No rows are going to meet this particular requirement, and that’s totally okay. I just need to show you a query plan where SQL Server doesn’t know how many rows are going to meet that count predicate from the HAVING clause, right?

HAVING COUNT_BIG(*) greater than 2.1 billion, blah blah blah. Well, you have to fully execute that query and do that count to figure out which rows would actually qualify. So, when you use a HAVING clause, you can rightly expect to see a query plan with a filter in it, filtering out those rows after the expression has been calculated.
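Something like this, give or take the table; the threshold is deliberately impossible:

SELECT   c.PostId,
         COUNT_BIG(*) AS records
FROM     dbo.Comments AS c
GROUP BY c.PostId
HAVING   COUNT_BIG(*) > 2147483647; /* ~2.1 billion: nothing qualifies, but
                                       the count has to be fully computed
                                       before the Filter can throw rows out */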

Another, similar idea is when you’re filtering on a windowing function. Because, just like with a COUNT_BIG, at runtime you have to execute that windowing function, come up with what it evaluates to over the results you’re asking it to express over, and filter on that later. So, here’s another example of a query plan with a filter in it.

And this filter is totally reasonable, because we need to generate that row number and figure out where it meets the predicate applied to it, right? That row number has to get calculated, just like the COUNT_BIG.
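The same pattern in miniature, with assumed columns:

WITH ranked AS
(
    SELECT c.Id,
           n = ROW_NUMBER() OVER
               (
                   PARTITION BY c.PostId
                   ORDER BY c.Score DESC
               )
    FROM dbo.Comments AS c
)
SELECT r.Id
FROM ranked AS r
WHERE r.n = 1; /* the row number only exists at runtime, so a Filter
                  strips rows after the window aggregate computes it */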

Similarly, you might see filters when you play tricks on SQL Server to get batch mode into your queries. This one’s a little tricky to explain, because there are some caveats and other things to consider with it. But to give you an easy example: like my example over here, where I’m creating a temp table with a columnstore index on it, and I’m doing this sort of silly join to that table. This will generally get SQL Server to do something batch mode-y at some point in the query plan. But like I’ve talked about in other videos, this does not get you batch mode on rowstore the way the batch mode on rowstore feature gets you batch mode on rowstore.

This might get you a few operators that operate in batch mode, but it is not the full batch mode on rowstore experience. So, you end up with a query plan that’s a mix of row and batch mode operators, and batch mode bitmaps are not really tightly integrated into that. So, you’ll end up with a query plan that looks a little something like this, where you have a filter over here, and that filter comes from a batch mode bitmap, right?

Because if you look over here at this hatch join... hash, hash join, not hatch join. We are not hatcheting anyone. This hash join does execute in batch mode, and it does produce a bitmap, right? If you right-click on this, where is that thing?

Batch, yes: defined values, Opt_Bitmap. So, this is a hash join where there is a bitmap involved, and that bitmap gets applied, but not over here, like a row mode bitmap would be. Wow, “row mode bitmap,” we’re not even going to try repeating that one.

A row mode bitmap, you would normally see applied when you scan or seek a clustered index. In this case, though, it is not applied over here; it is applied in a filter operator after the fact, which can actually be kind of crappy for performance if you’re scanning really big tables.
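For reference, the temp table trick usually looks something like this sketch; the join condition is intentionally impossible:

/* An empty temp table with a columnstore index, joined in a way that can
   never match: it exists only to make batch mode a possibility. */
CREATE TABLE #bm
(
    i int NOT NULL,
    INDEX bm_cs CLUSTERED COLUMNSTORE
);

SELECT p.OwnerUserId,
       COUNT_BIG(*) AS records
FROM dbo.Posts AS p
LEFT JOIN #bm AS b
    ON 1 = 0 /* never true, so no rows are added or removed */
GROUP BY p.OwnerUserId;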

Another one that I’ve covered in a previous video: you might see filter operators for startup expression predicates. Now, you can ignore the red squiggles in here, because this is just a statement from a stored procedure that I pulled out. This is the stored procedure I pulled it from; I just wanted to show you what it looks like, where there’s this sort of conditional join logic.

So, if we are going to check the Posts table, then we actually do this join; or if we are going to check the Comments table, then we actually do this join. And what that looks like from the query plan perspective is something like this, where you have one, two filter operators in the query plan.

I’m not covering up either of them. Good for me. But if you look at the filter operator, you have this startup expression predicate. So, if check posts is true, then we will touch the Posts table; or if check comments is true, then we will touch the Comments table over here, which I am blocking a little bit, but that’s okay.

You can still mostly see what we care about in this filter operator, right? So, these are some reasons why you might see expected filter operators. These aren’t necessarily good, bad, ugly, or somewhere in between for performance, but they are things I would generally keep an eye on, just to make sure that I understand why they’re showing up.

And especially in the case of the startup expression predicate, I would definitely want to test this stored procedure like I did in the video that I recorded for you, which I’ll put a link to in the show notes here.

I would definitely want to check any query that does something like this with different permutations: reusing a query plan for check posts equals one, and then check comments equals one, because you can run into some really tricky performance issues when you do things like this.
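The stored procedure from the video isn’t shown here, but the conditional join shape it uses is roughly this; the procedure name, parameters, and filter are my guesses:

CREATE OR ALTER PROCEDURE dbo.ConditionalJoins
    @CheckPosts bit = 0,
    @CheckComments bit = 0
AS
BEGIN
    SET NOCOUNT ON;

    /* Each join predicate leads with a parameter check, which is what
       produces the startup expression filters in the plan. */
    SELECT u.Id, u.DisplayName
    FROM dbo.Users AS u
    LEFT JOIN dbo.Posts AS p
        ON  @CheckPosts = 1
        AND p.OwnerUserId = u.Id
    LEFT JOIN dbo.Comments AS c
        ON  @CheckComments = 1
        AND c.UserId = u.Id
    WHERE u.Reputation > 100000;
END;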

Once again, anything you do that makes your job easier makes SQL Server’s job harder. These are words to live by when you are developing in T-SQL, or probably even any unhyphenated SQL.

So, of course, a time when you will see filters in query plans because you did something unnecessarily silly would be if you wrote a completely non-sargable predicate.

Say we’re going to run these two queries together, and... mon Dieu, I did not semicolon that query. You know what, let’s do a little plug for my friend SQL Prompt, and let’s insert semicolons so that we make sure we don’t have that problem again.

Thank you, SQL Prompt. This video is now unofficially sponsored by SQL Prompt, which can help you insert your semicolons. I often lovingly and jokingly refer to SQL Prompt as my $75-a-year semicolon inserter.

If you feel that was maybe not an exactly family-friendly name for it, well, that’s on you, that’s not on me. So, for these two queries: in the first one, of course, we are doing something awful with our lives.

We are wrapping not one, but two columns in a DATEDIFF function. And, you know, this is generally not a great way to write the query. I’ll show you the difference between these two.

Now, even though we are going to break the sargability rules a little bit here, and we are going to have these columns in a non-sargable sort of fashion in the where clause either way, we should see a pretty decent improvement in query speed by doing more work and writing some additional stuff out.

So, in this top query up here, of course, we have a filter way later in the query plan. We fully scan our index on the Posts table.

That’s not really the point here, though. The point is that in the bottom query plan, if we look at the nonclustered index scan, we have absolutely no predicate applied to it at all.

All of the rows from the Posts table are filtered out way over here, right? By this predicate, greater than five. If we do things a little bit differently, we still end up with an index scan, but we reduce all the rows at the index scan.

Now, granted, this is still not fantastic. We are still scanning the index; we are not seeking into it. Partially, this is because of the way the index is designed: we have the OwnerUserId column, which we’re joining on, first.

We don’t have the creation date or last activity date column prioritized in the index for seeking. It’s okay here, right? Not every index can be absolutely perfect.

Excuse me. But at least writing the query in a slightly more sane way, where we have a clean expression on the creation date and last activity date columns, even though we do something non-sargable with them over here, which, again, is not great but at least tolerable at this point, gets us a pretty decent speedup on this query.

You know, not a huge, awful thing anyway. This was 1.6, almost 1.7 seconds, and this was just under a second. So we shaved a pretty okay amount of time off this query just by typing a little bit more. Again, we made our lives a lot harder.

We typed a lot more stuff. We had to do a lot more work. But when we did it the easy way, we made SQL Server’s job harder. And the harder you make SQL Server work, the angrier your end users are.
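The before and after look roughly like this. Note that the DATEADD rewrite isn’t a character-for-character match for DATEDIFF’s day-boundary counting, so test it against your data:

/* Fully non-sargable: two columns wrapped in DATEDIFF, so rows survive
   all the way to a Filter operator late in the plan. */
SELECT COUNT_BIG(*) AS records
FROM dbo.Posts AS p
WHERE DATEDIFF(DAY, p.CreationDate, p.LastActivityDate) > 5;

/* More typing, less work: one side of the comparison is a bare column,
   so the predicate can at least be evaluated right at the index scan. */
SELECT COUNT_BIG(*) AS records
FROM dbo.Posts AS p
WHERE p.LastActivityDate > DATEADD(DAY, 5, p.CreationDate);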

All right. Again, words to live by. Another time you might see this, in a sort of similar way, and again, this is a matter of sargability:

If you were looking to figure out whether a column meets a certain length requirement, wrapping that column in a function can be pretty bad. Really, all you need to do is see if that column contains any character at all, right?

And the underscore wildcard is any single character. So if we run these two queries, it’s again a sort of similar setup. The Users table is very small.

It’s only a couple million rows, so you’re not going to see a dramatic performance improvement here. But in real life, in your actual queries, you might find this sort of change very helpful. So if we look at these two...

Oops, that did not drag where I wanted it to. Curse you. Damn your eyes, SQL Server Management Studio. Why won’t you give me the draggy thing? If we look at these two queries, of course, the one where we use DATALENGTH in the where clause results in a filter after scanning the entire Users table for all the rows.

In this one here, we don’t have that filter operator, right? We have a LIKE with a single underscore character, which is, again, the wildcard for any single character. They both get the same results. And because the Users table isn’t huge, there’s not a terrible time difference on this.

But in real life, you can probably see more dramatic results with the queries that you’re working on tuning, right? So, this is generally bad up here, and this is generally a better way to figure out whether a column meets a certain length requirement.
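In sketch form:

/* Non-sargable: the function wrapped around the column forces a full
   scan plus a late Filter operator. */
SELECT COUNT_BIG(*) AS records
FROM dbo.Users AS u
WHERE DATALENGTH(u.Location) > 0;

/* Better: '_' matches any single character, so this finds rows with at
   least one character without wrapping the column in anything. */
SELECT COUNT_BIG(*) AS records
FROM dbo.Users AS u
WHERE u.Location LIKE N'_%';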

Another time when you’ll see filters is when you make bad join decisions. If you go through any sort of SQL code camp thing, they’ll only show you the first way of figuring out which rows in a table don’t exist in another table.

And that’s doing a left join from one table to another, and then figuring out where the primary key of the table that you’re left joining to is null. Now, again, this isn’t going to be a gigantic speedup, but it’ll be noticeable.

With the top query, with the left join, we have to fully join Users to Posts, right? We do absolutely no filtering here.

We do all the filtering much later on, over here. And that filter operator is, of course, asking for where the Id column in the Posts table is null. So this filter operator is usually a pretty bad sign.

This won’t always, always, always, always, always, a thousand million billion percent be the case. There will, of course, be times where, you know, you might test the same query paradigm two different ways, right?

You might test it with the left join, searching for the null. You might check with not exists. Depending on indexing and a number of other things, you might find that the left join is faster.

It’s totally happened to me in my life, where I was just like, I know how to make this better. And then it did not get better, at least not without some indexing magic and stuff.

So there are definitely two ways to check here. But whenever I see a filter operator in a left join type query like this, I get nervous, and I want to at least try the rewrite using NOT EXISTS instead of the left join thing.
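Both versions of the question, sketched out:

/* The code-camp way: left join, then keep rows where the join failed.
   The IS NULL check happens in a Filter after the full join. */
SELECT COUNT_BIG(*) AS records
FROM dbo.Users AS u
LEFT JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id
WHERE p.Id IS NULL;

/* The rewrite worth trying: NOT EXISTS asks the same question without
   needing a Filter to throw joined rows away afterward. */
SELECT COUNT_BIG(*) AS records
FROM dbo.Users AS u
WHERE NOT EXISTS
(
    SELECT 1
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = u.Id
);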

Another thing that can cause filter operators to show up in your query plans is when you are dealing with max data types. This goes for column data types, and it also goes for parameter and variable data types.

Now, I have a recompile hint on the query over here because what I want you to see is that just declaring the variable as max means that SQL Server can do nothing helpful for you.

Right? So, in this query up here, the AboutMe column is an NVARCHAR(MAX). We can see that right here. And if we want to search on that column, we’re going to have a filter operator in our query plan.

And unfortunately, there is no helping this. You could create a computed column that is a substring of it, like a reasonable search index for your query, right?

You might not need to search all 2.0 bajillion gigs of the bytes of data in this column. You might only need the first 100 characters. Even if you made this an NVARCHAR(4000), which is a decent chunk of stuff to be able to search through, this would be okay.

A lot of people might say, oh, full text indexes. But full text indexes in SQL Server stink. They’re awful. Don’t even bother. You’re better off using Elasticsearch or something.

Once you start thinking about full text indexes in SQL Server, you should mentally just jump right to Elasticsearch because that will actually be better for you than anything else.

And again, if we define our parameters and variables with reasonable data types, we will not see filter operators in our query plans. We will see regular index seeks and scans, like in this one down here.

This one is where the location variable is declared as an NVARCHAR(100), which matches the length of the Location column. And in this one up here, where I made location an NVARCHAR(MAX), we filter that data out later.

Even with a recompile hint, SQL Server can’t sniff the fact that the value in our local variable was only four characters here, or I guess eight bytes, because it’s Unicode.

SQL Server can’t figure that out, right? It can’t just discard the max-ness of the one up there. It has to fully scan whatever table, and apply that NVARCHAR(MAX) filter later.
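A stripped-down version of the two variable declarations; the search value here is just a placeholder:

/* Max-typed variable: the comparison can't go into the seek or scan
   predicate, so rows get filtered later, recompile hint or not. */
DECLARE @location_max nvarchar(max) = N'Iowa';

SELECT COUNT_BIG(*) AS records
FROM dbo.Users AS u
WHERE u.Location = @location_max
OPTION (RECOMPILE);

/* Right-sized to match the column definition: a normal predicate. */
DECLARE @location nvarchar(100) = N'Iowa';

SELECT COUNT_BIG(*) AS records
FROM dbo.Users AS u
WHERE u.Location = @location
OPTION (RECOMPILE);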

Another time when it’s particularly bad, and I’m going to start this running because this is kind of an amusing one, is when you put UDFs of nearly any variety in your join or where clause.

Now, let’s limit this at first to non-inlineable scalar UDFs, because SQL Server 2019, at least, if you have all the right incantations, will attempt to inline some scalar UDFs.

There are a lot of restrictions on it. There are a lot of caveats to it. And performance isn’t always better. Sometimes you might start seeing eager index spools in query plans where SQL Server has rewritten your scalar UDFs as inline versions.

That just means you stink at indexing and you need to fix something. But in this case, you know, you could also see this with multi-statement table-valued functions and inline table-valued functions.

You are far less inclined to see issues like this with those. You still might, but, again, you have to really screw something up within an inline table-valued function to make that happen.

So here are two queries: a TOP 175 and a TOP 180. And what’s kind of funny is how long each of these ran for. For the TOP 175, we do okay.

All right? We still had to read a bunch of rows here, but not nearly as many as we read down here. And what’s very funny about all this is that this query took three... oh, sorry, I messed up highlighting on that.

This query took about four seconds. This query took 34 seconds. All right? And why? I’m going to tell you why. This is very funny. This is a good, like, level 500 DBA row goal joke.

This top query found, if I lift up this arm and I zoom in correctly and appropriately and don’t mess anything up... ah, gosh darn you.

We get 175 rows back. Good on us, right? That’s the one that took four seconds. If you look at this query, well, I’m going to get things lined up down here.

I’m going to lift up my arm. I’m going to scroll over, and keep scrolling over, and there we go. We are at 179 rows. So I asked for the top 180, but only 179 rows qualify.

So we wasted a lot of time trying to find that 180th row. In the query plan for the first one, we could actually get done pretty quickly, because we only needed 175 rows.

In this one, where we asked for one more row than actually existed, we needed to have all 2.4 million rows come out of the Users table to search for that elusive 180th row, and we did not find it.

And that is why we had so much more work done in here. And the filter operator for both of these is going to be our scalar UDF. Now, I know you’re going to say something about putting scalar UDFs in computed columns.

And maybe you might even say something about schema binding. If you’re the type of person who puts scalar UDFs in computed columns, you deserve to be smacked with everything that someone can find within reasonable reaching distance.

Whether it’s a fish, or rubber hosing, or a chair, or something like that. Because sticking scalar UDFs in computed columns will doom you to a lifetime of query misery.

They will force every single query that touches your table to run single-threaded. You can sometimes use trace flag 176 to get around this, but in general, that is what you are signing up for.

You will have a bad time with all sorts of indexing and maintenance tasks. You will have a bad time with DBCC CHECKDB, unless you’re just running it with PHYSICAL_ONLY. A lot of things get bonked up.

That also goes for computed columns and check constraints... sorry, scalar UDFs and check constraints. So: scalar UDFs in check constraints and in computed columns. Again, in the immortal words of 50 Cent, we will get the strap.
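For reference, here’s a stand-in for the kind of scalar UDF and top query in the demo. The function name, body, and thresholds are all invented; the point is the plan shape, where the UDF runs row by row inside a Filter:

CREATE OR ALTER FUNCTION dbo.TotalScore (@UserId int)
RETURNS bigint
AS
BEGIN
    /* hypothetical stand-in: sum up a user's post scores */
    RETURN
    (
        SELECT SUM(CONVERT(bigint, p.Score))
        FROM dbo.Posts AS p
        WHERE p.OwnerUserId = @UserId
    );
END;
GO

/* The UDF can't be pushed into the index access, so candidate rows are
   fetched first and the function is evaluated in a Filter operator. */
SELECT TOP (175) u.Id, u.DisplayName
FROM dbo.Users AS u
WHERE dbo.TotalScore(u.Id) > 10000
ORDER BY u.Reputation DESC;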

All right. Another thing that will cause filter operators to appear in your query plans. Now, this stored procedure, called Longingly, is available in my GitHub repo.

It really only has one use, and I’m going to show you what it does in debug mode for a small number of loops. What it does is declare however many variables you set for loops.

Right? So, I chose five, and we have five variables in here. It’ll also generate this SELECT COUNT from Users, where Reputation is less than zero, or where DisplayName is in however many parameters we decide to create. And the query that it finally executes at the end is all those declares plus the final statement.

Right? So, fun times there. Now, the execution plan for that one is useless; we don’t need to look at it. But the execution plans for these are a little bit more interesting, because what we’re going to see is that when we only search for 14 of those parameters, we get a clustered index scan, and we filter out rows at the clustered index scan.

Clustered index scan. I can’t remember if I said that right the first time, but this is basically what that looks like: where Reputation is less than zero, or DisplayName equals blah, blah, blah, up through parameter 14.

Right? We can see parameter 14 right there, or P14, which I guess is a local variable because I declared it. As soon as we hit 15, SQL Server says: this is far too complicated for me.

You have made my job hard. I’m going to use a filter. I’m going to scan the whole table, and we are going to apply the filter later. So again, we’re bringing back all the rows from the Users table.

Every single one. We apply no predicate here; we apply the entire predicate in this filter. So, where Reputation is less than zero or... well, I mean, you can see P15 there.

So at least for whatever value of complexity we have in this query, 15 in-clause arguments was where SQL Server went: I’m done.

You might see this with fewer in-clause arguments. You might see it with more. I don’t know; it depends on a lot of things. But this is just another example of where SQL Server might throw a filter operator at you when you were expecting those rows to get eliminated a lot earlier.

Again, performance for this particular example isn’t disastrous, because the Users table is pretty small and we’re not doing a ton of work. We don’t have a much bigger query with lots of other joins and stuff where that filter gets applied way later. It’s just a simple example so you can see what I mean.
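A rough sketch of the kind of statement the procedure generates; only a few variables are shown, and your tipping point may differ:

DECLARE @p1 nvarchar(40) = N'a',
        @p2 nvarchar(40) = N'b',
        @p3 nvarchar(40) = N'c'; /* ...and so on, up through @p15 */

SELECT COUNT_BIG(*) AS records
FROM dbo.Users AS u
WHERE u.Reputation < 0
OR    u.DisplayName IN (@p1, @p2, @p3 /* , ...@p15 */);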

So, that brings us to the end of my examples. I’m going to hit save there. Oh, no. I think I hit the wrong button.

We’re gonna hit save there to make sure that I retain all my semicolons. Once again, thank you to Redgate SQL Prompt for inserting semicolons for me. Automatically.

Is that DevOps? I don’t know. We’ll find out. It’ll be fun. So, thank you for watching. Hope you enjoyed yourselves. I hope you learned something. If you like this video: I like thumbs.

I like comments. You can leave me comments in general, good, bad, or ugly. I don’t care. Anything that I can respond to, right? We like to generate lots of interaction on these videos.

The more interaction I get, the better. Even if you call me a fat dummy. But then I’ll come find you. I don’t care where you live. I got Delta Miles.

Boy, howdy. I can come find anyone. If you like this sort of SQL Server content, you can subscribe to my channel. There are buttons that you can push to get notified.

And I’m going to make sure that I have the most up-to-date count. So you can join 4,009. I finally crossed the 4,000 mark. I’m going to overtake a few Amiga repair channels at some point.

You can join 4,009 data darlings and get notified when... well, actually, some people might have subscribed and turned notifications off.

I got Delta Miles for you, too. All right. So we’ll get that done. All right. Anyway, I think that’s about good here.

I’m going to get this uploaded and scheduled and all that good stuff. Thank you for watching, all of my data darlings. I appreciate you. Oh, so very much.

All right. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

How String Columns Can Mess Up Query Performance In SQL Server

How String Columns Can Mess Up Query Performance In SQL Server



Thanks for watching!

Video Summary

In this video, I delve into how string data in your tables can significantly impact query performance, particularly focusing on memory grants. I explore why SQL Server estimates that half of a string column’s byte length is full, leading to potential issues when you select multiple text columns or have non-ideal indexes for sorting operations. By running through two example queries—one including the text column and one not—I demonstrate how this can cause substantial differences in execution time and memory usage, with spills to tempDB being a common issue. I also highlight more extreme cases where selecting large string data can lead to severe performance degradation, even causing SQL Server to run out of temporary disk space and fail queries.

Full Transcript

Erik Darling here with Darling Data. And, you know, I’m having a nice day. I returned from my family vacation. I say family trip, because when you travel with kids, it’s not a vacation, it’s a trip. Just a careful distinction there. There is not a lot of rest and relaxation. There is just playing referee in a different place. So, here I am. And in this video, we’re going to talk about how string data in your tables can mess up query performance. Now, there are a couple of other ways this happens that we’ll discuss another day, but today we’re going to focus on how string columns can mess up query performance in the area of memory grants. Now, if you’ve watched other videos of mine, you may recall me talking about how, when SQL Server estimates memory for a query, it starts with a serial execution plan. It figures out how much memory it would take for a single thread to execute your execution plan, right? Whatever the execution plan for the query is.

The things that primarily impact memory grants are sorts and hashes. Okay? And strings matter here because of the way SQL Server estimates string column fullness: whatever the byte length of your string column is, SQL Server assumes that about half of it will be full. So, for a VARCHAR(100), SQL Server assumes that every row will have 50 bytes in it. The more rows you select and the more text columns you select... or the more string columns, rather; I’m going to say string, because text is an actual data type. The more string columns you select, depending on their Unicode-ness and other things like that, the more you can really blow up memory grants.
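To put rough numbers on that half-full guess, using the Comments table I’m about to query, where Text is an NVARCHAR(700), so up to 1,400 bytes per value:

/* The optimizer budgets roughly half the declared byte length per row
   (~700 bytes here), whether the actual values are that wide or not. */
SELECT c.CreationDate, c.PostId, c.Score, c.Text
FROM dbo.Comments AS c
ORDER BY c.Score DESC
OPTION (RECOMPILE); /* compare the memory grant with c.Text removed */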

Now, I have a pretty non-ideal index for the query I’m going to run, but that’s because I want to show you what happens when we need to sort data, right? So, I have, basically, the same query twice. The only difference is up here in the select list: I am including the Text column from the Comments table, and the Text column is an NVARCHAR(700). In the second query, even though I’m doing a SELECT c.* here, since I’m not pulling the Text column out in the final select, SQL Server is free to ignore it. And I have recompile hints on these queries, just because I don’t want any sort of automation or behind-the-scenes intelligent query stuff messing with my demos, right? I just want to show you, out of the box, you run this thing, what stinks about it.

So, the first query runs, and we select the Text column; the second query runs, and we don’t select the Text column. Let’s look at some differences between these two execution plans. Well, right off the bat, you might notice that this query runs for about twice as long as this one. Let’s just call that close enough to twice as long, right? 4.7 and 9.1 something, good enough. A couple hundred milliseconds off, who’s counting?

In databases, do we care about milliseconds? Yes, we do. We care about everything. And the primary source of the slowness is in the sort operator, where we spill, right? That takes about 8 seconds in one, and about 3.8 seconds in the other. And of course, it’s going to be because the spills are different. This one... oops, I didn’t go down far enough. Story of my life.

And this one, we have the warning. We spilled to level 2, and all 8 threads went kaboom. And that’s a big number of pages. Let’s put some commas in there, because SSMS has, I don’t know, a comma allergy. Let’s just put a little dot here. Oh, wait, that didn’t dot. That is close enough to a dot. And then three more, and that’s 411 million pages or something like that. I can’t even count that high. I ran out of fingers very early on. So that’s a pretty big spill.

And if we compare that to the spill down here, even though this is still spill level 2 and 8 spilled threads, that’s a lot fewer pages. And the reason we have more pages spilling is, of course, because we have that Text column. And even though... well, let’s look before we say “even though,” right? Let’s make sure that we understand.

This one asks for a memory grant of 4,296 KB. And this one asks for, well, the exact same memory grant. SQL Server asked for the exact same memory grant for both. And this one spilled a lot more, didn’t it? Interesting stuff there, right?

Even though we don’t have the Text column in this one, right? CreationDate, PostId, Score. And in this one, we have our Text column. Even though SQL Server knew that Text column was in there, it asked for the same memory grant. That memory grant just didn’t do well for either of these.

Now, let’s redo the index a little bit. The point of redoing the index is to help out our windowing function on PostId and Score. And just keep in mind, of course, that the order of the sorting for the windowing function makes a huge difference. Now, granted, at least I’m pretty sure...

In all of the documentation I’ve read, you can’t partition by something descending. You can only partition by something ascending, probably. You can order by things descending.

So we really do need the Score descending here in order to make this index effective for the query that we’re running. And what sucks about both of them, too, is that if you don’t include all the columns that you’re selecting out of the table, you really, really knock down the chance of SQL Server choosing a narrow nonclustered index to help with this stuff.

Because it just straight up doesn’t want to do lookups for row number a lot of the time, right? Just forget about it, right? Unless it’s a very, very small number of rows.

So, looking at these two queries again, this time we’re going to completely get rid of the sort, well, the need to sort data to apply that dense rank windowing function in here. There we go.

With that in place, both of these queries take just about the same amount of time, right? We don’t have any big difference here, because we don’t have that sort that asks for the same amount of memory and spills and all hell breaks loose, where one spill took twice as long as the other.

So we completely got rid of the sort and the portion of the query plan that used to live in this section, right? Now we just have the seek and the building of the windowing function stuff in there. So a lot of the time, you can fix problems with these sorts of things just by having better indexing in place.
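The shape of the fix looks something like this; the exact key and included columns are my read of the demo, so treat it as a sketch:

/* Key the index to the windowing function: partitioning column first,
   ordering column next, in the matching (descending) direction. */
CREATE INDEX c
ON dbo.Comments (PostId, Score DESC)
INCLUDE (CreationDate)
WITH (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);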

Not that you can always just go tweaking indexes on the fly. But if you spot these things, indexing can be one way to help out with the pains of string columns, especially when sorting data. The other thing that we want to look at is a more egregious example.

So I have two different queries here against the Posts table. Now, the Text column in the Comments table is an NVARCHAR(700), which is pretty small beans in the grand scheme of things, right? 700 characters, so 1,400 bytes of Unicode, and blah, blah, blah.

But the Body column in the Posts table is an NVARCHAR(MAX). It’s a big one.

Big boy. And if we run these two queries, we also might be able to spot some differences here. Some significant differences. So, that first query finished.

And now we’re waiting on this second query to finish, aren’t we? We’re going to wait a little bit longer on this. Not going so well.

Second query. That first query finished in, I don’t know, it seemed like about eight, nine seconds maybe. This one’s really dragging on with that Body column. I’m starting to think that strings are a mistake.

Especially, you know, maybe storing entire potentially two-point-something gig strings in a database was not an awesome idea. Maybe we shouldn’t be doing this. Maybe we should put them somewhere else.

Let’s put them in a text file on disk, right? And just have a pointer to the text file on disk. Or I don’t know.

Perhaps there is some other storage engine that stores that sort of large text more effectively. But holy cow, we have some differences here, don’t we? We have some big differences.

We have another very big spill difference. Let’s nudge these two things a little bit closer together. There we go. We have one sort that took 5.5 seconds and another sort that took almost 49 seconds. Well, not technically 49 seconds.

Because we did some stuff over here, and some stuff over here. And we’re going to ignore the time in the repartition streams, because the timing code for parallel exchanges is absolute garbage. It just looks dumb and stupid and weird everywhere.

So just ignore that. Just forget that part for this example. Now, let’s compare memory grants. All right.

1,214 megs for the one where we just select Id. And for the one where we also select the Body column, we end up with a 17 gig memory grant. 17 gigs of our precious memory.

All dedicated to a sort. All right. Just one sort operator was like, hey, 17 gigs, pretty please. I could use that for sorting things.

You have an NVARCHAR(MAX). What do I know? Right? That’s pretty foul. And the differences in the spills are, of course: spill level 1, 8 spilled threads, 40,000 pages here. And here we have spill level 1, 8 spilled threads, but 3,881,976 pages.

So, 3.8 million pages. This one has had a significantly worse time because we have that Body column involved. We have to write that Body column and all this sort of stuff out to disk and reread it back in.

And gosh, what a pain that is. Now, at the far more extreme end of that, I’m going to show you two queries that ran over here. And if I lift up my left armpit and we focus in on the time down here, we have three minutes.

Let me move over. Three minutes and 30 seconds of execution time. Now, you may also notice that there is a little message down here that says query completed with errors.

What errors, pray tell? What happened? Hmm.

I don’t know. Let’s find out. Let’s go to the messages tab and see. Wow, that’s a lot of red text. I’m going to have to duck a little bit. But basically, what happened is that the first query spilled so much to tempDB that I blew up tempDB.

I exploded it. I ran out of tempDB. I ran out of drive space on my VM.

I didn’t just fill up tempDB files because I have some unreasonable max size on them. No, no, no. I have a low disk space message on my VM.

And if we look at Windows Explorer... let’s go to This PC, where we can see it. I’ve got a 400 gig drive on this thing, and I have filled up tempDB to nearly the size of the drive.

Right. So I blew up a lot of tempDB space with this query. Now, of course, because of that, I don’t have an execution plan for it. But I can show you, of course, that in the second query, I comment out the Body column. Just green-text that Body column.

Again, sometimes green text is very, very good in SQL Server, right? Sometimes we have wonderful uses of green text, like getting NVARCHAR(MAX) columns out of our select list. Let’s go to the execution plan over here.

This query still ran for a minute and 14 seconds. All right, it’s a pretty long-running query. So if we subtract the 1:14 from the 3:30 over here, that other query ran for, I guess, a couple of minutes and then failed.

Right. Filled up tempDB and failed. All because we were, you know, selecting that body column.

And that Body column had to pass through a couple of different sort operators. Well, actually, just one sort operator. The Votes table down here... I think there was another video I recorded where I talked about a very, very offensive execution plan.

Where SQL Server chose a merge join with the Votes table, and chose to sort the entire Votes table. All right. But it’s not this sort’s fault.

This just sucks in general. This just makes all sorts of awful things happen. But the big problem is in the final sort, because that final sort is where the Body column from the Posts table would be involved.

The Body column from the Posts table has nothing to do with sorting the Votes table down here. It’s all this sort, where, even without the Body column, we still have a spill. And we wrote, let’s see, 1,904,184, so about 1.9 million pages, out to disk.

And that’s just without the Body. So SQL Server just kind of chose a, I don’t know, just-not-good-enough memory grant for this thing. Let’s see what it chose.

17 gigs again. So, why 17 gigs? Well, on my server... I forget what I have max server memory set to. I think it’s like 90-something gigs.

SQL Server is willing to give out 25% of your max server memory setting to any single query running. So, 17 gigs is about the max memory grant that it can hand out to anyone here. And 17 gigs is not big enough for some of these queries.

All right: even with 17 gigs, we still spill out to disk. Whether that’s a function of the way memory is shared between operators in query plans, which it totally can be, or something else, like just the amount of data we’re selecting...

You know, that’s a fairly large number of rows and a fairly big number of columns. And from the Posts table, realistically, we do have a couple of other text columns in here, like Tags and Title and LastEditorDisplayName.

And if we commented those out, maybe we would reduce the spill. It might still spill. I don’t know.

I’m too lazy to check on that right now. Got other things to do. But, yeah, when you run into these things, a lot of the time the worst spills you will find are there because you did something that required SQL Server to sort data.

You have a presentation ORDER BY. You have TOP with an ORDER BY, or OFFSET-FETCH. Or you have a windowing function without an index to support the PARTITION BY and ORDER BY parts, in order to get the data in the order you need to apply whatever windowing you need to your data.

Those are the things that will add sort problems in. Now, I think what’s really interesting with this one, and I sort of wish I had saved it, but it wouldn’t really have done you any good...

...is that this query will still error out if I put a hash join hint on here. Because, again, the only part of this query that had a dumb injected sort is where SQL Server chose a merge join here, right?

SQL Server costed a merge join as the more effective join type here, and merge joins require sorted data. So SQL Server sorts the Votes table, and that sort does spill, and it does require memory. But the problem sort is the one up here, because we don’t have an index to support the windowing function that we are applying to the Posts table.

So this is the real big one, because this is an absolutely phenomenal number of rows, and a lot of this stuff spills out. So even with just a couple of text columns in there, we can still have some pretty significantly bad query performance.

So, in summation: as much as possible, if you’re going to put strings in your database, avoid max data types wherever you can, right?

I understand some queries require them. Or rather, some data structures require them. You might be storing XML, or JSON, or notes, like some open text entry field.

There are reasons why you might need NVARCHAR(MAX) or VARCHAR(MAX) columns in your tables, but don’t use them unnecessarily. Please try to right-size your string columns as much as possible.

And if you really need sorted data for some reason, the two things you can try are to pre-sort your data and then fetch text columns later. I do have videos about that: informational versus relational columns.

It’s a very important concept. Usually, in order to provide a result set for someone, your joins, your where clause, and your sorting can be done in a different stage of query processing than retrieving all the columns you need to show someone.

So, not doing everything in one big swoop like this is usually a pretty good idea for query performance. Sometimes temp tables are a good use.

Sometimes you can use derived queries in some manner, whether it’s a CTE or some sort of derived table expression. There are different ways to separate those things out, so that you don’t have to select every single column and apply sorting to all of those columns.

You can do the initial set of data joining, filtering, and sorting, and then go get the columns you’re interested in displaying: the informational columns, after you process the relational columns.
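A minimal sketch of that two-step pattern, with made-up filtering and row counts:

DROP TABLE IF EXISTS #keys;

/* Step one: relational work only. Filter and sort on narrow columns,
   keeping just the keys. */
SELECT TOP (1000) p.Id
INTO #keys
FROM dbo.Posts AS p
WHERE p.Score > 100
ORDER BY p.Score DESC;

/* Step two: informational work. The NVARCHAR(MAX) Body column now only
   gets hauled around for the 1,000 rows we're actually displaying. */
SELECT p.Id, p.Title, p.Body
FROM #keys AS k
JOIN dbo.Posts AS p
    ON p.Id = k.Id;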

I have videos about that somewhere on my channel. I’ll try to find them when I publish this. But no promises, because, I don’t know, searching is hard, isn’t it? A little difficult.

So, anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something about how awful strings are in databases. I hope that you have now started a project to move all your strings to text files on the file system and just have pointers to the text files.

Kidding. I mean, if they’re PDFs or Word docs, fine, FILESTREAM-type stuff, fine. But if it’s just something for the table, I guess you could deal with that. Yeah.

Anyway, I have things to do, so I’m going to go do them. Again, well, I was trying to think of one other thing. Oh, yeah. If you like this video, you can subscribe to my channel and join the nearly 4,000 other data darlings.

Actually, maybe by the time this thing actually gets published, there will be 4,000 data darlings out there in the known universe. If you like this video, give it a thumbs up.

If you have anything to tell me about this video, you can comment on it. Good, bad, or ugly? Just don’t call me ugly. That’s personal.

Then we have to fight. But, yeah, you know, strings in databases are a very perilous thing, given the number of problems that they can cause. So I’m going to be recording another video soon about other problems that max data types and string filtering can cause in queries, especially query performance issues.

We’re going to have one of those coming up soon. It’s maybe even in the text file right over here called don’t ignore filters. Might even be right in there.

Yeah, but, you know, strings in databases are a perilous thing. The bigger they are, the harder your queries fall. And just like we saw in this query, where I was able to completely blow up tempDB with a sort, that’s probably not how you want your queries failing.

You probably don’t want queries failing at all. Right? So, yeah, watch out for those string columns, George. Boom.

All right. Goodbye. Thank you.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

The Decline in Microsoft SQL Server Support: Why Your Business Deserves Better, And How I Can Help

What Clients See


One of the chief things my clients complain about when opening support cases with Microsoft is that the people working support tickets on the Microsoft side… Aren’t actually Microsoft employees.

They’re third party vendors.

When they end up working with a third party vendor, the lack of expertise and poor communication is obvious and painful.

Many support cases are handled by third party vendors, and my clients often complain that their staff has totally inadequate SQL Server knowledge to do the job.

Even when my clients do get someone from Microsoft, the person they get is juggling an enormous caseload. The truly knowledgeable folks are worked to the bone and scattered across hundreds of cases.

The amount of turnover in support roles at Microsoft is also pretty astounding. I used to know a lot of people who worked in them, and… they’ve all moved on to other roles.

At this point, you’re lucky to get someone with 18 months of SQL Server experience handling your support case. That’s a far cry from when 18 months was about the minimum experience you’d find.

The worst part? I have clients who have paid Microsoft upwards of $75k USD for support contracts, and they can’t get anyone to answer basic questions or help with basic troubleshooting.

If you want to read a Great Post™️ about how bad things have gotten, head over here: How to Survive Opening A Microsoft Support Ticket for SQL Server or Azure SQL

You Deserve Better


But you won’t get it any time soon from Microsoft.

They’ve cratered their once excellent support infrastructure and replaced it with a frustrating, pass-the-buck, often indifferent group of third-party vendors and overworked internal staff.

The decline in Microsoft SQL Server support quality can have serious consequences for your business. Inefficient support can lead to extended downtime, data loss, and decreased productivity.

Your business deserves better, and that’s where I come in.

With years of experience in SQL Server management and support, I understand the complexities and the importance of prompt, effective support.

Here’s what I offer:

  1. Expert Knowledge: As a recognized SQL Server expert, I have a deep understanding of SQL Server and can provide the expert support that outsourced teams often lack.
  2. Clear Communication: I prioritize clear, effective communication to ensure that I fully understand your issues and provide the best possible solutions.
  3. Tailored Solutions: I take the time to understand your specific environment and needs, providing solutions that are tailored to your business.
  4. Consistent Support: You’ll receive consistent, high-quality support that you can rely on to keep your systems running smoothly.
  5. Help Dealing with Microsoft Support: You need someone on your side who can work through support cases, provide necessary technical details, and call support staff on their BS when they give you lousy reports.

Don’t let declining support quality affect your business. Hire me to handle your support tickets and ensure that your SQL Server environment receives the expert care it deserves.

With my help, you can focus on what you do best — running your business — while I take care of the rest.

Reach out today to discuss how I can support your SQL Server needs and provide the expertise and reliability that your business deserves.

Together, we can ensure that your technology works for you, not against you.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Join Me At Data Saturday Dallas Sept 6-7

Spring Training



This September, I’ll be presenting my full day training session The Foundations Of SQL Server Performance Tuning for Data Saturday Dallas.

All attendees will get free access for life to my SQL Server performance tuning training. That’s about 25 hours of streaming on-demand content.

Get your tickets here for my precon, taking place Friday, September 6th, 2024, at Microsoft Corporation, 7000 State Highway 161, Irving, TX 75039.

Here’s what I’ll be presenting:

The Foundations Of SQL Server Performance Tuning

Session Abstract:

Whether you want to be the next great query tuning wizard, or you just need to learn how to start solving tough business problems at work, you need a solid understanding of not only what makes things fast, but also what makes them slow.

I work with consulting clients worldwide fixing complex SQL Server performance problems. I want to teach you how to do the same thing using the same troubleshooting tools and techniques I do.

I’m going to crack open my bag of tricks and show you exactly how I find which queries to tune, indexes to add, and changes to make. In this day-long session, you’re going to learn about hardware, query rewrites that work, effective index design patterns, and more.

Before you get to the cutting edge, you need to have a good foundation. I’m going to teach you how to find and fix performance problems with confidence.

Event Details:

Get your tickets here for my precon!

Register for Data Saturday, on September 7th here!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

A Performance Mystery With Parameterized TOP In SQL Server

A Performance Mystery With Parameterized TOP In SQL Server



Thanks for watching!

Video Summary

In this video, I dive into an intriguing and somewhat obscure performance issue in SQL Server related to parameterizing the `TOP` clause within stored procedures. You’ll see how using a literal value for `TOP` can lead to more efficient query plans compared to parameterizing it, especially when dealing with small numbers under 101. This behavior is due to exchange packet sending rules that allow earlier data transmission in certain scenarios. I explore this issue through various examples and demonstrate the impact on query performance, showing why this problem might go unnoticed until it causes significant delays. By the end of the video, you’ll understand how to mitigate this issue with better indexing strategies, using `OPTIMIZE FOR` hints, or by rewriting your queries for optimal performance.

Full Transcript

Erik Darling here with Darling Data, cleaning up a few silly little things in my way. And in this video, we’re going to talk about the dangers of parameterizing top in SQL Server. And this is maybe the weirdest query performance problem that I have ever run across that wasn’t, like, a bug. It was actually sort of known behavior by, like, three people in the world. So that was fun. So what I have here is a stored procedure. And this stored procedure really only takes one parameter, because that’s all it has to. And that one parameter is used here, where we select a top number of rows from the users table. And then we cross apply to the badges table to find the top one. Well, actually, to put this into English, to find the most recent badge that a user has gotten. And for the purpose of this query, we only care about users who have a reputation over, I think that’s 10,000. And we’re going to order by reputation descending. All right, cool. So that’s the stored procedure version. We also have just the plainly written version, with a literal value here of 38. And then we have the sniffed top version here, where we’re saying @top equals 38. The query plans for these, well, I had to run these ahead of time, because I don’t want to stand here for a minute and 13 seconds waiting for this query to finish. Much like we’re probably going to stand here for a minute and 13 seconds while I try to grab the right part of the query plan to mush things together. So looking at these two queries, probably the funniest part about all this is, if you saw this query in the plan cache, or in Query Store, or in a monitoring tool or something like that, it would be really hard to decipher why one time it ran for seven seconds, and the other time it ran for a minute and 13 seconds. You know, costing being what it is and all that.
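To make that setup concrete, here’s a sketch of the variants described above. It assumes the Stack Overflow sample database; the procedure name dbo.top_sniffer and the exact column list are my reconstruction, not lifted from the demo:

```sql
/* Hypothetical reconstruction, assuming the Stack Overflow sample
   database (dbo.Users, dbo.Badges). Names are illustrative. */
CREATE OR ALTER PROCEDURE dbo.top_sniffer (@top bigint)
AS
BEGIN
    SELECT TOP (@top)
        u.DisplayName,
        u.Reputation,
        b.Name,
        b.Date
    FROM dbo.Users AS u
    CROSS APPLY
    (
        /* most recent badge per user */
        SELECT TOP (1)
            b.Name,
            b.Date
        FROM dbo.Badges AS b
        WHERE b.UserId = u.Id
        ORDER BY b.Date DESC
    ) AS b
    WHERE u.Reputation > 10000
    ORDER BY u.Reputation DESC;
END;
GO

/* parameterized: the slow one (a minute and 13 seconds in the demo) */
EXEC dbo.top_sniffer @top = 38;

/* the plainly written version (about seven seconds in the demo)
   uses TOP (38) in the same query body */

/* the "sniffed top" version assigns a local variable instead: */
DECLARE @top bigint = 38;
/* ...same query body, with TOP (@top) referencing the variable */
```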

Nothing is very clear about this. And I know that there’s some green text up there, and the green text isn’t totally wrong. The green text is right: both of these would be faster if I had an index on the badges table. But I need to show you the behavior without the index, where things are really terrible and crappy, so that you understand what’s happening and what you need to fix. All right. So yes, an index would be helpful here. But that’s not the point; I’m trying to explain the behavior to you. This is not just another stupid video about adding an index. Any half-witted consultant in the world can make another post about just adding an index.

Me, I try to give you a little bit more, because you’re nice people and you deserve more than half-witted consultants just telling you to add indexes. So I need to move over a little bit to make sure all the green stays on the screen. Unofficially, when a top uses a constant, and the constant is a small number under 101, the exchange packets are allowed to send earlier than normal, as long as the exchange is below the top operator. The exchange packets are allowed to send as soon as they have a row, rather than waiting for those exchange packets to fill up completely.

This can only happen with constants, or another thing that I’m going to show you in a minute. And this behavior is true going back to SQL Server 2005. I’m of course doing my testing on SQL Server 2016… sorry, 2022, version 16, which you can see if you look right there under my armpit.

It may change in the future. So if you’re reading this at some far-off date and Microsoft has decided to fix this stupid issue, then I’m wrong all of a sudden. Well, you know, sorry. I was right for a long time, though. So I had that going for me.

When you parameterize top, it’s not considered safe to send exchange buffers early anymore, because you could stick any number in there, up through the bigint max; top accepts up to the bigint max for a number of rows.

In cases where you’ve got a big top, say the bigint max, which is that number right there that starts with a nine and has one, two, three, four, five, six commas in it. A lot of commas.
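For reference, the bigint max is 9,223,372,036,854,775,807, and a top will happily accept it:

```sql
/* TOP accepts row counts all the way up to the bigint max */
SELECT TOP (9223372036854775807)
    u.Id
FROM dbo.Users AS u;
```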

If you were sending that many rows one row at a time, it would be significantly more awful than sending over a smaller number of very full exchange buffers. If you’re surprised to hear that parallel exchange buffers can send at different times, depending on different situations, you’re not alone.

I was also very surprised. Not in a good way. Not in an “I got a golden ticket” way. It was more like when the doctor calls you on a weekend.

You’re like, what happened? So in the second query, where the exchange buffers are only sent when full, we spend a lot longer waiting for those exchange buffers to fill up so they can send.

This isn’t exposed anywhere in the query plan. You’re not going to be able to see this anywhere else in the world, unless you’re handy with a debugger or you pay attention to things I say.

So obviously that’s not a good situation. And there are, actually, a few ways to fix it.

In my case, better indexing would mask a lot of the problem, but not completely solve it. One way of doing it is to add an optimize for hint, so that you tell SQL Server to optimize the query plan for the top being equal to one.

And if you do that, this isn’t going to make the query faster than the original version with the literals. It’s going to make it on par with the original version with the literals.

That’s all. This thing is still going to take seven or eight seconds, because I don’t have any good indexes, but this at least gets us a plan that doesn’t take almost a minute and a half. Right.
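Inside the hypothetical procedure sketched earlier, the hinted version would look something like this:

```sql
/* Same hypothetical procedure body as before, with the hint added.
   The plan compiles as if @top were the literal 1, which, per the
   demo, puts performance on par with the literal version. */
SELECT TOP (@top)
    u.DisplayName,
    u.Reputation,
    b.Name,
    b.Date
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT TOP (1)
        b.Name,
        b.Date
    FROM dbo.Badges AS b
    WHERE b.UserId = u.Id
    ORDER BY b.Date DESC
) AS b
WHERE u.Reputation > 10000
ORDER BY u.Reputation DESC
OPTION (OPTIMIZE FOR (@top = 1));
```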

And then another way of doing it is to rewrite the query. One thing that I try to stress whenever I’m teaching people about SQL Server stuff is that there are a number of different ways to write a query.

And if you’re not getting very good performance with one of them, you should try another one. That’s query tuning. Top one with an order by is a sort of shortcut.

You know, it’s sort of a shortcut for getting a max or something, or for using row number. In this case, if we use a row number and we pump the compatibility level up to 150 or 160, we get batch mode on rowstore.

Oh, this formatting is murder on me.

Did I fix this one? I didn’t. Oh boy. SQL Prompt is really letting me down. I’m going to mess everything up here. So if we rewrite the query to get the top one using a slightly different method, and we allow SQL Server to use batch mode, this will completely negate the need for an index, because this finishes just about as quickly as one could hope for.

Sans an index, right? Because this gets batch mode. You can tell by the window aggregate operator there. That’s a good thing. All sorts of good things happen in this query plan using batch mode that weren’t happening in the other plans.
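Here’s a sketch of that rewrite, decorrelated so the row number runs over the whole badges table; the exact shape of the demo query may differ:

```sql
/* Requires a database compatibility level of 150 or higher so the
   window function can qualify for batch mode on rowstore. */
SELECT TOP (@top)
    u.DisplayName,
    u.Reputation,
    b.Name,
    b.Date
FROM dbo.Users AS u
JOIN
(
    SELECT
        b.UserId,
        b.Name,
        b.Date,
        n = ROW_NUMBER() OVER
            (
                PARTITION BY b.UserId
                ORDER BY b.Date DESC
            )
    FROM dbo.Badges AS b
) AS b
  ON  b.UserId = u.Id
  AND b.n = 1      /* most recent badge per user */
WHERE u.Reputation > 10000
ORDER BY u.Reputation DESC;
```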

So, you don’t always need an index, do you?

You know what else you don’t always need? Half-witted consultants. You need full-witted consultants. Full of wits. Lots of wits. So, if you’re out there in the crazy borderlands of SQL Server, and you run into a query with a top that’s parameterized, and the execution time is real weird, well, you might be running into this.

And if you’re running into this, there are things you can do. One, explore better indexing scenarios so that you can at least mask the problem a little bit.

Two, stick an optimize for hint on the query, optimizing for the top being equal to one. That will at least help you understand if you’re hitting this problem.

That would really help you understand if you’re hitting the problem, and it might actually improve performance. And of course, number three, you can rewrite the query in a way that gets you to the result differently, right?

Like I said, one thing I always try to teach people: there are many ways to write queries. If the first way isn’t fast enough, try the second way. The second way for me here was using row number rather than top one, because row number gives you a different set of optimizer options that the top one approach usually won’t get you, like batch mode on rowstore.

So, a few things you can try if you’re running into this problem. I hope you never run into this problem. It is a terrible, awful, no good, very bad problem.

And when I ran into it, I almost cried. But then I remembered, Robert Smith told me: boys don’t cry.

And so, well, I didn’t cry. Never have, never will. I don’t want to let Robert Smith down. He seems very sad.

If he knew I was out there crying, I don’t know that he could take it. I don’t know how long he would be for this world. So anyway, hope you enjoyed yourselves.

Hope you learned something. I truly hope that all of you learned this from this video. Because if you knew about this before, I don’t know.

I would feel terrible. God, this sucks. If you liked this video, which you might have, if you like watching videos of car accidents and early Road Runner cartoons, you can give me a thumbs up.

You can leave a, I don’t know, nice comment. Just a nice comment. Something sweet. Roses and stuff.

If you like this sort of SQL Server performance tuning content, you can subscribe to the whole channel, watch all the videos, and get notifications every time I post a new one, which will be pretty frequently, because I have nothing else to do except post videos for you.

Because I love you and I miss you. And it’s been too long. We haven’t hung out. Anyway, I’m thinking I’m going to find something to do that’s not standing in front of my computer for a little bit, because I think the lights are beginning to affect me.

So thank you for watching. Thank you.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.