All About SQL Server Stored Procedures: ANSI Settings



Video Summary

In this video, I dive into the nuances of ANSI settings within SQL Server stored procedures, highlighting why these settings are crucial for maintaining database integrity and performance. I discuss how setting up these options correctly can prevent errors when working with index views, computed columns, and filtered indexes, ensuring that your queries match the correct expressions in the optimizer’s plan cache. Additionally, I explore other important settings like implicit transactions, row count suppression, and exact abort, providing practical advice on when to use them and why they are beneficial for analysis procedures.

Full Transcript

Erik Darling here with Darling Data. I was talking to my friend Batts about what we should do next. Aside from Batts’ fantastic idea to take some office hours style questions, Batts also said, you know what, Erik, you spend a lot of time fixing stored procedures that have all sorts of problems, all sorts of problems. Not just performance problems, structural problems, integrity problems, just the things that people do within stored procedures are completely unhinged. I find developers untrustworthy to the point where I sometimes just wish that they had used an ORM for that. It’s that bad. It’s that bad. People are not kind to their databases. It’s amazing. Amazing some of the things you see when you get into the world of consulting. If you are the type of DBA or developer who has only ever had full-time employee positions and you’ve worked at relatively respectable organizations, you might see some of the things that I talk about and think, that’s the most unrealistic, preposterous thing that I have ever heard. No one would actually do that. Lo and behold, every time I think it can’t get worse, it gets worse. So, in this video, we’re going to cover the very beginning of the stored procedure, the ANSI settings. ANSI settings, of course, have a couple interesting outcomes in SQL Server. A lot of the ANSI settings to me are funny because a lot of them, even though they will still have these impacts, don’t actually change anything with the queries. Some of them, like ARITHABORT or NUMERIC_ROUNDABORT, don’t even actually do anything really. They just end up with different plan cache entries, and you get errors if you do stuff with indexed views, computed columns, filtered indexes. You try to make modifications to a table and you have the wrong ANSI settings. You get errors.

If you have the wrong ANSI settings and you query a table that involves an indexed view, computed column, or filtered index, you don’t get the matching for it. You also get different plan cache entries for different ANSI settings, which makes reproducing performance problems a lot harder. But some of them just don’t even do anything. Some of them only do stuff if you have the ANSI_WARNINGS setting set to something specific.

So, a lot of this stuff is just stupid and there for backwards compatibility, but this is the kind of stuff that builds up when your code base is as old, wretchedly old, decrepit as SQL Server is. Not that I don’t love it for all its intricacies, but anyway, this is where we’re at. But now Batts is saying it’s time to make some money.

That’s not what Batsmuro actually sounds like. So, if you like this content, if you like this channel, if perhaps you just hit mute and stare at me while I wander around on the screen a little bit, and you want to support my work here to bring you this extra high quality SQL Server content, you can sign up for a membership. Something that needs to be said is that the video description contains helpful links for everything that I talk about.

Whether it’s signing up for a membership, asking me questions during office hours, of course, the liking, commenting, and subscribing are all separate button functions in the YouTube interface, which you are free to click. Just make sure you only click the subscribe button once, because if you hit the subscribe button more than once, I think you unsubscribe. We don’t want that.

We want more subscribers. The channel recently passed the 6,000 subscriber mark. Also fairly recently passed the 50 member mark for the channel. So, thank you to all, I think it’s 53 or so people, for your continued support of four plus dollars a month to my antics. If you would like to pay me more money because you need more help with SQL Server, you can do that.

I am a consultant. That is how I make the majority of my income. I do not make it from YouTube, thankfully.

Otherwise, we would be in quite a destitute situation here at Darling Data. But if you need any of this stuff done to your SQL Server, health checks, performance analysis, hands-on tuning, performance emergencies, or developer training in order to prevent further SQL Server performance emergencies, I am the best consultant in the world outside of New Zealand for all of these matters.

Do not get suckered into paying anyone else to do this who will make things worse, probably. If you want some very high quality, very low cost SQL Server training content, boy do I have a lot of streaming stuff. You can get 75% off of it.

That brings the total cost down to just about 150 US dollars. And again, helpfully assembled links are in the video description. I suggest if you haven’t yet clicked on that, you click on it now.

If you’ve already clicked on it, send it to a friend. You might have friends who need this help too. And by gosh, the more of us who get better at this, the better off we all are. Upcoming events.

Saturday, May 10th, 2025. SQL Saturday, New York City is back in action. It’ll be taking place at the Microsoft offices in Times Square. I think there’s a Dave and Buster’s nearby, so if you get bored, you can just duck out.

Or you can go make friends at Port Authority, bus station, whatever. Whatever floats your boat. There’s vibrant communities in both places.

Also recently announced, I haven’t added the information here yet, but I will. There is a performance tuning pre-con on May the 9th by Andreas Volter, I think. I think that’s how you pronounce it.

I’m just going to say it like that because it’s pleasing to me to say it like that. But tickets are on sale for that as well. I’ll get the necessary background information added in here for the next video. So I messed up and didn’t do it for this video.

So sorry about that, but the information is now out there in the world. But with that out of the way, let’s talk about stored procedure stuff. Now, this is the proposed list of topics that I’m going to be covering during these videos.

I’m not going to read them all because, by God, we have things to talk about. But this is the stuff that I’m going to be covering. Most of it is already written.

Some of it’s going to need some additional tinkering and tweaking. But what the hell? I don’t… It’s not like I have to do… It’s not like I do this live. Because there are a lot of times when I’m recording stuff and something happens.

And I don’t know, I have to stop and curse at things for a while or, I don’t know, weird noises sometimes. Sometimes there are ghosts. But this is all the stuff that I’m going to be talking about in there.

Some of these topics individually I have talked about in other videos. But it’s sometimes helpful to have them as part of a larger series of things. So that people can have everything all kind of grouped together in a helpful thematic playlist.

If you’re going to make a playlist, it should have a theme to it, right? You don’t want a random jumble of stuff in there. Not a good way for a playlist to be handled.

So we can get rid of this thing now. And let’s talk a little bit about ANSI set options. Now, like I said before, a lot of these things don’t really do much of anything, right? But there’s a kind of specific way that you want to set up your ANSI set options for a stored procedure to make sure that all of the correct options are in place for when that procedure runs.

ORM connection strings often are not exactly the way they should be to be compatible with indexed views, computed columns, and filtered indexes. I’ve mentioned that a few times.

But there are a few settings that are stored with the procedure definition, and then there are a bunch that are not. So when I’m setting ANSI settings for a procedure, I need to make sure that all these things are set this way so that I don’t get errors when I modify tables that have indexed views, computed columns, and filtered indexes.

And that when I query tables that have any of those things, the optimizer is able to match the expressions in my queries to those different facilities.
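The settings in question boil down to a short block of SET statements. As a sketch, this is the combination the SQL Server documentation requires for indexed views, computed columns, and filtered indexes to be matched and modified without errors:

```sql
/* Required so the optimizer can match indexed views, computed
   columns, and filtered indexes, and so modifications to tables
   that use them don't throw errors. */
SET ANSI_NULLS ON;
SET ANSI_PADDING ON;
SET ANSI_WARNINGS ON;
SET ARITHABORT ON;
SET CONCAT_NULL_YIELDS_NULL ON;
SET QUOTED_IDENTIFIER ON;
SET NUMERIC_ROUNDABORT OFF;
```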

There are a few other things that I like to set up in stored procedures as well. Setting implicit transactions to OFF is a little bit of a weird one, but comes in handy if you are calling stored procedures from the JDBC driver or the PyODBC driver, the Python driver, connecting to SQL Server, because both of those use implicit transactions by default.

And you often don’t want that to happen. Now, granted, a lot of the stored procedures that I write from scratch are my analysis type stored procedures where I go in and I run stuff. So I don’t often have to worry about this, but I do try to set them up to be good examples.

Within my stored procedures, purely for me, because there are times when I need to performance troubleshoot or debug or work with my stored procedures in different ways, I usually put these in there.

You may not care about either of these for your situation. That’s totally fine. They’re worth talking about because I tend to use them. And then within a stored procedure, I usually set both of these things to ON in one go, because I don’t really want to get row counts from every query that runs.
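Put together, the top of such a procedure might look something like this. The procedure name and body here are hypothetical, just to show where the statements go:

```sql
CREATE OR ALTER PROCEDURE dbo.example_procedure /* hypothetical name */
AS
BEGIN
    SET NOCOUNT ON;                /* no "N rows affected" chatter */
    SET XACT_ABORT ON;             /* stop and roll back on most errors */
    SET IMPLICIT_TRANSACTIONS OFF; /* override JDBC/PyODBC connection defaults */

    SELECT x = 1; /* placeholder body */
END;
```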

And I want to make sure that if one of my queries hits some sort of issue and fails, all of the queries stop happening. Now, XACT_ABORT ON is actually something that I don’t use in my analysis procedures, because it’s not critical if they run through to the end, if there’s an error or not.

There are often transient errors that might happen at one point or another. And I don’t want to not get a result, like any result set back from them because I hit an error during one point in processing. So for me, this is a silly example.

In my human events block viewer procedure, sp_HumanEventsBlockViewer, sometimes a query will fail trying to look up agent job information or something like that. But the rest of the results are fine.

And the agent job information isn’t super critical to me figuring other stuff out. So I don’t care if that fails.

I still want to get all the rest of the stuff back. So XACT_ABORT, you know, if you’re writing important production quality code, you’re most likely going to want that one on. But if it’s just a single select query or something, you probably don’t need that.

So before we go any further, let’s talk about the three settings that are stored, like persisted, with the procedure in SQL Server. So since I already have all this stuff as I want it, when you look at the sys.sql_modules catalog view in SQL Server, if you look at what’s stored in there, you’ll see uses_ansi_nulls, uses_quoted_identifier, and is_recompiled.

Now, this is only when you have a stored procedure level recompile up here. If you just have OPTION (RECOMPILE) on a query, like down here, it’s not going to say that the stored procedure is made with recompile.

That’s only when you create it WITH RECOMPILE at the CREATE OR ALTER level. Just adding an OPTION (RECOMPILE), even if there’s one query in there, will not have the stored procedure marked that way.

Cause you could, you know, take that out, comment it out. I don’t know. I mean, granted, you could comment this out too, but then it would just disappear from down here.

So these settings are stored with the procedure definition. So these ones you tend to worry about a bit less. But then within the stored procedure.

So if you, you know, just set these for the connection and then create or alter your procedure, those get stored in sys.sql_modules. SQL Server is like, this is created with these settings.
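You can check what got persisted with a quick query against sys.sql_modules. The procedure name here is hypothetical:

```sql
SELECT
    procedure_name = OBJECT_NAME(sm.object_id),
    sm.uses_ansi_nulls,        /* SET ANSI_NULLS at create time */
    sm.uses_quoted_identifier, /* SET QUOTED_IDENTIFIER at create time */
    sm.is_recompiled           /* created WITH RECOMPILE */
FROM sys.sql_modules AS sm
WHERE sm.object_id = OBJECT_ID(N'dbo.example_procedure');
```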

Then there are the runtime settings that we care about for making sure that we get indexed views, filtered indexes, and computed columns used correctly, and so that we don’t hit errors.

If we try to modify tables that are attached to any of those things. So ANSI_PADDING, ANSI_WARNINGS, ARITHABORT, and CONCAT_NULL_YIELDS_NULL all have to be ON, and NUMERIC_ROUNDABORT has to be set to OFF.

And then, you know, like I said, JDBC and Python drivers, PyODBC, I guess, both use implicit transactions by default. So if you’re using either of those drivers and you’re calling stored procedures, you probably want to set this OFF in the stored procedure.

If you’re writing, you know, queries that, well, honestly, these are just for me and my analysis procedures.

There’s really no reason for you to put these in your stored procedures, especially because at some point you might turn on query plans and want to get the actual execution plans for your procedure, which is completely fine.

A normal thing to have to do. For me, why I put these in my analysis procedures is because sometimes I’ll leave actual execution plans turned on by accident and go to run my procedure. I’ll be like, why is this taking so long?

And, you know, that’s not a good time. So there’s a good reason for that, for me. There may not be a good reason for these for you. And then of course, SET NOCOUNT ON is a fun one, right?

We do it ’cause we don’t want row counts from every single thing that runs. And XACT_ABORT is a good one to turn on. Say you have a stored procedure that runs and it does an insert and then an update and then a delete. And let’s say the insert and the update run fine, but the delete doesn’t run.

So let’s say the delete hits an error, and you don’t want the insert and the update to have completed and committed and leave that data in there. You want those to roll back too, because otherwise something that you should have deleted will still be lingering around.

So you want to make sure that all three of those roll back so that you don’t have dangling data sitting around in your table. So, that’s all that.
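That insert/update/delete scenario can be sketched like this. The dbo.orders table is made up for the example, and the divide-by-zero is just a cheap way to force a runtime error:

```sql
SET XACT_ABORT ON;

BEGIN TRY
    BEGIN TRANSACTION;
        INSERT dbo.orders (order_id) VALUES (1);
        UPDATE dbo.orders SET order_id = 2 WHERE order_id = 1;
        DELETE dbo.orders WHERE order_id = 1 / 0; /* runtime error here */
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    /* With XACT_ABORT ON, the whole transaction is doomed, so the
       insert and update roll back along with the failed delete. */
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
    THROW;
END CATCH;
```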

Now, when I was writing this, I came across kind of a funny thing with the ANSI_PADDING setting that caught me off guard and that I didn’t see documented anywhere. And that is that ANSI_PADDING behaves differently with table variables and temp tables, and that table variables do not respect your ANSI_PADDING settings at all.

So, what I’ve done is I’ve actually stolen some code from the documentation, because I was sanity checking myself on all this and being like, this is too weird.

This can’t actually be this way. And let’s see, just to compare what happens with temp tables and with table variables, with the ANSI_PADDING setting changed to ON or OFF.

I’m just gonna run this whole thing at once. I don’t think there are any errors involved, which is nice. And so this is what happens. Now with the table variable, you can see up here that, no matter what I do, the results, oops, the results are precisely the same between these, right?

And this is testing with ANSI_PADDING ON and OFF for the table variable. ANSI_PADDING behaves in the exact same way for both of these. If you look down here with the temp table one, you’ll notice that with ANSI_PADDING OFF, things do look different, right?

Like, just to compare the differences down here: with ANSI_PADDING ON, I get trailing blanks when I should.

And this adds the zeros in, rather than not having them in, for the varbinary column. But when I set ANSI_PADDING OFF for the temp table, look what happens.

Much different, right? This row here looks a lot different from this row here, right? So ANSI_PADDING has an effect on temp tables.

ANSI_PADDING does not have an effect on table variables. So this is stuff that, you know, probably doesn’t make that much of a difference in the world, because ANSI_PADDING is a fairly rare thing to see people messing with. But, you know, I just thought it was kind of a little bit of interesting SQL Server trivia.
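A minimal version of that comparison might look like the sketch below. This is my own reconstruction, not the exact code from the video; note that ANSI_PADDING is applied when a column is created, so the SET has to come before the table definitions:

```sql
SET ANSI_PADDING OFF;

CREATE TABLE #t (c varchar(10));
DECLARE @t table (c varchar(10));

INSERT #t (c) VALUES ('abc   ');
INSERT @t (c) VALUES ('abc   ');

/* Temp table: trailing blanks trimmed, because ANSI_PADDING was OFF
   when the column was created. */
SELECT source = 'temp table',     len = DATALENGTH(c) FROM #t;

/* Table variable: behaves as if ANSI_PADDING were ON; blanks kept. */
SELECT source = 'table variable', len = DATALENGTH(c) FROM @t;

DROP TABLE #t;
SET ANSI_PADDING ON;
```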

I don’t have a good reason why this is the case, why table variables don’t respect ANSI_PADDING when temp tables do. The closest I got to an explanation was that table variables were added in SQL Server 2000 and did not come with the backwards compatibility for the ANSI_PADDING setting, which was already deprecated or something by then.

So that’s probably why, but, you know, that’s as far as I got with it. Anyway, I think that’s about as much as we can talk about for the ANSI settings in SQL Server. We’re going to talk about, I don’t know, whatever the next topic was next.

I think it was commenting or code formatting, one of those two, but we’re going to talk about that next. So, I hope you enjoyed yourselves. I hope you learned something. I hope that you will be careful and judicious with your ANSI settings when you’re creating and executing stored procedures. And, yeah, I don’t know.

I think that’s about the end of that. Cool. Yeah. All right. Well, we’re going to close this one out. We’re going to say goodbye. Actually, you know what?

I realized I forgot to tell you that the reason why I set these at runtime here, and there’s probably an important factor with this, is that, coming back to what executes a stored procedure.

If it’s an agent job, if it’s an ORM that doesn’t use the correct connection string settings, you can correct them within the stored procedure and override whatever the connection’s context was from the ORM.

So that’s why I do that there. In case I didn’t say that before, I’m saying it now. Anyway. All right, cool. I’m going to leave now. That’s enough. All right.

Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Some Thoughts On The Future Of Query Optimization



Video Summary

In this video, I delve into my thoughts on query optimizers, specifically focusing on those found in database engines such as SQL Server and Oracle. I start by clarifying that this discussion is not about the human query tuners who optimize queries or my personal approach to optimization; instead, it centers around the software within a database engine that generates executable query plans based on cost models. I highlight how these optimizers make assumptions, often assuming a cold buffer cache and using generalized cost models that may not fully account for modern hardware capabilities or specific system configurations. Throughout the video, I explore various feedback mechanisms like memory grant feedback and adaptive joins, discussing their limitations and potential improvements.

Full Transcript

Erik Darling here with Darling Data. And in today’s video, we’re going to talk about some of my thoughts on query optimizers. Now, I don’t want you to get this confused. I’m going to make this very clear at the outset. I am not talking about the people who optimize queries, like query tuners, like, you know, people like me and maybe like you, if that’s your thing too, who rewrite queries, fix indexes, change things around so that they magically run faster. Nor is this my thoughts on query optimization generally. This is not going to be about my approach to optimizing queries or what I look for, what I do. This is more about the piece of software inside of a database engine, a lot like SQL Server, that does the costing and the modeling and the query plan stuff that gives you an executable query plan. And sort of how they can be improved, like the future of those things. Because I don’t know, I just think what I think is interesting. So I’m going to share what I think with you. If you like this sort of content, and you want to support me recording these videos, I got to pay off this camera somehow. You can sign up for a membership down in the link in the video description below for as few as $4 a month. If you also have a camera to pay off, and you don’t have four extra bucks a month, you can comment, you can subscribe, and gosh, seeing those numbers go up, well, it doesn’t help me pay off a camera, but it does do my sad heart some good. If you want to ask me questions about SQL Server performance, you can go to this link, which is also in the video description below, and you can insert as many questions as you’d like. Just please try to stay on topic. It’s all I require of you.
Now if you need help with SQL Server, from a consulting point of view, well, gosh darn it, I am still the reigning champion of SQL Server consultants. If you need help with SQL Server: health checks, performance analysis, hands-on tuning, query optimization. If you’re having a performance emergency, or if your developers just need some training because you’re having too many performance emergencies, well, guess what I do at a reasonable rate? That stuff. You’re in luck.

If you would like some high quality, low cost SQL Server training content, again, all about query optimization, you can get all of mine for about 150 USD. Just go to that link, enter that coupon code, and all will be applied. There is a fully formed link for you down in the video description as well, just in case. SQL Saturday, that’s what it’s called. New York City is taking place this year, 2025, on May the 10th. I highly encourage you to attend. It’s going to be good. All you have to do is drag your behind to Times Square on Saturday, May 10th. It’ll be good for all of us. With that out of the way, let’s talk a little bit about this stuff. Now, query optimization. You have a query optimizer. I’m not talking about a query planner like Postgres has. I’m talking about a query optimizer like SQL Server has, like Oracle has, like many other good database engines have.

They make a lot of assumptions about stuff. If you’ve spent any time thinking about or reading about query optimizers like SQL Server’s, one of the first things it assumes is that you’re starting with a cold buffer cache. Like no data is in memory. Like any read you do is going to be a read from disk. And it has all sorts of costing things built into that. Some are fixed costs, like, you know, doing an I/O: what’s a random I/O, what’s a scan I/O, how much does it cost to compute this scalar? A lot of things have fixed costs. Other things have sort of dynamic costs based on how many rows, the size of those rows, you know, stuff like that. Right.

And the thing that you run into with those sorts of cost models is that they are very, very highly generalized. SQL Server’s is no exception. SQL Server’s query optimizer does not know anything really about your system. There are some things that SQL Server itself is aware of. Like when it starts up, it looks at your CPUs to kind of figure out what kind of SIMD support you have. It knows how much memory is in there so that it can work out memory grants, you know, stuff like that. But as far as, you know, what kind of storage you have, SQL Server has no idea. SQL Server doesn’t interrogate storage to figure out what’s underlying there. So I/O costs are very, very highly generalized.

I/O costs are also ancient, right? Like SQL Server still costs I/O like you’re on a spinning disk that has a huge penalty for random I/O. That’s why things like key lookups are costed very highly: because of the random I/O. And Microsoft has been getting better about some aspects of the query optimization model by sort of providing feedback to the model after a query, or while a query, executes.

So there are three stages, right? There’s stuff that happens before a query executes. And you can think of that as stuff like table variable deferred compilation, where SQL Server will be like, oh, let’s just say you’re in a stored procedure. You declare a table variable. It’s not a table-valued parameter; it’s a table variable. You declare it in the stored procedure. You insert some rows into it. So SQL Server will stop query optimization at that point and be like, how many rows are in this table variable? Let’s come up with a query plan based on table cardinality.
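That scenario can be sketched like this. On SQL Server 2019 and up (database compatibility level 150), compilation of the final select is deferred until the table variable is populated, so the optimizer sees the real row count. The procedure name is made up:

```sql
CREATE OR ALTER PROCEDURE dbo.deferred_compilation_demo /* hypothetical */
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @ids table (id integer PRIMARY KEY);

    INSERT @ids (id)
    SELECT o.object_id FROM sys.objects AS o;

    /* Compiled after @ids is loaded, so the estimate reflects actual
       table cardinality -- though still with no column histograms. */
    SELECT o.name
    FROM sys.objects AS o
    JOIN @ids AS i
        ON i.id = o.object_id;
END;
```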

Again, you still don’t get column level histogram information about what’s in the table variable, but you do know how many rows are in there. Perchance SQL Server might choose a different query plan if there are 10 rows in your table variable versus a million rows, right? That’s the hope anyway. Then there are some runtime feedback mechanisms, like adaptive joins, where SQL Server is just like, well, I’m going to do this loop join.

And if I get enough rows out of this loop join, I’m going to switch to a hash join, right? So that’s during execution, intra-execution. And then there’s post-execution stuff. Memory grant feedback is the one that I think has been around probably the longest, but there’s cardinality estimation feedback in the product now too, where after a query executes, SQL Server will be like, wait a minute.

I was way off on this thing. I should probably figure something else out. Other examples, you know, like the parameter sensitive plan optimization. That’s a pre-execution sort of mechanism to make things a little bit less ride or die for the optimizer.

And I think these are very cool things, because one thing that a lot of people sort of miss about models. And I don’t mean like sexy runway models. I mean like modeling things, right? Probably the most common thing that people will argue about models with is the weather, climate, stuff like that.

One thing that a good model will take into account is feedback, right? So let’s say you have a query optimizer and you assign all these cost models to it. If your cost models don’t get any feedback on how good they did or how well they did when a query ran, your model is just going to keep making the same mistakes over and over again.

That’s why, you know, for a long time, you know, SQL Server would choose a query plan. And if that query plan was good, you were real lucky. If that query plan was bad, you were real unlucky because SQL Server would keep using that query plan and you would keep getting the same bad plan.

And, you know, it would be up to you to figure out some scheme of rewrite, indexing, you know, breaking stuff up into temp tables, whatever, in order to improve that. So I really like that Microsoft is actually providing more and more feedback mechanisms in the product. But at the same time, a lot of those feedback mechanisms are very, very conservative in how they’re applied.

So, you know, any sort of automation for that stuff, it’s like SQL Server has to collect a lot of metrics about how things are going generally, whether it’s right about cardinality estimation, whether it’s right about memory grants, all that stuff. And then it has to say, OK, well, you know, most of the time I’m right about this.

Sometimes I’m wrong about this. Or, boy, I was just wrong right off the bat on this. But I think one thing that is going to have to start happening is it’s going to have to start rethinking alternative plan costing.

It’s cool if, you know, you figure out that cardinality was wrong for something in a query and that maybe costing and plan choices after that were incorrect. But just improving the cardinality estimate may not improve the overall plan that you get. You’re going to have to start thinking about how that cardinality affects other plan choices.

And if perhaps a completely different query plan would be more suitable. This is because, especially zooming in on this part, you know, and I’m going to say this right off the bat: I think if you look at the world of query optimizers out there, Microsoft probably has the best one, like, across the board.

There may be specific things that other query optimizers do better, or they have different ways of handling things that are better. But just as a general cost model, I do think that SQL Server has the best one, regardless of how much time I spend complaining about it. But, you know, when you think about sort of the quality of the assumptions you start with, Microsoft could probably do better with I/O costing.

And for that, the quality of the data you start with is mostly up to you. Right. Like, you know, how up to date your statistics are, like whether they’re full scan versus the default sampling, things like that. And then making sure that you’re gathering sort of the right runtime metrics about the query when it runs. Like, wait a minute.

These are all the places I was wrong. This is where stuff took a long time. I need to recompile this branch of the query plan, because this was completely bonkers. I shouldn’t have done a loop join here at all.

Like I should have done something way different. So there’s a lot of stuff that, you know, these are things that have been improving as server hardware improved. Right. Like, you know, if you think about when query optimization first started, computer hardware was very, very limited.

Like you just didn’t have any extra, like there was no extra gas in the tank for stuff like this. Right. Your hardware was dedicated to that workload. And, you know, I hate to say it, but the cloud does allow for a lot more flexibility with these things, because you have other hardware you could do this on.

Right. Like copies of the data, things like that. And, you know, you can start considering all sorts of alternative plan stuff, like early aggregations. Maybe, you know, I should have used batch mode on rowstore here.

I was wrong about this thing. Maybe I should have used a hash join instead of a merge join, things like that. There are lots of alternative costing scenarios that need to start going out there.

There are some query optimizers, I've learned, that will come up with a set of candidate plans and then run all of them. And whichever one comes back first, that's the winner. That's just the one that gets used in the future.

That's kind of a cool thing, but I don't expect Microsoft to do that, because who wants to run three copies of the same query on the same database? Say one finishes in 30 minutes while the other two are still running. Are you just going to let that one run for 30 minutes, kill the other two off, and then start thinking about the one that ran for 30 minutes, like, OK, well, what can I fix in there?

Now, what was I going to say? Oh, man, I lost it. I was on a roll there and then I completely lost it. I talked about this already: all optimizer model assumptions are generalized to work across lots of different hardware.

Basically, that means if your server had four cores and 12 terabytes of memory, or if it had 128 cores and four gigs of memory, you would get the same execution plan on either one, because SQL Server would have costed all that stuff equally, regardless of the hardware. There are also lots of things that are out-of-model constructs. Probably the simplest example is when you pass in a blob: it could be XML, it could be JSON.

It doesn't really matter which one. SQL Server has no idea what's in that blob. Even when you parse that blob out, it's not doing any statistical analysis of what actually came out of it.

You very much get fixed cardinality estimates and fixed costs, based on nothing more than SQL Server knowing it has to do something with XML or JSON. It doesn't stop to look at what was in there. That's why a lot of stuff with string splitting, or runtime XML and JSON parsing where you're joining to the result of it, or it's in a WHERE clause or something, goes wrong.

You just get terrible plans and cardinality estimates from it, because SQL Server is not pausing to do that analysis. Is it possible to do that? Yes.

You could have a rule where any XML or JSON that you parse, or string that you split, would get some of that treatment. Well, I mean, with XML and JSON, you're pretty explicit with the types. You can be wrong about the types, right?

You can parse an integer out of JSON as a varchar(max) if you're stupid, but SQL Server could put the parsed results into a temp table, look at what's actually in there, and then do some work with that. It just doesn't do that yet, or may never do that. I don't know.
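As a sketch of what that fixed-guess behavior looks like in practice — table and column names here are mine, loosely borrowed from a Stack Overflow-style demo schema, so treat this as illustrative rather than anything shown on screen:

```sql
/* Parsing JSON inline: the optimizer assigns a fixed row estimate
   to the OPENJSON output, no matter what the blob contains. */
DECLARE @blob nvarchar(max) = N'[{"id":1},{"id":2}]';

SELECT u.DisplayName
FROM OPENJSON(@blob) WITH (id integer '$.id') AS j
JOIN dbo.Users AS u /* hypothetical table */
  ON u.Id = j.id; /* the join estimate here is a blind guess */

/* The workaround described above: materialize the parsed rows into
   a temp table first, so real row counts and statistics exist
   before the join happens. */
CREATE TABLE #ids (id integer PRIMARY KEY);

INSERT #ids (id)
SELECT j.id
FROM OPENJSON(@blob) WITH (id integer '$.id') AS j;

SELECT u.DisplayName
FROM #ids AS i
JOIN dbo.Users AS u
  ON u.Id = i.id; /* now the optimizer sees actual cardinality */
```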

There are also what I want to call impossible-to-model constructs. Maybe the XML and JSON stuff would fall into that, because you would have to trust whoever parsed or pulled the stuff out of there to make the right choices, which is tough to do on a good day. But that would certainly fall into difficult, maybe not impossible.

I think truly impossible-to-model stuff is things where there are no good statistics, and there will probably never be any good statistics. Table variables are probably the easiest example, and I've talked about them a billion times on this channel. There are ways to get table-level cardinality, to know whether there are 1,000, 10,000, or 50,000 rows in a table variable.

But there are no statistics on what actually makes up those 10, 50, or 100,000 rows. So at some point, I think the optimizer is going to be less about improving the cost models. There are just going to be more rules that get applied, and more optimization choices with feedback mechanisms, where the model gets more and more feedback about how it did when things ran.
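A quick sketch of that table-variable distinction — the source table name is hypothetical, and the exact behavior depends on version and settings (table variable deferred compilation arrived in SQL Server 2019):

```sql
DECLARE @t table (Id integer PRIMARY KEY, Score integer NOT NULL);

INSERT @t (Id, Score)
SELECT p.Id, p.Score
FROM dbo.Posts AS p; /* hypothetical source table */

/* With deferred compilation (2019+) or OPTION (RECOMPILE), the
   optimizer at least knows the table-level row count. But there
   are still no column statistics on @t, so this filter gets a
   fixed guess rather than a histogram-based estimate: */
SELECT t.Id
FROM @t AS t
WHERE t.Score > 100
OPTION (RECOMPILE);
```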

Because that’s the only real way to improve things. Right. Like you can only model costs for so many scenarios.

At some point, your cost model needs to get some feedback and use it to adjust costs, so that when queries run, SQL Server has a much better idea of what to expect. Like, OK, when I run this query, this choice always turns out bad.

I need to make a different choice here. Right. I need to do something different.

So, I don't know. That's about the end of that. But I think there's a lot of room for adding in mechanisms like this, and a lot of room for adding in additional rules that you could apply to this sort of feedback.

There are probably a lot of rules that Microsoft could include, or a lot of plan exploration spaces and heuristics, to do things a bit smarter. My favorite example is when you join on an OR clause.

SQL Server should be able to unroll that into a union or union all of both sides of the OR clause, with the correct filtering out of things for one side. That's a really big thorn in a lot of people's sides when they start writing these queries. But anyway, I've got to get going.
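The hand-written version of that unroll looks something like this — a sketch with made-up Stack Overflow-style tables, where the second branch filters out rows the first branch already produced:

```sql
/* Original: joining on an OR usually gets a poor plan. */
SELECT p.Id
FROM dbo.Posts AS p
JOIN dbo.Users AS u
  ON u.Id = p.OwnerUserId
  OR u.Id = p.LastEditorUserId;

/* Unrolled: two index-friendly joins glued with UNION ALL.
   The second branch excludes rows where both columns match the
   same user, so the first branch's matches aren't double-counted. */
SELECT p.Id
FROM dbo.Posts AS p
JOIN dbo.Users AS u
  ON u.Id = p.OwnerUserId

UNION ALL

SELECT p.Id
FROM dbo.Posts AS p
JOIN dbo.Users AS u
  ON u.Id = p.LastEditorUserId
WHERE p.OwnerUserId IS NULL
   OR p.OwnerUserId <> p.LastEditorUserId;
```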

I forget. I have something to do at 2:30. We'll figure out what it is in a minute, I guess. But thank you for watching. I hope you enjoyed yourselves.

I hope you learned something. And I will see you again in the next video. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

A Little About Catching Queries That Fire Triggers

A Little About Catching Queries That Fire Triggers


Video Summary

In this video, I share a simple technique for figuring out which queries are firing triggers in SQL Server. I walk through creating a logging table and a trigger that captures the query text as XML, allowing you to see exactly what statements are causing your triggers to execute. While this method isn’t suitable for legal or compliance purposes due to its batch-level capture of queries, it can be incredibly useful for troubleshooting and understanding which operations might be problematic in your database environment. If you’re interested in more SQL Server content, consider supporting my channel with a membership for just $4 a month, or engage with the community by leaving comments, subscribing, or submitting anonymous questions.

Full Transcript

Erik Darling here with Darling Data. Another beautiful, sunshiny day here. I can’t wait for winter to be done. I grow weary of this weather. I don’t like being cold. Probably the biggest impact it has on me is that I have to wear full-length sweatpants to the gym instead of my usual very comfortable 5-inch jogging shorts, which allow for a much more comfortable range of motion across the barbell lifts. But I guess the one thing that keeps me hanging on right now is that there’s about two weeks until pitchers and catchers report to spring training. And that’s… I just wish I could sleep until then. Anyway, enough about me. Let’s talk a little bit about how to figure out which query fired that trigger. Now, about the technique that I’m going to show you — I’m going to say this multiple times — if you need to audit things for legal compliance reasons or anything else, I would absolutely not swear by this technique. But if you just need to get a sense of which queries are firing a trigger off, for some other troubleshooting purposes, this will probably get you where you need to go. Alright? So, for any lawyers out there, or for anyone out there who’s going to have something smart to say about what I’m going to show you: let’s temper expectations and set the level appropriately before I even show you a lick of code. Alright? So, just remember that.

If you like this channel, if you think, wow, that Erik Darling sure has useful tips about SQL Server and how to dress to do barbell lifts, you can sign up for a membership. And for as little as $4 a month, you can support my efforts to bring you this amazing SQL Server content. If, for some reason — I don’t know, maybe you spent all your money on something that you thought was cool, like a gym membership — you don’t have $4 a month for SQL Server training, I totally understand.

You can do all sorts of free things that help this channel grow and expand, like your muscles at the gym. You can like, you can comment, you can subscribe. If you are interested in asking me questions — not just leaving a comment on a video, but asking me questions generally — you can go to this link right here and submit a completely anonymous question that I will answer publicly.

Keep in mind, this is not private consulting advice. This is stuff that I’m going to answer live. Well, not exactly live. I’m going to record answers to it, and then I’m going to put them on YouTube. I’ve recorded my first one, and you’ll see it soon. So, keep that in mind.

All right. If you need SQL Server consulting, I am the best SQL Server consultant in the world outside of New Zealand, especially when it comes to these things. And, of course, my rates are reasonable. Just like my shorts.

If you would like some very high-quality SQL Server training content, by golly and gosh, I’ve got that too. You can get all 24 hours of it for about 50 USD. Again, down in the video description, this is all assembled for your lazy behind.

It’s probably not going to the gym. You’re probably already completely neglecting all of your New Year’s resolutions, and I’m very disappointed in you. You should take your health seriously.

As far as upcoming events go, well, SQL Saturday, New York City. The one, the only, the yearly event is taking place on May the 10th of 2025 at the Microsoft offices in lovely, crime-free Times Square. So, I look forward to seeing you there if you happen to find yourself in the area.

If not, well, I don’t know, I guess we’ll have to meet again some other time. But with that out of the way, let’s talk about how we can accomplish this amazing feat. Now, I’ve got a couple of tables here.

One of them is just to have a table to put a trigger on. We need a table to put a trigger on. Well, I guess we could use a view, too, but then I’d have to create a table and a view, and, well, I don’t know.

For four bucks a month, you only get a table, okay? All right? That’s what you get. And then we have a table that we are going to log some stuff to.

Now, there are lots of other columns you could put in here. There are many things you could put into this table that would give you more contextual information about the user’s stuff that fired the trigger. But I’m just creating a simple three-column table here.

And in this table, I am putting — again, zoom it, zoom it, zoom it; it only takes hitting Control and 1 four times to get it to work these days. That’s a nice change of pace.

At least it’s an even number. That’s what the number four has going for it. But the stuff that I’m going to stick in there is pretty simple. I’m collecting the username.

I’m collecting the time that it happened. And I’m collecting the query text as XML because I like, this is me personally, I like being able to click on a query and have the whole query appear in front of me. I just like the simplicity of it, so I’m using XML for this.

All right? And zoom in is going to keep messing with me today. Great. And then I have this trigger that I’m going to be using. And, you know, again, not production ready, not legal compliance ready level of stuff here.

But just to show you how this can work, this is what I’m using. Okay? So I’m going to catch inserts, updates, and deletes with the trigger.

IRL — that’s what the kids call it — at work, I would maybe not want to have inserts, updates, and deletes all caught by the same trigger unless I was doing something very, very simple. And in this case, I am.

All right? This is simple enough. Now, one thing that I think is critically and crucially important to start every single trigger off with is this little block. All right?

Before you do anything else: if ROWCOUNT_BIG() equals zero, if nothing actually changed, bail out. I like to have BEGIN and END in my IF blocks, even if I’m just doing the one thing after. You only have to type it once, and it makes very clear what one thing you are doing.

All right? Sure, it’s a little verbose. It’s a little extra typing. But the nice thing here is your intentions are perfectly clear.

All right? Even though it would work to just say IF ROWCOUNT_BIG() = 0 RETURN, well, I like to make sure. I like to aim for very clear code as much as possible.

That may not show in all of the code that I’ve ever written in my life, but the older I get, the more I appreciate all that stuff.

And then, of course, we want to SET NOCOUNT ON, just in case. And then all I’m going to do in here is what should have been a relatively simple insert. Should have been.

But it turned out I had to do a little bit of extra typing here. I’m not sure that this is living up to my recently professed standard of code clarity. But, you know, it got me where I was going.

Where I’m going to be inserting: you can’t put FOR XML PATH out here, directly in the insert’s select list. You have to nest it a little bit.

So, I’m just taking the event_info column from this. Oh, boy. Where did you go? Ah! From this lovely DMV, sys.dm_exec_input_buffer.

And I’m feeding the current SPID and the current request ID into sys.dm_exec_input_buffer, so I get a little more precision about the statement that I’m executing. And I’m going to put the text in there.
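Putting the pieces described above together, a minimal sketch might look like this — the object names are mine, not necessarily what’s on screen, and as stated several times, this is troubleshooting-grade, not compliance-grade:

```sql
/* Simple three-column logging table. */
CREATE TABLE dbo.trigger_log
(
    user_name sysname NOT NULL DEFAULT SUSER_SNAME(),
    log_date datetime2 NOT NULL DEFAULT SYSDATETIME(),
    query_text xml NULL
);
GO

CREATE OR ALTER TRIGGER dbo.catch_the_query
ON dbo.some_table /* hypothetical table */
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    /* Bail out first if nothing actually changed. */
    IF ROWCOUNT_BIG() = 0
    BEGIN
        RETURN;
    END;

    SET NOCOUNT ON;

    INSERT dbo.trigger_log (query_text)
    SELECT
        /* FOR XML PATH has to live in a nested subquery;
           it can't sit directly in the insert's select list. */
        (
            SELECT deib.event_info
            FROM sys.dm_exec_input_buffer(@@SPID, CURRENT_REQUEST_ID()) AS deib
            FOR XML PATH('')
        );
END;
```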

And this should all be pretty well set up for us. You can just make double sure there. And then down in this section, I have a BEGIN TRANSACTION fully spelled out, and a ROLLBACK TRANSACTION fully spelled out.

And then within the confines of that transaction, I am doing an insert, an update, and a delete. You might notice that I am inserting a row. I am updating a row.

And then I am, what do you call it, rolling that back. All right. Now, that’s about it. You might be wondering why I’m inserting the value 3 there.

Well, it’s because I had already inserted two rows, and I don’t want any primary key errors to show up and make me look like a fool in front of my 6,000 dedicated, devoted data darlings out there. And I’m just going to run this real quick. So it’s going to run an insert, an update, and a delete.

And that is going to select from the logging table. Why I said this is kind of quick and dirty, and why I said it would probably not be great for legal slash compliance reasons, is because it captures a batch of text. Right?

Like, we have the whole batch in here. And if you look at the query, and part of the reason why I like the XML is because I can just click on this and we can see everything. So, like, it has everything in here. So, if you had multiple statements in here that were, like, inserting, updating, or deleting from a table, then you would have, like, the whole thing.

Right? So it captures a block of text. But, like I said, if you just need to kind of figure out where some stuff is coming from, or if you have, like, an ORM doing this.

With a stored procedure, you would get a whole lot of text back. But if you have an ORM doing this — a simple CRUD app that’s sending one statement in and that’s the end of it — this is probably going to be good enough to get you what you need to figure out which statements are firing the trigger and maybe causing problems for you in some way. Or if you just want to figure out what actually does something that this trigger cares about?

Well, this is a good way to do it. So, pretty simple, pretty simple logging mechanism there. Again, not great for legal or compliance standards, depending on how rigorous those standards are for you.

But I don’t know. But, yeah, you can totally use this to just, you know, quick and dirty kind of figure out which statements are causing your triggers to go, pow, pow, pow, pow, pow. I don’t know.

I don’t know. Do something that you care about, because you shouldn’t log things if you don’t care about them. If you’re never going to look at them, never going to analyze them, never going to review the data, I would not recommend logging the data, because it is an extra step. And under high concurrency, you would be doing extra work in the trigger.

So just be judicious in all things that you do in your database. The funny thing about databases is there’s just as much to get right in them as there is to get wrong in them. It takes some doing to thread that needle sometimes.

Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. I hope that you will be judicious in all things that you do in your database. And I will see you in another video, another time, another place.

All right. Cool. Thank you. Goodbye.


SQL Server Performance Office Hours Episode 1

SQL Server Performance Office Hours Episode 1


Do you like me? Y/n
Does Erik prefer nested loops or fruit loops?
If Erik was an operator in an execution plan plus, what operator would he be?
Sometimes my sql server quits responding so I reboot it but then it takes forever to come back online. Is there an easier way to corrupt my data? PS I’m running on prem with azure disks so latency is a little high but usually performance is good. Also hi Erik, this was a test to see how many words I could fit into a form.
Can you fix my server performance for free? Here’s my execution plan <?\stmtsimple(@2$?@38,!@/&!;,$-“2(,&~¥’c/&1@,’]+_•
Can I edit my response?
Are you free later
What is the best Champaign to buy someone when you find out they don’t have to use Microsoft Fabric?
Hi Erik. Im having huge problems with a table that stores text files in a varchar(max) column. Can you suggest an alternative to that? Thanks
Where’s the beef?
What do you wish that more DBAs have read? Specific books, blogs, etc.

To ask your questions, head over here.

Video Summary

In this video, I dive into some interesting and varied questions submitted by viewers for our inaugural Office Hours episode. We tackle topics ranging from the nature of serial nested loops in execution plans to more complex issues like performance tuning and data storage strategies. Whether you’re curious about SQL Server’s behavior or looking for practical advice on improving your database management skills, there’s something here for everyone. I also share details on upcoming events like SQL Saturday New York City 2025 and offer insights into how you can support the channel through membership or other means. Join us as we explore these questions together and continue to grow our community of data enthusiasts!

Full Transcript

Erik Darling here with Darling Data. And we’re going to try something a little bit different today. I believe that I have fielded enough high-quality questions to do our first Office Hours episode. So we’re going to try this out. We’re going to see how it goes. So I hope you’re all prepared. It’s been a little while since I’ve done one of these. But here we go. If you like this channel’s content, you can become a member of the channel. You can join 50-some-odd other people who have said, Erik Darling, thanks for all you do. Here’s four bucks a month. And I appreciate every last one of you, individually and as a group. If you don’t have the four bucks a month — I mean, you know, Christmas did just pass, and a lot of people are still struggling with credit card debt — there are all sorts of free things you can do that help me grow this channel, like liking and commenting and subscribing. If you want your question to appear on a future episode of Office Hours, you can go to that link right there, which is also in the video description. It’s amazing how this works. You can put your high-quality question about SQL Server performance in the question bar, and I will answer it right here. If you need help with your SQL Server, if maybe the answer you got was not quite enough to get you over the proverbial finish line, I am available to do all of these things: SQL Server health checks, performance analysis, hands-on query and index tuning, dealing with your SQL Server performance emergencies, and training your developers so that you can avoid SQL Server performance emergencies in the future. If you would like to get people trained, if you would like to get yourself trained, you can get all of my performance tuning content for about 150 USD. Again, this whole fully assembled link is available in the video description.
For upcoming events, SQL Saturday New York City 2025 is still taking place on March 10th — May 10th. Good Lord, I can’t even read today. It’s gonna be interesting reading questions, isn’t it? May 10th, at the Times Square building in Manhattan.

So you should go to Manhattan, go there and come learn some more stuff about data. It’s not just SQL Server, it’s data in general. With that out of the way, let’s go answer some of these questions. These are the questions that I have fielded so far from you, the public, and we’re gonna work our way through them. The first question we have here is: do you like me? Yes or no? Well, I mean, the obvious answer is no, I don’t like you. I love you. Why would we stop there? Why would we have such low ambition? Next question: does Erik prefer nested loops or fruit loops?

Well, I don’t much care for a serial nested loop on its own. As long as the loops are parallel, I’m happy with either one of them. They are equal in my eyes. If Erik was an operator in an execution plan, what operator would he be? Well, I think that’s obvious. It would be delete. Let’s see here. Sometimes my SQL Server quits responding, so I reboot it, but then it takes forever to come back online. Well, accelerated database recovery would be a pretty good option for you, assuming that you’re on SQL Server 2019 or up.

Most of the reboot time of SQL Server comes from all that pesky transaction log stuff, and accelerated database recovery helps you avoid all that. Then there’s another question here at the end. Well, I mean, I guess that first one was more of a statement. Is there an easier way to corrupt my data?

Well, you know, you can always use NOLOCK hints. You can always use DBCC WRITEPAGE. There are many easy ways to corrupt data; it all depends on how far you’re willing to go. Here we have a really good question. Can you fix my SQL Server performance for free? Here’s my execution plan. Well, I believe your execution plan has become corrupted. I’m not sure what is going on there. It looks like it started off strong with a parameter called @2. So this looks to me like perhaps an issue with simple parameterization, or perhaps this is an ORM-generated query, because that is not the type of parameter name that I would expect a human being to use. And then we have — oh dear — we go up to parameter @38. That’s a big one. 38 parameters. Perhaps you’re dealing with long IN clauses. I don’t know. Something is amiss there. All right. Let’s move on a little bit. Let’s see. Can I edit my response? I guess we never got around to testing that. The actual answer is yes, you can edit your response. If you type in your question and then you’re like, wait, did I make a typo? You can edit your response. So that is a big resounding yes there. Are you free later? Well, this is not the type of question that someone would ask if they were confident in their knot work. I would suggest, if you’re wondering whether I’m going to be free later, that you learn how to tie better knots, or buy stronger rope, or do something else that would help you be more confident in whatever scenario you’re envisioning. Next, from someone who really perhaps should have edited their response: what is the best champagne to buy someone when you find out they don’t have to use Microsoft Fabric? Well, assuming you meant the champagne that you drink and not the lake — I think that’s how the lake is spelled. I was never quite that good at geography.
But I assume that if you mean the drinkable, potable sparkling wine: I enjoy Krug quite a bit. I enjoy Ruinart quite a bit; they make a very good Blanc de Blancs. Either one of those would be sufficient, but I just want to point out that no one has to use Microsoft Fabric. In fact, I would do my best to dissuade anyone from using Microsoft’s beta version of Databricks. It’s a real joke. It’s probably got about 16 months left until it’s something else. So, you know, don’t get too attached. Whatever bottle of champagne you buy will probably have a longer shelf life than Microsoft Fabric. All right. So: hi Erik. Hello. Hi. I’m having huge problems with a table that stores text files in a varchar(max) column. Can you suggest an alternative to that? Well, it’s a fairly well-trod path that you should not be storing blob files or blob data in your database. It’s going to end up being rather painful for you. The smart thing to do there is, of course, to store a pointer to the file system in a reasonably sized column — it could be a varchar(500) or some such. That’s usually about enough to store a file path, depending on how deep your directories go. You could just store a pointer to that file on disk, and access the file on disk, rather than try to store it in your database. FILESTREAM is a real lousy feature; I wouldn’t expect anything out of that, if that’s what you’re using. And, well, you say you’re storing it as a varchar(max) and not a binary, so it’s not even anything you could convert. If you’re just storing the text of the text files in there, well, God help you.
If you’re storing the text of the text files in the column, which is a different scenario, what you would probably want to do is put that into a different table, maintain the primary key between both tables, and just do lookups to the max column in this sort of lookup table. That’s usually the best solution there. There’s also an sp_tableoption option to store all blob data off-row, which can sometimes help with some things. I’ve messed around with that a bit in the Stack Overflow database, in the Posts table with the body column, because that can be very long — that’s an nvarchar(max) field. And it certainly does some interesting things to performance there. But, yeah, in general — your question is a little bit light on detail here — if you’re storing blob data, like a file is a file in there, FILESTREAM or something, I would rather store a pointer to the file on disk somewhere. If you’re storing the contents of text files in a column, I would prefer to separate that blob data out to a lookup table, with the primary key maintained between the two. And then make sure you only join to that table when you actually need that stuff; that way it gets in the way of fewer things.
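A sketch of both suggestions, with illustrative names I made up — a pointer column for file-on-disk storage, a separate lookup table sharing the primary key for in-database text, and the sp_tableoption setting for pushing large values off-row:

```sql
/* Pointer-to-disk approach: store the path, not the file. */
CREATE TABLE dbo.documents
(
    document_id integer PRIMARY KEY,
    file_path varchar(500) NOT NULL /* usually enough for a path */
);

/* In-database text approach: split the big column out into a
   lookup table that shares the primary key, and only join to it
   when you actually need the contents. */
CREATE TABLE dbo.document_contents
(
    document_id integer PRIMARY KEY
        REFERENCES dbo.documents (document_id),
    contents varchar(max) NULL
);

/* Optionally push large values off-row: */
EXEC sys.sp_tableoption
    @TableNamePattern = N'dbo.document_contents',
    @OptionName = 'large value types out of row',
    @OptionValue = 'true';
```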

All right, moving on here. Where’s the beef? Hopefully in the fridge. Or if it’s not in the fridge, then hopefully it’s in the process of being cooked. I would recommend a cast iron skillet for that. If you’re cooking beef, a well-seasoned cast iron skillet is probably the best way for you to treat your meat. If you need fuller cooking instructions, I’m happy to provide them. All right. And our final question today: what do you wish that more DBAs have read?

Specific books or blogs, et cetera. Well, that’s an interesting question, because I don’t think the problem is a lack of things that DBAs have read. I think there’s a lack of things that DBAs have comprehended. The problem isn’t often with putting your eyes on words.

The problem is often actually comprehending what those words mean, and how those words translate to action in the database. I suppose it would be helpful if DBAs read some books on economics, so they could understand that really nothing is free. That’s an important concept to learn in databases.

Everything has a trade-off. I suppose Milton Friedman would be useful in that regard. I don’t know. He seemed keen on that concept. But as far as specific SQL Server stuff: selfishly, of course, my blog over at erikdarling.com. From a performance point of view, if you want to start with way-back foundational material, Craig Freedman — F-R-E-E-D-M-A-N, like Friedman, with Craig spelled the normal Craig way — wrote a lot of stuff that is still very applicable to how SQL Server works today. Of course, Paul White still writes to this day. Craig hasn’t written in quite a while, but Paul writes quite a bit. Paul is one where you do need to engage reading comprehension fully, though. And I think another thing that’s very important to read is actually the documentation. The documentation for SQL Server is certainly flawed, and it is certainly incomplete in some ways, but I do think that many of the docs articles are at least reasonable attempts to teach you how something was designed — perhaps not how it works, perhaps not what it was intended to work with, but at least to tell you how it was designed, and to give you the syntax for things. There’s a reasonable amount of stuff in there. As far as books go, I would actually go back to Craig Freedman there. Kalen Delaney used to write a lot of books about SQL Server, and back when she was writing those books, a lot of people from Microsoft would contribute — Craig Freedman being one of them. Conor Cunningham got involved with one or two of them. There were a lot of smart people who were involved.
I think Kevin Farlee was in at least one of them. There's a whole list of smart people who contributed to her books on SQL Server: people who worked for Microsoft and had insights into how the engine worked beyond what a normal person could reasonably surmise just from using it. So a lot of those books are still very valuable for foundational material. Granted, a lot of the specifics have changed; Slava Oks was in on one of them, too.

But there's a lot of good foundational material there that gives you a good idea of how SQL Server works as a database: how it functions, plus monitoring, query plans, query execution, and storage engine stuff. Granted, some of the details have changed over the years, because a lot about SQL Server has changed over the years, but it's a very good way to get a deep dive into how databases work. If you want to go beyond books and blogs and you just want good general database knowledge (I think I mentioned this recently in another video), the Carnegie Mellon University database group, run by a really smart guy named Andy Pavlo, puts all of his class lectures, notes, slides, and projects on YouTube for free. You can sit there and watch them; it's about an hour a class. There are spring and fall semesters; there's advanced, there's intro to databases, and there's one going on right now about query optimization.

And if you just want to learn a little bit about real database internals, not necessarily specific to SQL Server, that's a very, very good resource. I can't think of anyone who has material of the caliber that Andy does, with the up-to-dateness of it; it's all current, because he's still teaching. So there's a lot of great stuff there. I realize that's somewhat of a roundabout answer, and many of the things in there you probably would have come across on your own already. I think another good resource would be the Database Administrators Stack Exchange site.

A lot of questions that people have about SQL Server, or databases in general, have already been answered there. It's not always the easiest to find the exact thing; granted, the site search is not a hundred percent there. But if you type in some basic keywords, or you're looking for a subject or some specific thing you're after, you can probably find it there.

The Database Administrators Stack Exchange site has a lot of great resources as far as Q&A goes. And it's not just crappy fly-by-night answers, either; there are a lot of good, detailed responses in there.

So that's another good place to look when you're trying to figure out a database problem, or when you have a question about how to do something in a database or how something in a database works. It's another very good resource. Anyway, that brings us about to the end of this here Q&A. I'm going to get going and wait for some more things to filter in. Again, if you would like to add anything to the Q&A, if you would like to be part of it, if you would like me to answer your question, you can go to this link. The link is fully fleshed out down in the video description.

You can click on that, you can submit your question, and I will put it up on the screen and answer it. That's about how it works. So anyway, thank you for watching. I hope you enjoyed yourselves, I hope you learned something, and I will see you in the next video. Hopefully I will be reading your questions on a future office hours.

There we go. All right. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

A Little More About Parameter Sensitive Plan Optimizations In SQL Server 2022



Video Summary

In this video, I revisit a subject that I explored in an earlier recording—parameter-sensitive plan optimization introduced in SQL Server 2022. Despite initial excitement about potential improvements and changes, my recent re-examination revealed that Microsoft did not make significant advancements to address the issues I highlighted previously. The feature still faces challenges, such as its reliance on a specific skewness threshold for histograms, which can lead to unexpected results. To help viewers understand these nuances better, I also discuss new insights shared by Paul White in his Twitter article, providing additional context and detail that were not covered in my initial video.

Full Transcript

Erik Darling here with Darling Data, and I am revisiting a subject. Of course, Intel likes to revisit checking for drivers, but I'm going to revisit a subject that I recorded a video about (I forget when), because I was very excited about the parameter sensitive plan optimization that Microsoft introduced in SQL Server 2022. I've had some general complaints about it: the extended events for it not being so great, it not kicking in in situations where I really thought it should, and a lot of the stuff around the way that it decides which queries should use which plans, and the way that it detaches queries from their parent object. Usually with parameter sniffing, my go-to for many years has been to use dynamic SQL, and when you use dynamic SQL in the context of a stored procedure, that SQL is no longer closely associated with that stored procedure. There's nothing in the resulting plan XML for the dynamic SQL execution that ties it to the stored procedure. That's why it always helps to put a little comment in there that says, I came from the stored procedure.
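As a rough sketch of that tagging trick (the procedure and table names here are hypothetical, not taken from the video), the comment travels inside the dynamic SQL text, so it shows up wherever the statement text does:

```sql
CREATE OR ALTER PROCEDURE dbo.GetUsersByReputation
    @Reputation integer
AS
BEGIN
    /* The comment below is embedded in the dynamic SQL itself,
       so it appears alongside the cached statement text. */
    DECLARE @sql nvarchar(MAX) = N'
    /* came from dbo.GetUsersByReputation */
    SELECT u.Id, u.DisplayName
    FROM dbo.Users AS u
    WHERE u.Reputation = @Reputation;';

    EXECUTE sys.sp_executesql
        @sql,
        N'@Reputation integer',
        @Reputation;
END;
```

Searching sys.dm_exec_sql_text for the comment string is then enough to tie the detached statement back to its calling procedure.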

And, you know, you would think that Microsoft, being in charge of everything involved in it, could do something crazy, like overcome that. But no, they just stuck an object ID in there, and you have to parse out a bunch of XML to get at it. It's a nightmare. So Microsoft really dropped the ball on that part. But I got excited, because one thing that Microsoft says quite a bit is: well, we're not fully documenting this because we want to be able to make changes to it in cumulative updates. (They don't do service packs anymore, because that was too hard, I guess.) They want to be able to make tweaks to things. Not that I've ever seen much evidence that they actually make a lot of tweaks without having to redocument stuff, or end up with "this documentation applies to SQL Server 2022 up to cumulative update nine, but after that, it's different." So I understand why they don't want to do that.

But here I was thinking: wow, hey, maybe they fixed something in a cumulative update that made this better. They didn't. It turns out that Erik Darling of Darling Data left some dangling data in one of his tables, and that is why we got the parameter-sensitive plan optimization. I'm also re-recording this because there was some new information that my dear friend Paul White blogged about, or, I don't know, wrote an article on Twitter about, or whatever you want to call it, documenting a little bit of the internals that I wasn't aware of. So we've got that stuff to talk about. But before we do all that, let's get into our relationship: how you and I can successfully coexist and cohabitate this plane together.

I forgot to, I forgot to move on to that slide, didn’t I? I’m not re-recording all that. Sorry. We’re just going to have to deal with that. If you would like to support my efforts in bringing this incredibly high-quality SQL Server content to you on YouTube, you can become a member of the channel and join the 50-some-odd other folks who are giving me $4 a month, or sometimes more, by going to the video description right down here.

If, you know, you ran out of money (I don't know what happened; maybe the recent downturn in the stock market has taken its toll on your finances) and you no longer have $4 a month to spare, you can do all sorts of free stuff, like liking, commenting, and subscribing. If you would like to ask me questions that I will answer on this YouTube channel: I just started collecting them, so I haven't done one of these yet, but you can go to this link, which is also in the video description, and submit a question. I've already got a few good ones in there, right?

When I piled up enough of them, I will record answers to them, and I will answer your questions. Just remember that these are being asked privately, but answered publicly, so don’t put anything in there that might make you feel bad at a later date. If you enjoy the things that I talk about here so much, and they hit so close to home for you, that you’re like, wow, we could sure use the kind of consulting this Erik Darling does, well, you can hire me to do any of these things, and as always, my rates are reasonable.

If you would like to get some training content from me, you can do that too. Again, all of these links are way right down there, all you have to do is scroll and click a little bit. You can get all 24 hours of my performance tuning content for about $150 US dollars.

Try beating that. It’s a tough one. Upcoming events, we have SQL Saturday in New York City on May the 10th of 2025. Exciting times ahead.

You can stay in a business class hotel, like a Hampton Inn, or I don’t even know what else is over there. Like a Doubletree? I don’t know. Whatever. There’s lots of hotels.

It’s at the Microsoft thing in Times Square. I don’t know. If you can’t afford a hotel room, you can always sleep in the Port Authority bus station or any subway station. Everyone sleeps in the subways in New York.

It's very safe, very accommodating. With that out of the way, let's party. Let's talk about this thing here. This was the demo that I was really pumped about, because when I was first looking at the parameter sensitive plan stuff, I thought: it's going to be great if we can fix this stored procedure. And it never did.

This one came along, and I was like, nah, not doing this. But then I installed the cumulative update, and all of a sudden, this started working, and I was like, they did it. They listened to me.

They did a thing. They didn’t. They didn’t. Microsoft continues to not listen to anyone except themselves, whatever burned-out, toasted-bud executives are just like, yeah, more AI. Really, really, really doing your job there.

Good job. All right. Okay. So what I had done was: there's a demo that I do where I show a deadlock between a read query and a write query. And I have a reset, because one of the things I do in there is an update.

So I do this update, and then I have another update at the end that’s supposed to reset things back to where they were. The thing is, apparently, I didn’t run that. So here’s what happened.

And it's kind of funny, and it ties back to some of the stuff that Paul White talked about in his Twixtr article: the skewness threshold that has to be met for the parameter-sensitive plan optimization to kick in, which you can see if you look at the histogram for a table.

So this is the histogram for the Votes table: the full-scan-quality histogram you get from a full scan update or an index creation. For the optimization to kick in, I believe the highest equality-rows value in the histogram has to be at least 100,000 times the lowest one.

But in the Votes table, for the vote type ID column, the highest one is 37 million. Right? That's what this number is here.

3.7 plus E... sorry, E plus 07. That is 37 million. Not 3.7 million; 37 million. And this is the lowest one.

We don't have 100,000 times that. The lowest value has 733 equality rows, so to hit 100,000 times that, we would need something with 73,300,000 rows in there. So I'm going to show you what happened.
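If you want to check those numbers yourself, assuming a StackOverflow-style dbo.Votes table and a statistics object named after the VoteTypeId column (both assumptions of mine), the histogram dump and the threshold arithmetic look like this:

```sql
-- Dump the histogram SQL Server built for VoteTypeId
-- (statistics name assumed to match the column here)
DBCC SHOW_STATISTICS ('dbo.Votes', 'VoteTypeId') WITH HISTOGRAM;

-- The skew test, per the video: the highest EQ_ROWS value must be
-- at least 100,000 times the lowest one.
--   lowest  EQ_ROWS:        733
--   needed:  733 * 100,000 = 73,300,000
--   highest EQ_ROWS:       ~37,000,000  -> threshold not met
```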

Right? So when I was – when I was doing my deadlock demo, one of the things that I do is, like, flip the vote type ID column around. And I do that for this one particular ID.

Right? So now if I run this and we look at the statistics, of course, we’re not going to see anything useful. Right? There’s going to be nothing good in here. I’m going to update these statistics.

Now, I'm using a full scan just because I don't want to leave anything weird to chance. Normally, when I finish my read-query, write-query deadlock demo, I'll run this update down here to reset stuff. This is what I forgot to do.
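The reset step being described would look something like this; a sketch with assumed table, column, and value (later in the video, the flipped rows get set back to vote type 8):

```sql
-- Undo the deadlock demo's flip: put the zeroed rows back
UPDATE v
    SET v.VoteTypeId = 8
FROM dbo.Votes AS v
WHERE v.VoteTypeId = 0;

-- Rebuild statistics with a full scan so nothing weird
-- is left to chance in the histogram
UPDATE STATISTICS dbo.Votes WITH FULLSCAN;
```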

So what happened is, when I had this row in here where vote type ID was zero, I had one row with one equality value sitting in there. So when I created my indexes up here and did all this stuff, what SQL Server saw in the histogram was this.

Oh, I'm updating statistics again; I'm not showing them. That was supposed to be DBCC SHOW_STATISTICS.

Let's do that. So what happened when I created the index on vote type ID was that I ended up with this one row, with a range high key of zero and one equality row. You can imagine that it's a lot easier to hit that skewness threshold when you have something with... why did that take two seconds to run?

That’s bizarre. All right. I guess format really does suck.

So we would only need one other value with 100,000 rows in there. Of course, we have lots of that stuff going on, right? If you look at... shut up. Against that one equality row, we have lots of stuff in there with over 100,000 rows.

And that’s exactly why it worked. Microsoft didn’t actually improve upon the feature. They didn’t relax anything.

They didn't do anything better. They're just letting this thing lie; I don't know why. They didn't actually improve it, they didn't actually fix anything. They just said, yeah, go ahead. So when you have that one row in there, you do get the parameter-sensitive plan optimization, right?

And when you get the parameter-sensitive plan optimization, you know it, because you get this additional stuff at the end of your query that you didn't put in there. It shows you the minimum and maximum thresholds, and which column and parameter pairing they chose to give you the parameter-sensitive plan optimization for, meaning SQL Server will generate multiple plans depending on which bucket the parameter falls into, based on the histogram.
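For reference, that "additional stuff" is a PLAN PER VALUE option appended to the statement text. A representative shape (the object names and boundary values below are illustrative, not copied from the demo) looks like:

```sql
SELECT COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE v.VoteTypeId = @VoteTypeId
OPTION
(
    PLAN PER VALUE
    (
        QueryVariantID = 2,  -- which of the low/medium/high plan buckets dispatched
        predicate_range
        (
            [StackOverflow].[dbo].[Votes].[VoteTypeId] = @VoteTypeId,
            100.0,       -- low cardinality boundary
            1000000.0    -- high cardinality boundary
        )
    )
);
```

The two numbers are the cardinality boundaries that split the histogram into buckets, and the variant ID tells you which bucket's plan a given execution used.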

And the thing that will come up… Oh, boy. What’s all this?

Oh, we got a lot of stuff in there now. Oh, okay. Apparently, I've just been letting this run for a while. That's my fault. But the original message that I hit was this one: skewness threshold not met. That's this thing.

Zoomit is not listening to me once again. So when I don't have that row, this is the message that I get: we did not have enough skew in the histogram for the parameter-sensitive plan optimization to be triggered. Do I think this is good?

No. If you have a value with 733 records and a value with 37 million records, you sure as all get-out have skewness in your histogram. You just don’t have 100,000 times the skewness.

You don’t have 73.3 million. You have 37-point-something million. So you’re, like, halfway there? About?

Not, you know, somewhere in that fuzzy approximate count distinct area. You’re about halfway there. But you are not there there. So Microsoft did not fix anything.

It did not give us any joy or love. I'm going to close this because that is just weird. But I just want to show you the difference real quick here, and that's going to be for this stored procedure, right?

So if we run this and we execute this just for vote type ID 4. I don’t need to show you the whole parameter sniffing thing for it. This is with that additional row in there.

With that additional row in the table for where I accidentally left a vote type ID set to zero, we get the whole parameter sensitivity thing. Kicks right in immediately. Wonderful.

You did a great job. You worked. But if we come back up here and do what I should have done when I finished the reader-writer deadlock demo (set that back to eight, then update statistics), we're not going to get that anymore, because SQL Server is not going to see that one row with the one equality value for vote type ID zero.

It’s just going to see that lowest one for vote type ID 4 that has 733 rows. And we’re not going to get it anymore, right? We’re going to lose that whole thing.

So I go and rerun this procedure; I just like to do the CREATE OR ALTER to make sure that we're getting fresh plans for everything. But now, if I run that for vote type ID 4 and we look at the execution plan, we don't have all that stuff at the end. We just have me forcing the compat level to 160 so that I would be able to use the parameter sensitive plan optimization.
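Forcing the optimizer's compatibility level for a single statement, without flipping the whole database, is done with a USE HINT. A sketch, again with assumed object names:

```sql
SELECT COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE v.VoteTypeId = @VoteTypeId
OPTION
(
    -- Run this one statement under the 2022 optimizer behavior
    USE HINT ('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_160')
);
```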

So first I apologize. One, to you because I got this wrong. I was incorrect about this in my video.

Two, I’d like to apologize to Microsoft for accusing them of getting something right. Big sorry there, all of you. Your record remains untarnished.

And I’d like to thank Paul White for publishing a nice article on Twitter, Twixtr, that I will put a link to in this video description. So you can go and read it with all the great joy that I read it and you can learn more about this stuff. And yeah, I think that’s about it.

All right. It is time to go to the gymnasium so that I can, I don’t know, I think squat and deadlift today. So that’s a fun combo.

Anyone wants to see workout videos, maybe it will be more popular than the SQL videos. Who knows? Right? I don’t know.

If you’re tired of seeing my face and you want to see my butt instead, the squat and deadlift videos are pretty good for that. So anyway, thank you for watching. I hope you enjoyed yourselves.

I hope you learned something. I hope that I learned something: don't count on Microsoft to make stuff better. That's the first one. And yeah, all right.

We’re going to hit the gym now. Anyway, thank you for watching. Thank you.


A Little About Choosing Index Key Column Order



Video Summary

In this video, I dive into the fascinating world of index key column order in SQL Server indexes. After a bit of an uncharacteristic “big arms” moment that seemed to take over my screen, we get right back on track as I explain how different types of predicates can significantly impact which index is chosen and why. We explore scenarios where equality predicates might not always be the best starting point for indexing, especially when dealing with skewed data like in our example table `post`. By comparing query plans using forced seek hints, I demonstrate the performance differences between seeking on a non-selective column first versus a selective one, highlighting how these choices can dramatically affect application response times. The video also touches on dynamic SQL as a potential solution for handling unpredictable query patterns, making it a comprehensive guide for optimizing your index design based on real-world querying behaviors.

Full Transcript

Erik Darling here with Darling Data. I don't know why I did the big arms thing. It was just like that. Sometimes you just lose yourself in these moments; you get all worked up, all riled up and excited, and next thing you know it's big arms. Whatever. Stop looking at me like that. Today's video is going to go on the "a little about" playlist, I guess, because we're going to talk a little bit about choosing index key column order. I've done a lot of videos about how SQL Server stores data in B-tree indexes, and how you can take advantage of that for things like seeking and sorting and all that other good stuff. But this one is a little bit different, because in this one we're going to talk about how different types of predicates can be more selective than others (shut up, Zoomit; yeah, I just scooted it out of the way with my elbow, pretty good, right?), how the general advice to index purely for equality predicates first isn't always the best advice, and then finally, how sometimes you might even need to maintain multiple indexes for different types of predicates.

Different types of queries, even. And going even beyond that: how you might need to use dynamic SQL in order to direct SQL Server to the correct index sometimes. Particularly because, at least as of the SQL Server 2022 iteration of the parameter sensitive plan optimization, it only works with equality predicates; it does not work with inequality predicates. So if you're dealing with a parameter sensitivity issue because of a greater-than, greater-than-or-equal-to, less-than, or less-than-or-equal-to situation, with dates or something, it's not going to help you at all there. So that's pretty screwed up. I have another video to do that can be a follow-up on the last parameter sensitive plan optimization video I did. That's going to be fun, too. So yeah, we're going to talk about that stuff. But before we get into all that, of course, it is time to talk about the health and wellness of this here YouTube channel. If you want to sign up for a membership to support my endeavors in bringing you this, well, let's call it high quality SQL Server content (because that's a nice way to frame it), you can do that right down in the video description. There's a link to become a member; you can join the 51 other people who have chosen to hand over some small quantity of money to me every month, and be, I guess, thankful for that.

If you don't have four bucks a month (I don't know, maybe you got robbed on the playground), you can like, you can comment, you can subscribe. And if you have a good question about SQL Server performance, you can go to that link right there, which is also down in the video description, click on it, and ask a question. Eventually, I will do a round of office hours questions and answer it. And won't that just be grand? If you need help with SQL Server, if you are ready and willing to bring in a young and handsome consultant with reasonable rates to solve all of your SQL Server performance maladies, you can hire me for any of these things. And once again, BeerGut Magazine rated me the best SQL Server consultant in the world outside of New Zealand.

If you would like to get access to my training, you can get all 24 hours of it for about 150 US dollars. Again, that URL, that discount code, coupon code, whatever you want to call it, right down there. Just click somewhere in this general area.

Upcoming events. Well, well, well, well, SQL Saturday, New York City 2025 will be taking place on May the 10th of 2025. You can go.

Times Square, Microsoft offices. I don’t know. Maybe you can make a friend in the Port Authority bathroom while you’re there. There’s all sorts of, you know, it’s a big, crazy, wild city.

Anything can happen. But with that out of the way, let’s party on, Wayne. So I’ve got my database set to compat level 140.

I forget why I did that. There’s probably a good reason, but it doesn’t matter too much now. And I’ve also got two indexes on the post table. If Zoomit will cooperate.

Aha, there we are. Finally, listen to me. One of them is very descriptively named PSC. It is on post type ID, score, and creation date.

Hence, PSC. The other one that I have is called… Oh, Zoomit’s going to do one of these things again.

All right. All right, Zoomit. The other one is called CSP. Mmm. Very descriptive again. Creation date, score, post type ID. I don’t know why I’ve decided to give them all funny little toupees.

Or maybe they’re eyebrows. I don’t know. Unibrows, I guess. Monterbrows, whatever they’re called. So I’ve got these two indexes. And we’re going to look at how different queries can use these indexes more effectively depending on the selectivity of the sets of predicates that we’re passing in.
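As a sketch, the two indexes would be created like this, assuming a StackOverflow-style dbo.Posts table (the names follow the video's key-column mnemonic):

```sql
-- PSC: PostTypeId, Score, CreationDate
CREATE INDEX PSC
    ON dbo.Posts (PostTypeId, Score, CreationDate);

-- CSP: CreationDate, Score, PostTypeId
CREATE INDEX CSP
    ON dbo.Posts (CreationDate, Score, PostTypeId);
```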

Now, if both sets of predicates are equally selective, it almost doesn’t matter a lot which index you use. But in certain cases where post type ID is not selective, but the creation date filters are, or the reverse of that where creation date is selective, but post type ID is not selective, which index gets used can make a big difference.

And then we’re going to talk about one other funny little thing that can happen along the way as well. I’m just going to make sure that I have that set up. I’m going to make sure that these indexes are created, but they should be because I feel like I was pretty smart about this beforehand.

And thankfully I was. So this is our starting stored procedure. We have one instance of the query where we're going to force SQL Server to seek into the index called PSC, the one that leads with post type ID.

Kind of a funny set of stuff with the FORCESEEK hint. I don't know if you've ever read the documentation for it fully, but with the FORCESEEK hint, not only can you tell SQL Server to do a seek, you can tell it which index to seek into, and which key columns you want it to seek with.

And it's wonderful. You can do all sorts of things. So if this syntax looks funny to you, I would highly recommend going and reading the documentation for the FORCESEEK hint, because you can do all sorts of neat stuff with it that you maybe didn't know about, because you didn't read the documentation for years, like me.

Ta-da! But here we are. So one is going to force seek (not fork) into the PSC index.

That's this one here. The other one is going to force seek into the CSP index here. And we're going to look at the differences in query plans when we have selective and non-selective predicates in play. We've got query plans turned on up here.
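The two FORCESEEK variants being compared would look roughly like this; the query shape and column lists are my reconstruction, not the exact demo script:

```sql
-- Seek on PSC, using its leading PostTypeId key
SELECT p.Id, p.Score
FROM dbo.Posts AS p WITH (FORCESEEK (PSC (PostTypeId)))
WHERE p.PostTypeId = @PostTypeId
AND   p.CreationDate >= @StartDate
AND   p.CreationDate <  @EndDate;

-- Seek on CSP, using its leading CreationDate key
SELECT p.Id, p.Score
FROM dbo.Posts AS p WITH (FORCESEEK (CSP (CreationDate)))
WHERE p.PostTypeId = @PostTypeId
AND   p.CreationDate >= @StartDate
AND   p.CreationDate <  @EndDate;
```

The parenthesized column list after the index name is the part people usually miss: it pins which keys the seek predicate is built on.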

Let's run it for this one first. What I want you to see is that post type ID 2 is very much not selective: there are 11 million rows that match post type ID 2. But the date range we have here, the start and end dates, is very selective.

It’s just a month, right? So 2008-12-01 through 2009-01-01. If you run this, we’re going to get both queries, but we’re going to see that the query plans are somewhat different in their speed and effectiveness.

So SSMS will listen to me. And it finally did. We have our seek into the PSC index, so we sought to all 11 million post type ID 2 rows.

And then we had to follow that up with a residual predicate to apply the filter on creation date. This all took 1.4 seconds.

And down here, where we did a seek into the CSP index, in other words, creation date first, and we just found that month of values first, that whole thing took about 19 milliseconds.

So we’ve already got a pretty big disparity here, where seeking to all 11 million post type ID 2s and then applying a residual predicate on creation date, that’s what this one is doing.

Oh, thanks for disappearing on me, tooltip. Very nice of you. So we do seek to the post type IDs that we care about. And then we have this residual predicate on creation date here. Now, this is not because score is in the middle.

Score is the middle key column, but range predicates like this, you know, often do end up as residual predicates anyway.

But notice that there is a difference between these two query plans. When we use the PSC index, and we did a seek to post type, all the post type ID 2s first, we didn’t need to sort the data afterwards.

Now, granted, this is 40,000 rows, so it’s not that big of a deal for us to sort in this case, right? 40,000 row sort is almost never going to be your biggest problem, unless like the next one is like 40 million rows or something, right?

You parameter sniff that initial memory grant and things go terribly, because memory grant feedback, remember, is a post-execution feedback loop. It doesn't help you get more memory while a query is running.

That won’t happen. Memory grant is adjusted after the query finishes. So you might have a query run for like an hour spilling data everywhere. And then when it’s done, SQL Server is like, well, I can fix that for next time.

It makes a little note. But this sort happens because when we do this seek, we seek in here and find the range predicate.

Then we have an equality predicate over here. But with the way the B-tree is set up, creation date first, even an equality predicate on post type ID 2 doesn't help us with the sorting of score, right?

Because score is a second key column. So it’s sorted by creation date. I did experiment with this with an index on creation date, post type ID, then score in descending order.

And there was still a sort, because even with that setup, you have the creation date column leading. I have other videos on how B-tree indexes are stored.

You should watch those if you're not filled in on it. But within each individual creation date entry, post type ID is what's sorted next, so it doesn't help to have the range and then the equality; you still have to sort this column.

This column is still not going to be sorted the way you want it on the way out. So for this one, for the selective predicate on creation date, using the index that leads with creation date was the better choice, right?

By just about a second and a half, right? So we're looking at a fairly big difference there in application response time. Now, what if we do that backwards?

And we say where post type ID equals four, which is very selective (it's about 700 rows), but then we have... oh, why is this red? This should stay pink.

For this, this is a, this date range is wide open. This is literally every date in the table, 2008 to 2014, right? So this is every single date. Now I could have made this a little bit smaller or something, but you know, for dramatic effect, I have left all of, I have left the entire range in.

And if we run this, things end up just about backwards. In this one, we do a very fast seek into the index that leads with post type ID.

And look at the times: the last one was about 20 milliseconds, and that was for the faster plan. This one is seven milliseconds, and the other is back up to about a second and a half.

So even though the top index is great for this set of predicates, it was not great for the non-selective post type ID set of predicates.

So which index you use — or how you design the indexes — really does depend a lot on how people query the data. If people are always querying for a narrow range of dates, and you have equality predicates, especially on very skewed data like the post type ID column in this table, you would probably want your index to lead on creation date.

But if you don't know? If you look at the distribution of the post type IDs, some of them are selective — down here, it drops from 25,000 rows all the way down to two rows.

Those are all fairly selective. These two are not very selective, so you probably don't want SQL Server going off to find all of that stuff and then applying the range predicate second.

At least not the way the query is written here. There are more verbose ways you could write the query where you could probably get reliable performance from either index, but it would take way more typing, and probably a different index that has the ID column in it, so that you could find the post type IDs, find the creation dates, and sort of join that stuff together — or use EXISTS or CROSS APPLY or something.
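The post type ID distribution being eyeballed here can be reproduced with a query along these lines (my sketch, Stack Overflow schema assumed):

```sql
/* Row counts per post type, most common first. In the Stack Overflow
   database, types 1 (questions) and 2 (answers) dwarf everything else. */
SELECT
    p.PostTypeId,
    COUNT_BIG(*) AS records
FROM dbo.Posts AS p
GROUP BY p.PostTypeId
ORDER BY records DESC;
```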

But anyway — depending on how people query the data, there might be some other considerations in how you design your indexes.

Now, one thing you could do, if you're just not sure how people are going to query the data, is maintain both indexes and do some work with dynamic SQL to have SQL Server go to the right index, depending on the parameters.

Now, about the way I've set this dynamic SQL up — you don't have to tell me; I know. I just didn't feel like making this the longest demo in the world. There are going to be cases where neither of these branches is true, right?

There could be cases where post type ID is not in one or two and the start date to end date span is still less than two, and then the reverse of that for the bottom one.

So I know these branches don't catch every single possible outcome. This is just to show you that you can make it work with the set of variables — the set of predicates — that I'm passing in.

You could have another outcome for this where you just say FROM Posts and let SQL Server pick, right? If none of the branches match, just hit the Posts table and do whatever you want.

Maybe you could still keep a FORCESEEK hint on there and leave it up to SQL Server which index to seek into. But if we run this query — make sure that's created — we'll see that for both of our dynamic outcomes, for both sets of predicates we pass in, we can direct SQL Server to use the correct index based on how selective the various predicates are.

So for one of them, all we care about is whether the post type ID is one or two. Those are not selective, so we want SQL Server to go with creation date.

And if post type ID is not one or two, then we want SQL Server to use the post type ID index, which would be very selective for all those other values.

And again, you would just need some fallback for this — ELSE, just FROM Posts with FORCESEEK, do whatever you want. So anyway, this is just a little bit about some of the index design stuff that you might have to think about.
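Here's a rough sketch of that branching dynamic SQL idea — not the exact demo code. The procedure, parameter, and index names are hypothetical, and as noted above, the branches don't cover every possible parameter combination:

```sql
CREATE OR ALTER PROCEDURE dbo.posts_by_type_and_date
    @post_type_id integer,
    @start_date datetime,
    @end_date datetime
AS
BEGIN
    DECLARE @sql nvarchar(max) = N'';

    IF @post_type_id IN (1, 2)
    BEGIN
        /* Non-selective post types: seek on the date range first. */
        SET @sql = N'
        SELECT TOP (1000) p.Id, p.Score
        FROM dbo.Posts AS p WITH (FORCESEEK (ix_creation_date (CreationDate)))
        WHERE p.PostTypeId = @post_type_id
        AND   p.CreationDate >= @start_date
        AND   p.CreationDate <  @end_date
        ORDER BY p.Score DESC;';
    END;
    ELSE
    BEGIN
        /* Selective post types: seek on the equality predicate first. */
        SET @sql = N'
        SELECT TOP (1000) p.Id, p.Score
        FROM dbo.Posts AS p WITH (FORCESEEK (ix_post_type_id (PostTypeId)))
        WHERE p.PostTypeId = @post_type_id
        AND   p.CreationDate >= @start_date
        AND   p.CreationDate <  @end_date
        ORDER BY p.Score DESC;';
    END;
    /* A real version would also want a fallback branch with no index hint. */

    EXECUTE sys.sp_executesql
        @sql,
      N'@post_type_id integer, @start_date datetime, @end_date datetime',
        @post_type_id, @start_date, @end_date;
END;
```

Because each branch is a distinct SQL string, each gets its own plan cache entry, which is what lets the two parameter shapes stop fighting over one plan.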

Remember that not every scan touches the whole table, and not every seek is quick and direct to a few rows. If you think about the queries that ran before, we did have seeks, but we had seeks that spanned almost the entire Posts table, right?

For the creation dates, they did span the entire Posts table. For post type ID two, we did a seek, but it was a seek to 11 million out of the 17 or so million rows in the table.

That's generally not what you want. A seek that seeks through most of the table, or all of the table, is not much of a seek at all. It might as well be a scan, right?

Just a scan that they spelled wrong and pronounced wrong and labeled wrong and, I don't know, a bunch of other stuff. So anyway, these are just some of the things that I find helpful to think about when I am helping people design and tune indexes specifically for queries.

If you're just looking at missing index hints: stop. If you're just looking at missing index suggestions, either in query plans or in the DMVs, just stop.

They're not going to help you do this sort of fine-tuned work. SQL Server is just not good enough at telling you, quickly and prior to optimization, which indexes would really be appropriate.

It just doesn't know. It can't do it. It's not that smart. Even the Azure automatic index tuning stuff is still crap. Don't bother — it's useless. I don't know how many hours they buried in that, but, I don't know.

You've got to keep the summer interns busy, I guess. Well, anyway, I hope you enjoyed yourselves. I hope you learned something. I hope you have found this video to be enlightening, both from the point of view of index design and from FORCESEEK hint usage and dynamic SQL usage and all that other good stuff, because we did cover some ground in here, didn't we?

We did get through a few things. Anyway, thank you for watching. I'm gonna go take a shower. Bye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

The State Of Table Variables In 2025



Video Summary

In this video, I delve into the state of table variables in SQL Server as we approach their 25th anniversary since their introduction in SQL Server 2000. With Microsoft’s recent announcements about the “try us out” program and potential new features for SQL Server 2025, it’s an interesting time to revisit how these variables have evolved. I explore the nuances of table variable cardinality estimation, particularly focusing on SQL Server 2017’s interleaved execution and 2019’s deferred compilation, which offer some improvements but still fall short in providing statistical information about the data within the table variable. By walking through practical examples using SSMS, I demonstrate how these variables behave differently outside of stored procedures versus inside them, highlighting the challenges and potential pitfalls for query optimization.

Full Transcript

Erik Darling here with Darling Data. And again, it's looking like it's going to stay that way for a while — my repeated attempts to get acquired for a Stack Overflow amount of money have gone unanswered. So unfortunately, we're just going to have to keep doing these videos for a while, I guess. In today's video, we are going to talk about the state of the table variable. It is the year 2025, and SQL Server 2025 is, I guess, sort of around the corner. Usually these things happen around the end of the year, but since Microsoft has announced the "try us out" program, we can safely assume that something is in the works. And this is actually going to be interesting, because nobody else is going to talk about it: November 30th of this year will be the 25th anniversary of SQL Server 2000. And in SQL Server 2000, table variables were introduced. So this November will be the 25th anniversary of people being absolutely wrong about table variables. Now, a little bit has changed, right? Back in SQL Server 2017, I believe.

It was 2017 — I forget how far it got backported, but I think it was one of the intelligent query processing features from 2017. We got something called interleaved execution for multi-statement table-valued functions, the ones that return a table variable, where SQL Server would no longer just guess one row; it pauses optimization, executes the function, and uses the actual row count instead. And then SQL Server 2019 introduced another cardinality estimation help — another, maybe it's a hack — another part of the intelligent query processing family called table variable deferred compilation. That means that in a normal query, one you're just running in SSMS outside of a module, outside of a stored procedure, SQL Server will just figure out how many rows are in the table variable. So if you put a thousand rows in there, SQL Server would be like, oh, there's a thousand rows in here.

There is still, however, no statistics. Even with an index on your table variable, there is still no statistical information about the distribution of data within any column of a table variable. So you're still missing that. But in the context of a stored procedure, things changed a little bit more, because now you don't just always get that one-row estimate — again, unless you throw a recompile hint on there or something.

What SQL Server will do is sort of parameter sniff your table variable: not any statistical information within the columns, just the number of rows in there. So if you put one row in a table variable the first time a stored procedure is executed, it'll keep guessing one row. But if on that compilation you put a thousand rows in there, SQL Server will keep guessing a thousand rows.

That could be good or bad, if the number of rows in that table variable has a wild swing to it. If on one execution it's a thousand rows and on the next execution it's a hundred thousand rows, obviously that's just another parameter sniffing problem for you to deal with.
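As a minimal sketch of that deferred compilation behavior — this one doesn't need the Stack Overflow database; SQL Server 2019+ and compatibility level 150+ are assumed:

```sql
DECLARE @ids table (id integer NOT NULL PRIMARY KEY);

INSERT @ids (id)
SELECT DISTINCT TOP (1000)
    m.message_id
FROM sys.messages AS m;

/* With deferred compilation, the plan estimates ~1000 rows from @ids
   instead of the old fixed one-row guess. There's still no histogram
   on the id column, so join estimates can remain rough. */
SELECT COUNT_BIG(*) AS records
FROM @ids AS i
JOIN sys.messages AS m
    ON m.message_id = i.id;
```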

So maybe you need that recompile hint anyway, I don't know. But anyway, we're going to talk a little bit about some of that stuff in today's video. As always, if you would like to support my grand endeavors and designs in producing this content, you can go to the video description and sign up for a membership for as low as $4 a month.

If you don't have four bucks a month — your mom cut your allowance off, I don't know, something happened, dad lost his job at the mill — you can still ask me questions. Go to that link and there will be some stuff about question asking. More information there. Good things to know.

If you need help with your SQL Server — if you're watching these videos and you're like, man, Erik maybe seems to know what he's talking about a little bit there — you would be correct, because as Beer Gut Magazine has established many years in a row now, I am the best SQL Server consultant in the world outside of New Zealand. If you need any help with this stuff, I am available and my rates are reasonable.

If you would like some very reasonably priced — apparently speaking clearly is not a reasonable offering for me today — some very reasonably priced SQL Server training, you can, of course, get all of mine. Again, the video description has everything you need, for about $150 USD when you apply that magnificent discount code right there.

Upcoming events: we still have SQL Saturday New York City 2025 coming up on May the 10th at the Microsoft offices in Times Square. So be there — or I guess you're going to be a square either way, aren't you? All right. Anyway, let's move on with the show.

We have SSMS open. Let's make sure our database is in the correct compatibility level to get table variable deferred compilation happening. And we can also see, as far as database scoped configurations go, this DEFERRED_COMPILATION_TV setting.

Do not adjust your television set. They couldn't be bothered to write out "table variable," or to call this something a little more human-friendly. But it is on.

By default, it is on; you can see that here. One thing that I am sort of excited about in Management Studio version 21 — this is SSMS 20 — is that Management Studio 21 has a GUI page for all of the database scoped configurations.
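Checking both prerequisites looks something like this:

```sql
/* Compatibility level 150+ enables table variable deferred compilation. */
SELECT d.name, d.compatibility_level
FROM sys.databases AS d
WHERE d.database_id = DB_ID();

/* The database scoped configuration is on by default. */
SELECT dsc.name, dsc.value
FROM sys.database_scoped_configurations AS dsc
WHERE dsc.name = N'DEFERRED_COMPILATION_TV';
```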

So I guess it's about time there. Let's make sure we have the right index in place, and while we do that, I'm going to walk through a little bit of what's going on here.

Now, I've got this query — there's a little bit of superfluous stuff in there that I use to make my life easier down below, which I'll show you in a moment — but these are the top six parent IDs in the Posts table. You can see parent ID zero.

Again, this is a normalization problem in Stack Overflow: they don't have separate tables for questions and answers. They only have Posts.

And Posts is all the questions and all the answers, and there's this parent ID column that tracks which question answers belong to. So if you have a parent ID of zero, you are a question.

If you have a non-zero parent ID, you are an answer, or some other type of thing that's tied to a question. And there's a pretty steep fall-off here: there are six million — well, 6,050,820 — questions in the Posts table.

And then a few of the posts have a rather high number of answers. What kind of post would generate 518 answers? I don't know.

I mean, on the Database Administrators Stack Exchange, you're lucky to get one answer that's an answer. Well, I guess not lucky — but getting two good answers is extremely rare. On Stack Overflow, though — and I'm not saying that more than one of these answers is any good.

I'm not even saying that the answer currently marked as the answer is still correct. Who knows? This might be almost 20 years old — I guess 2008 can be almost 20 years old.

And it could be woefully out of date at this point. It could be all sorts of wrong. But this is what we've got, right?
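For reference, the "top parent IDs" query being described is likely something along these lines — my reconstruction, not the exact demo script:

```sql
/* The six most common ParentId values in Posts; ParentId = 0 (questions)
   dominates, then a steep fall-off to individual heavily-answered posts. */
SELECT TOP (6)
    p.ParentId,
    COUNT_BIG(*) AS records
FROM dbo.Posts AS p
GROUP BY p.ParentId
ORDER BY records DESC;
```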

None of that really makes a difference here. All we care about is that there are some numbers. So, outside of a module like a stored procedure, we're going to look at what table variables do.

And I have a recompile hint here, because I want you to see that the recompile hint doesn't help us statistically in any way. I've also got a table variable up here that has a clustered primary key on it — I'm creating an index on the table variable because I want you to understand that SQL Server does not have any statistical information about what's in the table variable, no matter what.
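The demo shape being described — a keyed table variable joined to Posts with a recompile hint — looks roughly like this (the parent ID value is a placeholder, not the one from the demo):

```sql
DECLARE @parent_ids table (ParentId integer NOT NULL PRIMARY KEY);

INSERT @parent_ids (ParentId)
VALUES (11227809); /* placeholder: any question id with many answers */

SELECT p.Id, p.Score
FROM @parent_ids AS pids
JOIN dbo.Posts AS p
    ON p.ParentId = pids.ParentId
OPTION (RECOMPILE); /* still no column statistics on @parent_ids */
```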

But what I thought was sort of interesting is what happens if we run this query passing in just one parent ID. I'm going to save the zero for later, because that's when things get sort of interesting. This is a good enough query, and a good enough query plan.

SQL Server estimates that one row is coming out of here. Yes, that's actually correct — we did only put one row in the table variable. But then SQL Server guesses three rows are going to come out of this join to Posts.

In reality, we get 518 rows. If we were using a temp table — a pound-sign (#) temp table — then SQL Server would be able to generate statistics and do better cardinality estimation for the join.
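By contrast, the pound-sign temp table version of the same pattern would get statistics — a sketch, with the same placeholder value:

```sql
CREATE TABLE #parent_ids (ParentId integer NOT NULL PRIMARY KEY);

INSERT #parent_ids (ParentId)
VALUES (11227809); /* placeholder value */

/* Statistics get created on #parent_ids, so the optimizer can line its
   contents up against the histogram on Posts.ParentId for the join. */
SELECT p.Id, p.Score
FROM #parent_ids AS pids
JOIN dbo.Posts AS p
    ON p.ParentId = pids.ParentId;

DROP TABLE #parent_ids;
```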

But what caught my eye a little bit was what happens every time you add a row to the table variable. And I think this is only true for the default, the newer cardinality estimator — I want to say "default" because I hate calling that thing the default.

I want to say this only goes for the newer one, where it uses the coarser join containment assumption. It could be true for the legacy one, too; I didn't get around to testing that.

But for every row you add in there, SQL Server basically guesses that three more rows are going to match here. So now, instead of one row, we have three rows, and instead of guessing that three rows would come out of the join, SQL Server now guesses nine. And if we put in the last two rows for this set that we might care about, you'll see that SQL Server is now going to guess 15 rows.

So when we put five rows in there, SQL Server figures 15 are going to come out, and we get that number down here. This still isn't a very good guess.

Granted, this is a pretty small number of matches, so even though the estimate is off by something like 11,700 percent, the index is good enough and the query plan is good enough that it doesn't make a big difference here. Where things get really messed up, though, is if we change this, of course.

Because even though these questions have the most answers in the current state of the 2013 Stack Overflow database, that's still not that many. We get about 1,755 rows for all of them, which isn't a ton. So guessing 15 and getting 1,755 is far from the end of the world.

You're not going to get the worst query plan in the world for that. So let's look at this now with one row, but where that one row is zero. This is where things do start to make a difference, because we put one row in the table variable, but SQL Server doesn't know what that one row is.

So SQL Server is going to guess three rows again, but that's a way-off guess. SQL Server thinks one row is going to come out of here, which it's right about.

It just doesn't know what the value of that one row is, so it can't look at the statistics available on the Posts table and say, hey, how is the value coming out of this thing going to match up with the values down in this thing? SQL Server knows one row is coming out of here.

The one-row estimate is correct, but SQL Server doesn't know the value of that row. So when it does its initial compilation of the query and asks, well, how many rows are going to match in here, it has no way of looking at the histogram and saying, okay, I have a histogram step either for zero or with zero in its range.

If it could, it could make a better guess about how many rows are going to come out of this. Instead, it chooses the same query plan it did for the smaller number of rows, and things just get all mucked up in here. This is a terrible execution plan for the value of zero.

But SQL Server doesn't have anything statistical to line this up with, so we just can't do better. There's really no better way of modeling this sort of thing. There's no better assumption it could make.

So that's it. Keep in mind, this is with a recompile hint and with the clustered primary key on this thing. SQL Server can't make a very good guess here at all. This plan is lost in the woods.

Now, like I said, that does change a little bit in the context of stored procedures. I'm taking the recompile hint out of the equation here, because I want to show you that the number of rows in a table variable that a stored procedure compiles with persists from one execution to the next.

So what I've done is create a table type called parent IDs, so that I can pass in a table-valued parameter here. Now, table-valued parameters do behave somewhat differently than table variables — they actually had this behavior before.

So even if I declared a table variable inside of the stored procedure and dumped this table-valued parameter into that table variable, the same thing would happen. I just want you to understand that a little bit.

But I think I just did this correctly. We'll see — that table type is already created. I've already run through and tested this. So let's do this once.

We have to declare our table variable and map it to our table type — that's what we're doing up here. Then we insert our five rows of matches into the table variable, and then we execute the stored procedure, passing the table-valued parameter to it.
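Put together, the table type, procedure, and call pattern described above look something like this — names are assumptions based on the narration:

```sql
CREATE TYPE dbo.parent_ids AS TABLE (ParentId integer NOT NULL PRIMARY KEY);
GO

CREATE OR ALTER PROCEDURE dbo.posts_by_parent
    @parent_ids dbo.parent_ids READONLY
AS
BEGIN
    SELECT p.Id, p.Score
    FROM @parent_ids AS pids
    JOIN dbo.Posts AS p
        ON p.ParentId = pids.ParentId;
END;
GO

/* Declare a variable of the table type, load it, pass it in. */
DECLARE @p dbo.parent_ids;

INSERT @p (ParentId)
VALUES (0); /* 0 = questions; the first compile's row count gets sniffed */

EXECUTE dbo.posts_by_parent @parent_ids = @p;
```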

When we do this, SQL Server, in the execution plan here, is like: ah, there are five rows in you. Wonderful. You've got five rows.

We've got that all figured out. Cool. But now, when we run this for zero, we're still going to get the bad query plan again, because even in a stored procedure, where SQL Server uses the table-valued parameter — which is backed by a table variable — we are unable to get good cardinality estimates.

Now, I think this is especially important, because — let's just look at the query plan real quick. We get that same thing; this all takes about 10 seconds. SQL Server cached the query plan from when we had five rows initially.

But now we only have the one row in there, and SQL Server still doesn't know what that one row is. So even in the context of a stored procedure, table variables are still pretty messed up. Now, where this gets important, I think, is that sometimes people think the table-valued parameter situation is different because it's backed by a table type.

Notice that when I created this type here, I created a type called parent IDs AS TABLE. People sometimes get confused by this and think that it creates a real backing table for the table-valued parameter. But it doesn't.

So even though we have a clustered primary key on this thing, SQL Server is not generating statistics on it either. There's no statistical information about it, because we're using a table variable to hold it — to be assigned that type. The table variable here is really where you run into the issue, because SQL Server is still not giving any love to table variables — histogramically, statistically, whatever you want to call it.

So even though table-valued parameters are backed by a table type, they still don't get statistics. SQL Server 2019's table variable deferred compilation only gives you the table cardinality — the number of rows in the table — not the contents of the columns. Still no histogram, still no column-level distribution stats.

And you can still run into all of the same problems with table variables if you have skewed data, or really any time you join to that table variable in some way. It could be a join, an EXISTS subquery, whatever — however you correlate to the table variable, you still run into the same problem: SQL Server is unaware of its contents.

And the cardinality estimates that you get are not going to reflect the reality of what you've done in there. So anyway, that's my spiel. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something.

And I do hope that you will stick around and try to learn some more SQL Server stuff from me. I do enjoy talking about it, don't I? You can tell, because I just went on for about 18 minutes about table variables again.

But anyway, what can you do? Some horses deserve to be fed. And because I still have to help clients with this table variable stuff, I feel like I need to keep talking to the greater world about it, because, God almighty, no one seems to be learning their lesson.

So anyway, I'm out of here. Thank you for watching.


A Fun Plan Shape With Aggregates In SQL Server



Video Summary

In this video, I delve into a fascinating query optimization scenario that caught me off guard due to its clever and efficient plan shape. While exploring a query on the Stack Overflow database for an advanced databases course taught by Andy Pavlo at Carnegie Mellon University, I stumbled upon this intriguing SQL Server query transformation. The optimizer's ability to unnest subqueries in a way that avoids redundant table scans is particularly noteworthy. This optimization not only demonstrates the power of SQL Server's query optimizer but also piques my curiosity about how other database systems handle similar scenarios. I discuss the specific trace flags and hints involved, as well as the unique plan shape that results from these optimizations. By sharing this insight, I hope to inspire further exploration into query optimization techniques across different databases.

Full Transcript

Erik Darling here with Darling Data, and we're going to be looking at what I think is a really fun plan shape in today's video. I came across this because I was watching a video series. A little background: there's a really smart fellow named Andy Pavlo who works for Carnegie Mellon University. He is the professor of databaseology there. Every semester, he does an intro to databases course and an advanced databases course — he's running one right now on query optimization — and they're all available for free on YouTube. If you look up "CMU Database Group," you're bound to come across that stuff. There was a video where he was talking about different query optimizers, stuff they're good at and stuff they're maybe not good at. And one of the things he said is that Postgres and Oracle — or some collection of databases — are better than SQL Server at this one specific type of query, an unnesting-subqueries thing. And I was like, whoa, I'm the one who gets to say Microsoft's not good at stuff here. Not you. He can say whatever he wants, but, yeah. So I was like, hmm, well, this I gotta see. So I went and wrote what I thought was a pretty reasonable approximation of the query using the Stack Overflow database.

Now, I was quite surprised by the plan shape, because it wasn't bad. I think it actually has kind of a cool optimization in it, so we're going to talk about that. Before we get into all that stuff, though, you may have noticed this slide has finally changed for the first time in months. The change is that I will be accepting office-hours-style questions anonymously, if you go to that link right there — this will be in the video description.

Of course, like everything else: when in doubt, refer to the video description. So if you would like to ask me anonymous questions that you would like answered on YouTube — keep in mind, this is not private consulting help; if you need that, you have to hire me and pay me. But if you want to ask an anonymous question, I will do my very best to answer it here on YouTube. Aside from that, all the same stuff applies. If you would like to support my endeavors with this channel, you can sign up for a membership. Again, the link for that is in the video description.

If you do not have the four bucks a month that it costs to show me a little cup-of-coffee love, you can like, you can comment, you can subscribe. And now you can take up more of my time by asking me questions, too. So lucky you — it's your big day. It's your turn to shine, you absolute champion. If you need help with your SQL Server, I am the best SQL Server consultant in the world. If anyone tells you different, they are liars, they are charlatans, they are frauds. Do not buy their line of nonsense.

So if you need help, these are my services, and my rates are reasonable. If you would like some other reasonable stuff from me, like SQL Server training, you can get all of mine — it's about 24 hours of content. Again, refer to the video description for more links. You can get all 24 hours for about 150 US dollars. It is the best deal on the internet, as far as I'm concerned. Upcoming events: I'll have more of these soon.

I'm waiting on dates to get locked down for some stuff, but right now, SQL Saturday New York City is happening on May 10th, 2025 at the Microsoft offices in Times Square. It's going to be a lot of fun. I'd like to say something funny, like "I'm signing autographs," but, A, no one wants my autograph, and B, I'm not signing autographs. So we're just going to have to deal with that. Anyway, let's get on and look at this fun plan shape.

Now, I've got some stuff down at the bottom, and if we talk about it from the bottom up, we've got a couple of trace flags there. One of them, 8619, outputs the transformation rules that were applied to the query. And we've got 3604, which sends that output to the console — the messages tab.

I've got a recompile hint, so SQL Server has to apply the rules every single time. And I've got the query compatibility level set to 160 — not because you need 160, not because you need the new cardinality estimator for this plan shape to happen, but only because when I use 160, the query runs in batch mode, and it runs a little bit faster than it does in row mode.

And sometimes time is of the essence, isn’t it? But the plan shape that I want to show you today is, uh, from this section right in here. Now, uh, you know, you have this section of the query up here, and this is just a simple join from the post table to the votes table.

And then down here, you have this thing saying where the score in the post table is equal to the max score in the post table, right? So this is another reference to the post table; we have two references to the post table here. Now, what I think is really interesting about this query plan optimization is that when you look at it, you might be expecting two references to the post table in the query plan, right?

Because after all, we’re selecting from the post table up here, and we are saying where the score in the post table is equal to the max score in the post table. So you would expect to have to touch the post table once for the FROM, and then again for the subquery, but you actually don’t.
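The query shape being described is roughly this; the table and column names come from the Stack Overflow demo database, but the exact demo query is an assumption:

```sql
SELECT
    p.Id,
    p.Score,
    COUNT_BIG(*) AS records
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
    ON v.PostId = p.Id
WHERE p.Score =
(
    SELECT
        MAX(p2.Score) /* second textual reference to Posts */
    FROM dbo.Posts AS p2
)
GROUP BY
    p.Id,
    p.Score
HAVING COUNT_BIG(*) > 1;
```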

If you go and look at the execution plan, you have this, right? You touch the post table once here, you touch the votes table once here, and there is no other reference to the post table in here.

SQL Server applies kind of a neat transformation where it grabs a top one with ties in here, and it sorts the data for that by score descending here. So we pull out all of the score stuff here, and that’s actually all that we need and all that we use in the plan, right?

We don’t have to actually do anything else. SQL Server takes this query and transforms it into just a couple of aggregates. And then way over here, we have a filter, and you would think maybe this filter is where we’re applying that predicate, but we’re not.

The thing that we’re filtering on is the count. Part of the query that I have down here at the bottom is WHERE COUNT_BIG(*) is greater than one, right? And that’s what this filter is applying.

We are not actually applying any filter comparing the score to the max score. We don’t need a separate access of the post table to do this.

The reason why this plan shape caught me off guard is because, A, it’s pretty good. It’s a pretty smart one, right? And this is not new. Paul White blogged about something similar 15 years ago.

This is not a new addition to SQL Server. This is not a new thing, but you can see the rules that got applied over here, and I believe this one is the rule responsible for the transformation.

Right? So we actually just aggregate this data once and then get the min from that as well. So SQL Server is able to do something really smart here and not have to touch the post table twice.

And this got me thinking: what are these other query engines doing that’s better than this? I’m now just furiously curious about what these other engines do. So I don’t know.

Maybe I have to go figure that out, because I’m just very interested now. Now, there are a couple of other things that I want to show you here. One of them is obvious.

This would obviously change the meaning of the query, because this is correlating where the owner user ID out here equals the owner user ID up here. This will change the query results. Right.

I’m actually not sure if I want to run this to get results; it might be smarter to just get the estimated plan. But notice that, even with that additional complication added in, SQL Server still doesn’t need to touch the post table twice. What does get added is this segment operator.

Sort of like when you have a ROW_NUMBER with a PARTITION BY, SQL Server adds a segment operator to mark those segments and give you the partitioning groups for ROW_NUMBER, DENSE_RANK, or whatever. So SQL Server just adds a segment operator to this. You still have the exact same plan shape, just with this one additional operator added in.
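As a sketch, the correlated version being described might look like this, with assumed column names; the correlation on owner user ID is what introduces the segment operator:

```sql
SELECT
    p.Id,
    p.Score
FROM dbo.Posts AS p
WHERE p.Score =
(
    SELECT
        MAX(p2.Score)
    FROM dbo.Posts AS p2
    WHERE p2.OwnerUserId = p.OwnerUserId /* correlation, like a PARTITION BY */
);
```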

Now, this is only available for direct equality predicates. So, like, where score equals something. If you were to change this to greater than or equal to, less than or equal to, not equal to, or any other non-equality predicate, then SQL Server would no longer have that transformation available to it, right?

Now we have two accesses of the post table, where SQL Server aggregates to get one part of it here, and then does this whole part where it aggregates to get the next part of it. So it can’t be applied to every single query form.

It’s really only equality predicates. But usually this is the way most of these queries get written anyway, unless you’re dealing with average and you want to find where something’s greater than the average. Finding where something is greater than the max is kind of a stupid query, actually.

Finding anything that’s greater than the minimum is maybe a little bit more realistic, but usually when you see stuff like this, people are using averages. And usually you don’t want to find things that are exactly average; you want to find things that are either above average or below average by some amount.

So anyway, I thought this was a cool optimization, and not the worst thing that I’ve ever seen SQL Server’s query optimizer do. And, honestly, again, I was just a little bit puzzled as to what these other query engines could be doing.

That’s so much better. Especially because in the video there was this little teaser moment where he’s just like, but Microsoft’s about to get better at this. And now I’m like, well, it would be hard for you to get much better than this.

I think this is a pretty ideal situation for the optimizer. So we’re going to give Microsoft a thumbs up for this query optimization. We’re going to say, good job, optimizer team.

You did a good. We’re proud of you. Keep up the good work. Stuff like this is great to see. Anyway, thank you for watching.

I hope you enjoyed yourselves. I hope you learned something. And I will see you over in the next video, which will be about something else entirely. Because there’s only so often that you can be introduced to new plan shapes, I guess.

Um, at, at, at this point in my life. So, yep. All right.

Well, we’re out of here. All right. Thank you.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

When Subquery Query Plans Are Suboptimal In SQL Server (And How To Fix Them)

When Subquery Query Plans Are Suboptimal In SQL Server (And How To Fix Them)


Video Summary

In this video, I return after a three-week hiatus to Paris, sharing my experiences with you as I recorded content before embarking on my trip. The video delves into the nuances of suboptimal subquery plans in SQL Server, explaining how these plans can lead to excessive row returns and poor query performance. I walk through an example where a query initially returns 4.8 million rows, far more than necessary for practical use, highlighting the issues with row goals and estimated versus actual row counts. By tweaking the query and experimenting with different hints and index usage, we explore how to optimize these plans for better performance, demonstrating that sometimes subqueries can be effectively tuned rather than outright avoided.

Full Transcript

Erik Darling here with Darling Data, and this is my first video back after a three-week hiatus in Paris. And I know that it was seamless to you because I recorded three weeks’ worth of content before going away. Intel is back up to its old tricks. Not being able to find drivers. Some things never change, I guess. But if I look any different or I sound any different, if I have perhaps developed a slight French accent, that’s why. So anyway, today’s video, we’re going to get gently back into things and talk about when sub-query plans are suboptimal. I know, happens to the best of us, especially when you have to deal with SQL Server being its silly self. But anyway, let’s carry on with a little bit about how you and I can cooperate together. If you would like to become a member of this channel, a paid member, you can join like 50 other people who have been so kind as to donate $4 a month to the content that I produce here. There’s a link right down in the video description where you can do that. Very easy, very clickable, like me. If $4 a month is just too rich for your blood, wherever your blood may originate from, you can like, you can comment, you can subscribe, and you can make different numbers go up in different ways.

that make me a happy, healthy human being. If you need help with SQL Server, if the things that I talk about during these videos, ring a bell, and you’re like, wow, my SQL Server could use some help in that area. I do all of these things and more. And as always, my rates are reasonable. If you would like some more reasonable stuff from me, you can get all of my training content, performance tuning training content, I don’t have anything else, for about $150 USD. It’s all available at that URL up there, with that discount code down there. And of course, as usual, there’s a very clickable link down in the video description for you to do all that. Upcoming events, there is still SQL Saturday, New York City 2025. Taking place May 10th at the Microsoft offices in Times Square. So if you haven’t been robbed by someone dressed as a Sesame Street character in a while, or someone dressed as Spider-Man in a while, come to Times Square, come to Times Square, get your money’s worth, or lose your money’s worth, I don’t know, whatever. There’s probably a joke about congestion pricing in there somewhere. I just don’t know what it is yet.

But anyway, let’s talk about these suboptimal subquery plans. Now, I’m going to show you the starting version of this query, because there are some important things in there for you to understand about it. But this is not going to be the final query that we end up with. Now, this query runs for about 12 seconds, and it returns 4.8 million rows, which is far more rows than you will ever need to return to an end user for them to do anything reasonable with. Way too many, right? It’s an insane number of rows. Why would anyone ever need that many rows for anything, unless you’re going to put it in an Excel file and do something else with it? It’s just an absurd number of rows.

So this is obviously not a very good example of when subqueries have problems, because we return a lot of rows, and stuff like this part of the plan are completely sensible. So these two index seeks into the votes table, which have absolutely correct estimates, right? Absolutely. We had 100% on both of those, come from this part of the query, right? We have an upvote count and a downvote count. And of course, upvotes are signified by a vote type of 2, and downvotes are signified by a vote type of 3.
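A hedged reconstruction of that part of the query; the vote type values are as stated in the video, but the surrounding query is an assumption:

```sql
SELECT
    u.DisplayName,
    UpVotes =
    (
        SELECT COUNT_BIG(*)
        FROM dbo.Votes AS v
        WHERE v.PostId = p.Id
        AND   v.VoteTypeId = 2 /* upvotes */
    ),
    DownVotes =
    (
        SELECT COUNT_BIG(*)
        FROM dbo.Votes AS v
        WHERE v.PostId = p.Id
        AND   v.VoteTypeId = 3 /* downvotes */
    )
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id;
```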

So, like, this is the part of the plan that we’re going to concentrate on for now, or rather for most of the video. But then I just want you to understand down in here is where the rest of the stuff happens. So, like, in the rest of the query, we have another subquery that goes to the badges table, and another subquery that goes to the comments table.

And down here in the query, we join users to posts, right? So there’s some stuff going on in here. But all of that, like, the two things that hit the votes table are up here, right? So, like, the order that we wrote the query in is not the order that SQL Server arranged the query plan in, which is going to come in handy for us to understand later.

And then down in this part of the query, this is where all the rest of the stuff happens. We hit badges, we hit users, we hit posts, and we hit comments, right? And SQL Server, for better or worse, has chosen parallel merge joins for all of these things.

But we don’t necessarily care about this part. Most queries that you’re going to have to tune are going to limit rows in some way, whether it’s by top or offset fetch. And for the sake of typing, we’re going to change that up here to top 5,000.

Oh, not 54,000, top 5,000. And that’s going to be nice and sensible. The thing is, when we change the query to the top 5,000 version, things don’t necessarily get all that much better.

So let’s run this, and let’s wait very patiently for SQL Server’s suboptimal optimizer to come up with a query and query plan for us and return some results. So the query that I ran that returned 4.8 million rows, the query execution time was about 12 seconds. And we had a big parallel plan with lots of joins and stuff.

In this plan, SQL Server has changed its mind about parallelism. It has not changed its mind about this part of the plan, though. Notice we still have a merge join.

The two subqueries that hit votes are still up here. The other funny thing that has now happened to our query is we have very, very bad estimates in here. But these are not bad estimates in the typical sense.

They’re not like SQL Server had bad statistics or out-of-date statistics. Of course, I just created these indexes, so the statistics are full scan up to date. The poor estimates here come from row goals.

Now, whenever you do something like top or offset fetch, and sometimes exists or not exists, there are all sorts of things in SQL, linguistically, that add something called a row goal to your query plan. And if we right-click on these operators, what you’re going to see is not in the thing up here.

This is where it doesn’t show you anything useful. But down in here, we have an estimated rows without row goal. And this estimate is correct, right?

See, this is the right number of rows. What SQL Server made a bad estimate on is how many rows it would have to read out of here in order to meet the row goal that we set by saying top 5,000. And it did that for both of these, right?

So, if we look at this one, it will also have an estimated rows without row goal. And that is absolutely correct. But where things fell off was, of course, between the estimates and the actuals.

That’s a little bit easier to see in here, where you have the number of rows hiding somewhere. And then the estimated number of rows. We have the actual number of rows read up here.

Sorry about that. Got a little bit lost in my own thoughts. And then we have the estimated number of rows to be read. But this is, of course, without the row goal, which it doesn’t say. But the estimated number of rows is 3,300.

But the actual number of rows is 2,141,819. That is a seven-digit number of rows. So, we read a lot more rows than we thought we would. The row goal is kind of messed up in that.
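As an aside not shown in the video: if you ever want to compare estimates with and without a row goal, there is a documented hint that disables row goals for a query. A minimal sketch, with assumed columns:

```sql
SELECT TOP (5000)
    u.Id,
    u.DisplayName
FROM dbo.Users AS u
ORDER BY u.Reputation DESC
OPTION (USE HINT('DISABLE_OPTIMIZER_ROWGOAL')); /* optimize as if there were no TOP */
```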

But none of the other index accesses in here, if we look at the properties, at least, yeah. So, this one does. This one does not.

This one does not. And this one should not either. So, none of these have the estimated rows without row goal property. The post one does. So, where things get interesting is that we got a serial execution plan there, probably when we shouldn’t have.

Now, I’ve created an index on the post table, which it should have used. Because this index fully covers everything that we want to do. But SQL Server costed this index out of existence. Now, let’s look at the estimated cost for this plan.

Right? SQL Server thinks it would cost 48 query bucks. So, my cost threshold for parallelism is 50. So, this costs just under the cost threshold for parallelism.

So, SQL Server was like, ah, not quite parallel plan territory. Also, it didn’t use my nice index on the post table. On the post table, it uses the clustered primary key.

Right? So, for some strange reason. And that strange reason can be revealed by adding a hint to the post table. Now, the index that I created up here is called P.

So, down here, we’re going to say WITH (INDEX = P), and then we’re going to rerun this. And I’m going to show you something kind of interesting.
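The hint syntax looks like this; the index name P comes from the video, but the rest of the query is a stand-in:

```sql
SELECT TOP (5000)
    p.Id,
    p.Score
FROM dbo.Posts AS p WITH (INDEX = p) /* force the covering nonclustered index */
WHERE p.CreationDate >= '20130101'
ORDER BY p.Score DESC;
```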

Because sometimes these are the wrestling matches that you have to have with the optimizer in order to get queries to perform well. Now, this query plan is parallel. And this takes, well, I don’t know.

Let’s just say about half as long. The last one was like 9 point something seconds. This is 5 and a half seconds. So, this one’s about twice as fast. Not quite. I realize my percentage math is not great, especially on the spot.

But if we look over here, the part that got this query expensive enough to be eligible for a parallel plan was using the nonclustered index, and SQL Server deciding that it needed to re-sort the data that came out of that index to make this join more efficient.

So, it put the data that came out of our nonclustered index in order by the ID column in the post table. So, it reordered the data.

And now this query plan, actually, I forgot to show you this, has a cost of 596 query bucks. So, this is why SQL Server didn’t choose the nonclustered index in the first place. SQL Server thought that choosing the nonclustered index and sorting that data would be way more expensive than just using the clustered index.

Of course, we can see a big difference in execution time here, right? So, again, about twice as fast when we get a parallel plan. We still have problems up here, though.

The problems that we have up here are that even though we’re doing seeks into these, right, we’re still doing seeks into our index, the row goal is still making a bad guess at the number of rows it’s going to take to satisfy the row goal set by the top 5,000, okay?

So, one thing that you can do if you want to affect how SQL Server is doing things and you’re very confident not only in your indexes but also in the plan shape that you’re after generally is you could say option loop join.
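That hint goes at the end of the whole statement, like so (a sketch with a stand-in query):

```sql
SELECT TOP (5000)
    u.Id,
    u.DisplayName
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id
ORDER BY u.Reputation DESC
OPTION (LOOP JOIN); /* every join in the plan must be a nested loops join */
```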

And what SQL Server will do is give us the plan shape that we would really want from queries that perform sub-queries. In other words, we don’t want SQL Server doing this part of the plan before we’ve done the part of the plan that actually does all our filtering, right?

We don’t want SQL Server doing this stuff. So, if we run this query again with the option loop join hint, we’re going to get the plan shape that we’re after. The plan shape that we’re after comes up here where all the stuff that we’re like doing our initial filtering on, right?

Like this is really the crux of our query. Selecting from users, joining the posts, and then filtering the post table on these creation date columns. So, this is really where we want SQL Server to start and narrow down the rows as much as possible.

And this is the plan shape that we’re after where taking that result, we now do loop joins across all of this stuff. And this query takes about one second, right?

That’s much better than the five seconds that it took when SQL Server chose a bad plan shape. And that’s much faster than the, well, I guess it was the same plan shape. So, we’re going to say this is a much better plan shape generally for our query, right?

And I forget, actually, you know what? We’re going to do a little experiment here. We’re just going to see what happens when I take that index hint out of the mix.

Let’s just do a little experiment because I actually forget what happens. So, this one takes actually 1.6 seconds. So, this one’s slowed down by about half a second.

And SQL Server goes back to scanning the clustered index on the post table. But we still get a parallel plan, right? So, at least as far as query tuning goes, this is still better than what we were getting before. I’m going to leave this index hint in because I generally like the speed that I get with it.

If you are not confident that the query plan shape should only ever use loop joins and you’re thinking to yourself, well, I want the optimizer to be free to choose other types of joins. One thing that you can do is set more row goals.

Now, let’s think logically for a minute about how many rows a subquery can return. One, especially when we’re doing an aggregate like count or sum or average or min or max or something.

Those are all things that really just return one row. If a scalar subquery ever returned more than one row, well, guess what? You’d get an error. So, what we can do, rather than saying SQL Server, you always have to use loop joins, is say, SQL Server, we want more row goals.

Give us our row goals. And if we put a top one in both of these, SQL Server will finally see the light and give us the query plan shape that we want, without having to say you always have to use a loop join.
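A sketch of what adding those row goals might look like, with assumed names. The TOP (1) doesn’t change the results, because an ungrouped aggregate can only ever return one row, but it does add a row goal:

```sql
SELECT
    u.DisplayName,
    UpVotes =
    (
        SELECT TOP (1) /* explicit row goal; results are unchanged */
               COUNT_BIG(*)
        FROM dbo.Votes AS v
        WHERE v.PostId = p.Id
        AND   v.VoteTypeId = 2
    )
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id;
```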

Now, SQL Server does naturally choose loop joins here. But now, if we look down to where we access the votes table, we do have a top for both of these, right? It’s a tiny little top here and a tiny little top here.

And we have the same index seeks that we got before. Now, granted, the estimates in this query are still off a little bit, right? Like, I mean, we’re under by 10% here. We’re over by 43% here.

We’re over by, wow, like 1,700% here. We were, you know, over by 5,500% here. We’re over by 24% here.

Math, math, math, math, math. But we get a fast enough query with a good enough plan shape that we want. So whenever you’re tuning queries or whenever you’re, you know, having to deal with someone who’s like, wow, subqueries always suck.

Never use subqueries. Sometimes they just don’t know enough about databases to be able to say that, and they don’t know enough about query tuning to be able to make that judgment across the board about every subquery ever.

Sometimes you just need to give subqueries a little bit of help. So in the future, whenever you’re looking at LinkedIn or whatever other social media you like, and some data engineer starts posting one of those LLM lists of top 10 things not to do with a query.

And you see things like avoid distinct, and avoid select star, and the same list of things, and avoid subqueries, use CTEs, you can just firmly know in your head that they are complete ding dong idiots, regardless of which giant company they work for.

Specifically, probably meta, which has like the highest quantity of idiot data engineers on the planet. You can say to yourself, hey, you’re wrong about all that stuff. Because Erik Darling taught me that you’re all wrong duty heads.

So you have that to look forward to. And you can, you can point them to this video where I will, I will gladly, I will gladly fight with them. Anyway, sometimes some queries just need a little bit of help.

Sometimes you have to grease the wheels a little bit, but you can always make a query better and faster, and it does not always involve taking subqueries out or adding in CTEs or whatever other nonsense people tell you about on the internet.

Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I hope that you will pick fights with data engineers online because it’s fun to do because they’re all cock-a-doodie heads.

Anyway, cool. All right. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Better Filtering With YEAR and MONTH Functions In SQL Server

Better Filtering With YEAR and MONTH Functions In SQL Server



This week, I’ve got a coupon that offers an even steeper discount on my training than the one at the bottom of the post.

It brings the total cost on 24 hours of SQL Server performance tuning content down to just about $100 USD even. Good luck finding that price on those other Black Friday sales.

It’s good from December 23rd-27th, or until I remember to turn it off. Hurry along now.

Thanks for watching!

Video Summary

In this video, I delve into the world of better filtering techniques in SQL Server queries, specifically focusing on how to handle year and month functions more effectively. Often, I encounter queries that dissect dates into their constituent parts (year and month) for comparison, which can lead to performance issues and slower execution times. By demonstrating a cleaner approach using the `DATEFROMPARTS` function, I show how to avoid these problems and improve query sargability. This method not only makes your code more readable but also ensures that SQL Server can leverage existing indexes efficiently, leading to faster execution.

Full Transcript

Erik Darling here. Guess who? Darling Data. That’s what science says. In today’s video, we’re going to talk about better filtering with year and month functions. Now, I run into a lot of queries that do something like take a year and a month, very rarely a day, but usually year and month, as either parameters or variables, or sometimes they even go so far as to dissect a date and pull the year and month out. I don’t know why. It’s blasphemous, but they do it. And then they take a date or datetime column in their table and they say where the month of that column equals the month thing, and where the year of that column equals the year thing. And without fail, those queries suck and are slow and have problems, and I have to fix them. So I’m hoping that by recording this video, I can avoid having to fix those problems in the future. Great. How much would you pay for that? How about four bucks a month? You click the link in the video description. You can sign up for a membership and get all of this for four bucks

a month or more if you choose to be more generous, but four bucks a month is the minimum. Anything less than that, I don’t know. What do you do with it?

I live in New York. Four bucks technically just allows you to take a breath. If you are just a fan of this content, you can like, you can comment, you can subscribe.

I’ll thank you in some other way. If you are looking for help with your SQL Server, perhaps with some performance tuning issues like with wrapping year and month in columns, I am best in the world at all of these things.

You can hire me instead of some idiot who doesn’t know what to do and who gives you bad advice. So that’ll be cool, right? Imagine that you’ll actually get stuff done. If you would like some very high quality, very low cost SQL Server training that teaches you things like I’m about to teach you now and more.

You can get all of my training for about 150 USD with that discount code. And would you believe that a link directly to go to the training site and apply that very discount code is in the video description.

So, again, it’s the end of the year. I’m not doing anything. Talk to me in 2025. With all that out of the way, let us put on our party hats and commence the partying because that is what we are here to do.

So, this is the type of query that I see people writing quite a bit. Oh, is ZoomIt? No, ZoomIt is just a little bit delayed. This is what I see people doing quite a bit.

Now, I think that part of why they don’t do things correctly is because sometimes figuring out date things is hard. And I admit I also didn’t want to figure out date things.

So, what I did, if ZoomIt will finally just go away, is look at Stack Overflow. And Stack Overflow had a pretty good answer for how to figure out if you’re in a leap year. So, here’s what I’m doing.

I have an index on the votes table on creation date. I just didn’t want that to accidentally run again. And then up here, I’m assigning a year and a month. Let’s just pretend these could be parameters.

They could be things that you dissect from a date that gets passed in. Whatever you want to do with it. It doesn’t matter to me. And then in here, I’m figuring out if we are in a leap year. So, if we’re in February, then we start with, to find the last day of the month, we start with 28.

And then if these things are true, we add 1 to the 28 or else we add 0 to the 28, which makes sense. Because you don’t want to add anything if it’s not in a leap year. And then there’s a little bit of figuring out down here to say, if your month is 4, 6, 9 or 11, then you have 30 days in your month.

If you are not month 4, 6, 9 or 11, then you have 31 days. All right. So, we’ve got that part figured out.
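A hedged reconstruction of the end-of-month logic just described; the variable names are assumptions, not the exact demo script:

```sql
DECLARE @year int = 2013,
        @month int = 3;

DECLARE @day int =
    CASE
        /* February: start at 28, add 1 in a leap year */
        WHEN @month = 2
        THEN 28 + CASE WHEN (@year % 4 = 0 AND @year % 100 <> 0)
                         OR (@year % 400 = 0)
                       THEN 1 ELSE 0
                  END
        /* April, June, September, November have 30 days */
        WHEN @month IN (4, 6, 9, 11) THEN 30
        /* everything else has 31 */
        ELSE 31
    END;
```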

Great. Great. Wonderful. So, this is really a much better, oh, why are you moved over? How did that happen?

Oh, that just looks silly, unprofessional, didn’t it? That was terrible. Just fire whoever wrote these scripts. Really, a much better way, a much cleaner way of doing this. And again, this comes down to the concept of sargability, of being able to take whatever your search arguments are and apply them easily to whatever index you have in place.

Of course, for us, we have an index on the creation date column. If we do stuff like this, SQL Server has to run this function for every row: pull out the month, pull out the year, and figure out if that month and year are equal to the month and year that we have passed into the query. If we do this instead, using the wonderful, magnificent DATEFROMPARTS function, we can assemble a date out of the year and the month.

And to find the start of the month, we just put in 01. And to find the end of the month, we just use that day variable that we used up there. Now, because I am using local variables, and local variables have lots of performance side effects, particularly usually because of poor cardinality estimation, in order to shortcut that issue, in order to force the parameter embedding option that SQL Server has in its wonderful engine, I am using the option recompile hint here.
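Putting the two versions side by side, roughly as described; column names are assumed from the index on the Votes creation date column:

```sql
/* Non-sargable: functions wrap the column, so the index on
   CreationDate can only be scanned, not sought. */
SELECT COUNT_BIG(*) AS records
FROM dbo.Votes AS v
WHERE MONTH(v.CreationDate) = @month
AND   YEAR(v.CreationDate)  = @year;

/* Sargable: assemble a closed date range with DATEFROMPARTS,
   so the predicate can seek the index. */
SELECT COUNT_BIG(*) AS records
FROM dbo.Votes AS v
WHERE v.CreationDate >= DATEFROMPARTS(@year, @month, 1)
AND   v.CreationDate <= DATEFROMPARTS(@year, @month, @day)
OPTION (RECOMPILE); /* embed the local variable values at compile time */
```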

If these are big queries that run occasionally, and you are okay with them recompiling every time, please use a recompile hint. There is nothing wrong with it. If this is code that you want to be production-ready, awesome code, you may want to explore two other scenarios. One might be to have a stored procedure that accepts the local variables that you declare and assign values to as parameters, because then they are treated like parameters, because they are passed in to that stored procedure.

Or, you could use dynamic SQL, parameterized dynamic SQL, which is the only safe kind of dynamic SQL, and you could pass in the local variables that you declare and set values to as parameters to that dynamic SQL, thereby transmuting them, through the alchemy of parameterization, into parameters. Wonderful. Things that you would just never think would work, but they work.
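A sketch of the parameterized dynamic SQL option; the names and values here are assumptions:

```sql
DECLARE @year int = 2013,
        @month int = 3,
        @day int = 31;

DECLARE @sql nvarchar(MAX) = N'
SELECT COUNT_BIG(*) AS records
FROM dbo.Votes AS v
WHERE v.CreationDate >= DATEFROMPARTS(@year, @month, 1)
AND   v.CreationDate <= DATEFROMPARTS(@year, @month, @day);';

/* Inside the dynamic SQL, @year, @month, and @day are real
   parameters, so they get parameter-style cardinality estimates. */
EXEC sys.sp_executesql
    @sql,
    N'@year int, @month int, @day int',
    @year = @year,
    @month = @month,
    @day = @day;
```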

They work well. And of course, if we run these two queries, I am just going to hit execute on this. These both come back relatively quickly, but the first one comes back relatively slower.

This one takes a full second to run, has to scan the entire index that we created on the votes table. That is this thing right here. And if we hover over the operator here and we look at the tooltip, we actually have something kind of funny happen.

YEAR and MONTH are really just sort of shortcut synonyms for the DATEPART function. Right? So, you can see that we have evaluated…

Oh, gosh. Where did my cursor go? There it is. The DATEPART month right here and the DATEPART year right here. And we have said where the DATEPART month equals three, just like the value we assigned.

And the creation date and the year 2013. Oh, that’s nice. We still had to scan that index though.

We needed a parallel query for it to be fast. If we wanted to put these on sort of even footing, and I’ll do that in a second, we could run this query at MAXDOP 1 to see how long it takes. Because down here, we do have a serial query that finishes in 178 milliseconds.
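For reference, capping the degree of parallelism per query is just a hint on the end of the statement (the query body here is my reconstruction of the demo’s non-SARGable version):

```sql
SELECT COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE MONTH(v.CreationDate) = 3
AND   YEAR(v.CreationDate)  = 2013
OPTION (MAXDOP 1); /* force a serial plan for an apples-to-apples timing */
```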

Right? And granted, the plan is a little bit more complex with all this stuff in it. But in this case, it’s okay because each one of these constant scans is really just creating a row for the date from parts function that we’re feeding in here.

And if we look at this, we just have a range seek from Expr1008, which is the DATEFROMPARTS value that we created from year, month, 01. And then down in here, we have the end of that range, Expr1009, which is the DATEFROMPARTS value that we assembled from year, month, day. Right?

So when we found the end day of the month, that’s where that was. So let’s actually go back and let’s redo that. Let’s say that we want to put these on equal footing. Right?

We want to make sure there’s no cheating. Parallelism clearly made things faster and better for that first query. Right? So let’s look at just how long, like single threaded each of these takes. Right?

Let’s look at these execution plans. This takes a full five and a half seconds now. So without the benefit of parallelism, this takes 5.6 seconds. Right?

We still have to scan that entire index. It’s still doing the same thing. And this query down here is still able to, in just under 200 milliseconds, do that same work. So if you’re out there tuning queries, and you see something like this, well, stop doing it. One.

The first step, stop. Second step, start. This is a much better way of approaching this. Now, this sort of thing won’t work if you have to do something real goofy, like the year and month from the date value in one column equals the year and month from a date value in another column. Maybe I’ll do a video about that someday, but no promises right now because my time is dedicated to other things.

You might notice that I have about 11 tabs open up there. I’m going to be recording some videos about stuff that I find interesting: how to write T-SQL queries correctly, and how to use different things in T-SQL to write queries in the best way possible. So I’m going to be recording those.

Maybe I’ll get around to that one as soon as I can, but I got a lot of stuff to talk about before I get around to that. So anyway, thank you for watching. I hope you enjoyed yourselves.

I hope you learned something and I hope that this finds you in good health because nice, happy, healthy people, they want to buy training and spend time with young, handsome consultants and gosh darn it, that’s me. All right. Cool.

Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.