Lost In Costs: Part 1


Video Summary

In this video, I delve into the concept of “Lost in Costs,” a term that highlights how query costs can sometimes mislead us when tuning SQL Server queries. I start by explaining what query costs are and why they should be treated as estimates rather than definitive performance metrics. Using my favorite demonstration, I illustrate how even simple changes like adding a hint can drastically affect execution plans and performance. By walking through this example, I aim to show that while high-cost queries aren’t necessarily slow, understanding the underlying costing mechanisms is crucial for effective query optimization. Throughout the video, I also touch on various factors that influence costs, such as row projections, I/O operations, CPU usage, and memory requirements, emphasizing that these are all part of a highly generalized set of algorithms designed to generate good-enough plans across different hardware configurations.

Full Transcript

Erik Darling here with Darling Data. It occurred to me while I was traveling overseas, while I was being the European correspondent for Darling Data, that there was some session material I had presented but had not yet recorded. That includes this session that I'm going to do now, which I presented a few different places, and the Rogel session that I presented at the PASS on-tour events. So the next few videos, aside from office hours, are going to focus on those. So this is the Lost in Cost... lost... actually, I don't know if that's supposed to be plural. Let's skip that. Anyway, as usual, down in the video description there are all sorts of helpful links. You can hire me for consulting, purchase my wonderful top-notch, best-in-class training materials at a reasonable price, become a subscribing channel member, or ask me office hours questions. You can do all of that down yonder. And of course, if you enjoy this content, please do like, subscribe, tell a friend, all that good stuff. Since I just got back from Utrecht, I trekked to Utrecht, and I trekked all the way home.

The only thing that I have left through the end of 2025 as far as speaking engagements will be the PASS Data Community Summit taking place in lovely Seattle, Washington, November 17th to 21st, where I have not one, not two, but exactly two days of T-SQL pre-cons with Kendra Little. So that's great for everybody. Except you, if you're not going. So that's your fault. Anyway, it is still October, so we can still have this lovely Halloween database party going on, can't we?

All right. So, oh, that’s a web browser. So if you ever wonder, this is how I generate my ASCII art, and this is the last thing I had to generate ASCII art for. So we’re off to a good start. Anyway, oh, see, it is plural. I didn’t mess it up. Anyway, this is going to be part one of Lost in Costs, and we’ll talk all about what this means coming up.

But, you know, since I'm a professional presenter, I always have my contact info up so people can bother me, but we don't need to talk too much about that. Because you are already at this one, and you might have gotten a link from this one or this one to get here. So there's not a lot of point in that, is there? So the first question that we must ask ourselves when we are looking at query plans, trying to figure out exactly what we are trying to tune about a query, is: what are query costs? And of course, query costs, no matter where you are looking, are estimates.

Right? So cost does not equal time or anything else. They are unitless measures. Right? They are not durable performance metrics in any way, shape, or form. Even in actual execution plans, costs are all an estimate. There are no actual costs derived anywhere during query execution.

SQL Server does not change its mind and say, oh, I thought that was going to cost 25 cents and it costs $70,000. My bad. It just doesn’t happen. Costs are merely how we got to the plan that we are currently looking at. All right? So when you think about how SQL as a language works, you tell SQL Server what rows you want, what columns you want, which order you want.

Maybe you might even say, summarize these columns with a GROUP BY or something. But you basically tell the database, I'm looking for this, and the database has to go find it. Boy, I should probably fix that at some point. Right? It's not 2024 anymore. Let's pretend that's a five.

All right. I obviously didn’t go through this with too fine of a comb before deciding to record it. Anyway. So queries are really just descriptions of what you want to see. Right? They are just a description of the data that you require from the database.

And then the job of the query optimizer is to figure out how to find it well enough within a reasonable amount of time. It's a bit like how indexes contain data and statistics describe data. Databases contain a whole bunch of data, and we describe what data we want to see from them.

Costs, as we know and love them, are just a bunch of internal algorithms that SQL Server uses to shape and choose the execution plans that it ends up executing. Right? The hope is that all those costing mechanisms are correct enough and the optimizer has correct enough information about the data underneath.

Right? The statistics, histograms, and things like that, to get you a, well, let's call it a good enough plan. Right? Good enough plan found here. But all of this is to answer the question your query is asking as efficiently as possible.

Costing considers all sorts of fun things. Right? Like the number of rows that are going to project out from various things. It might be a seek or a scan.

It might be a where clause. It might be a filter. It might be a join. How many rows are going to come through stuff? What kind of I/O is being done? Because there are different costs involved for random I/O versus sequential I/O.

CPU effort is also a factor in this. Parallelism is considered a CPU effort reducer because you have multiple CPUs working on the same set of data. Right?

Rather than having one CPU deal with all of the giant amounts of data that you are reading through. And of course, memory requirements are part of that as well. Cost is all just a very highly generalized set of algorithms. The thing that I always have to tell people is that cost has nothing to do with you.

Nothing to do with your hardware. It doesn’t matter how much your hardware costs because the costing algorithms are not looking at your hardware and thinking, Oh, that’s a nice computer.

I bet this would be easy. It just doesn't do that. It's meant to come up with good enough plans on any set of hardware. Right? That is the whole point of a generalized cost-based system. So what I always like to show people is this image, which is, of course, fondly referred to as Nick's computer.

There’s a note there about SP2, I think, of something. I don’t know SP2 of what. But there might be like an original SQL Server activation code on there.

But this is the computer that all of the costing stuff was originally done on. Now, aside from some very generic terms like CPU or motherboard or RAM or disk or something like that, that has very, very little in common with the type of hardware that SQL Server runs on today.

At least we hope it does. We’re not sure what’s actually going on up in Azure. It could be a whole farm of those things. We don’t know.

They won't tell us. But anyway, all plan decisions are based on these costs. Some costs are fixed per unit, like CPU and I/O... an I/O? Whatever.

Other costs are based on statistical information that our databases contain, which describes the data in them, like the size of tables and the histograms that are attached to columns or indexes. And, of course, uniqueness is a very good descriptor of data that we might have.

There are lots of things that we can add to our databases in order to better describe them, or to contain their data more efficiently. Useful indexes, for one. We can write our queries with sargable predicates that do not require extra work to locate the rows that we care about. We might want to think about adding unique constraints, or even value constraints, or unique indexes, which would be something else. We might want to consider having up-to-date statistics.

At least some of us might. And we might want to even do something as crazy as limit query complexity so that cardinality estimation is easier for the optimizer. We may want to even avoid things that do not have good costing support.

Maybe something like XML, JSON, string splitting, built-in functions, user-defined functions, local variables, and table variables, just to name a few off the top of, well, not my head anymore; it's just stuff that I wrote down quickly.
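Since sargability came up a moment ago, here's a minimal sketch of the difference, assuming a hypothetical table with an index on a CreationDate column:

```sql
/* Non-sargable: wrapping the column in a function hides it from any
   index on CreationDate, so the optimizer has to scan. */
SELECT u.Id
FROM dbo.Users AS u
WHERE YEAR(u.CreationDate) = 2013;

/* Sargable: the bare column compared to a range can seek into an
   index on CreationDate. */
SELECT u.Id
FROM dbo.Users AS u
WHERE u.CreationDate >= '20130101'
AND   u.CreationDate <  '20140101';
```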

So I'm going to start this off by showing you one of my favorite mis-costing demos in the world. Since I am smart (and I've made a note saying "if I'm smart," but I am smart), I already ran this.

This is one of my grand mal, "the optimizer is not always right" type of demos, where I have two queries that do essentially the same thing, except one of them has a hint. Not this one; this one does not have a hint, right?

There's no WITH, or, you know, "hey, do this thing" here. And then this one down here does have a hint: it uses the FORCESEEK hint to tell SQL Server that it does not always know better. But if we look at the query plans for these, we are going to see two plans that are, well, not very different, but different enough to make a big performance difference.

For example, zooming in over here, we can see that the first query, which does not have a FORCESEEK hint on it, ran for 25 seconds. And the second query, which ran for about one and a half seconds, did have the FORCESEEK hint on it.

Now, what I want to point out before I go too much further is that what's going on in the outer apply is a correlation. And the correlation is from the Users table, which has a clustered primary key on the column called Id. It's an integer, and it's an identity.

And I am correlating that to the clustered primary key of the Posts table, which is also an integer and an identity column. They are very, very similar columns in most ways, aside from which table they belong to. If you've been poking around the Stack Overflow database long enough, you'll realize that this is not how you join Users to Posts.

This is not how the Users table correlates to the Posts table. It would correlate either to the OwnerUserId column or the LastEditorUserId column. But I wrote this demo specifically to show you a failing of the optimizer, where essentially joining two tables together on two unique clustered primary keys results in a really strange execution plan unless you supply a hint.
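The queries themselves aren't printed in the transcript, but based on the description, a hedged sketch of the demo shape would look something like this (assuming the standard Stack Overflow schema; the exact query text in the video may differ):

```sql
/* Slow: no hint. The optimizer may decide to build an eager index
   spool against Posts instead of using the clustered primary key. */
SELECT u.Id, ca.*
FROM dbo.Users AS u
OUTER APPLY
(
    SELECT COUNT_BIG(*) AS records
    FROM dbo.Posts AS p
    WHERE p.Id = u.Id
) AS ca;

/* Fast: FORCESEEK tells SQL Server it does not always know better,
   and it seeks into the clustered primary key on Posts instead. */
SELECT u.Id, ca.*
FROM dbo.Users AS u
OUTER APPLY
(
    SELECT COUNT_BIG(*) AS records
    FROM dbo.Posts AS p WITH (FORCESEEK)
    WHERE p.Id = u.Id
) AS ca;
```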

So what really slows this first query down is that SQL Server looks at this and says, hey, you know what? What I'd really love to do is build an index off the index that I already have. SQL Server says, nah, you know, that clustered primary key wasn't good enough for me.

I need a brand new index and I’m going to store that index in tempDB and we’re just going to hope that this goes well. Right. And of course, it doesn’t go well.

You can see that most of the time in this plan is spent here. Right. There's about five seconds scanning the clustered primary key on the Posts table. And then, since this is a row mode plan, there's an additional 20 seconds here, adding up to 25 seconds total.

Right. So we spent a lot of time building the spool. It's a 17 million row spool.

Part of the reason why the spool takes so long to build is that when eager index spools are doing their thing (oops, that's the wrong one; that's the right one), all of the rows end up on a single thread. Right.

You can see the rest of these. These are all zeros for the rest of the threads involved here. And this isn't just a quirk of this demo. This is every single eager index spool that gets built. Not a good choice, SQL Server. Right.

For the fast plan, SQL Server said, oh, yeah, I guess I could just use that index. Right. I mean, it's a little strange, because SSMS does this thing where it cuts off certain operator text. I know Plan Explorer used to do this better, but I can't use Plan Explorer anymore.

Functionally worthless for this type of activity. But this is a clustered index scan. Right.

And this is a clustered index seek, because we said, hey, SQL Server, please FORCESEEK. Please force a seek into this index. And so it does. And when we tell SQL Server what to do, we end up with a much faster plan. Right.

We don’t spend 25 seconds in this sort of zone of the world. We spend about one second in this zone of the world. All right. And this is because when we ran this query, SQL Server applied a bunch of costing algorithms and said, I can make this faster by creating another index. No, you can’t.

No, you can’t. SQL Server. People always want to do ridiculous things with the various places that SQL Server stores query plans. Right.

So, you know, back in the plan cache days, when I used to do a lot of work on sp_BlitzCache, and when I first started working on sp_QuickieStore, which uses Query Store, people would say all the time, hey, I would like to order this result set by query cost. And I always say, why?

They say, well, I want all my expensive queries first so I can make them less expensive. Like, okay, well, you can have expensive queries that run very fast. You can have inexpensive queries that run very slow.

Query cost has nothing to do with how fast your query actually runs. Right. It's just the amount of effort that SQL Server expects to use to run it. But what it expects and what actually happens are two very different things.

Right. That’s the old everyone has a plan until they get punched in the mouth thing. So I’m not saying that you can’t take a high cost query and make it faster. That’s certainly untrue.

Right. I tune things all the time. High cost, low cost, doesn't matter here at Darling Data. But it does bring up an interesting question: if costs are stupid, what should we look at in our query plans? Well, the first things that you should find are maybe some things that aren't even in the query plans themselves.

Right. Maybe you should look for things that run at times that you care about. Right.

But once you start finding those things, the stuff that I usually go after are things that have a high average or maximum CPU or duration. You know, there are a lot of metrics that you could look at to figure out various things about a query. Right.
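As a rough illustration of that kind of triage, here's one hedged way to pull those numbers out of the Query Store catalog views, ordered by CPU rather than by estimated cost (sp_QuickieStore wraps this sort of thing up much more nicely; the averaging here is a simplification):

```sql
/* Surface queries by average CPU and duration rather than cost.
   Query Store stores times in microseconds; divide for milliseconds. */
SELECT TOP (10)
    q.query_id,
    qt.query_sql_text,
    avg_cpu_ms      = AVG(rs.avg_cpu_time) / 1000.,
    avg_duration_ms = AVG(rs.avg_duration) / 1000.,
    executions      = SUM(rs.count_executions)
FROM sys.query_store_runtime_stats AS rs
JOIN sys.query_store_plan AS p
    ON p.plan_id = rs.plan_id
JOIN sys.query_store_query AS q
    ON q.query_id = p.query_id
JOIN sys.query_store_query_text AS qt
    ON qt.query_text_id = q.query_text_id
GROUP BY q.query_id, qt.query_sql_text
ORDER BY AVG(rs.avg_cpu_time) DESC;
```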

You could look at reads if you were that type of person, but you would be reading the wrong thing. Because reads don’t often tell the story of how long a query executed for. If you do a bunch of logical reads, it’s stuff that’s already in memory.

It’s not going to be particularly slow. You might coincidentally reduce the CPU or duration of a query and also reduce reads. But that’s not necessarily guaranteed.

Not really something that correlates incredibly highly. But anyway, those are the things that I generally look for, because the thing that people appreciate when you tune a query is making it faster. And it's a great way to figure out if something is slow.

Look how long it runs for. Right. Look how much CPU it uses. These are the things that you can use to say, hey, this query took 30 minutes; now it takes three minutes.

Right. I reduced 27 minutes from this query. You might not reduce a single read from that query, but you might greatly impact the total duration. You might want to maybe even focus in on stuff that executes a lot.

I don’t know if you’re into that sort of thing. If you’re very OLTP-ish. So once you figure out what you should tune. Right.

Not what you want to tune. Because there is often a pretty big gap between what people want to tune and what they should be working on. Right. It’s, you know, there’s lots of stuff that I should be doing right now, but I wanted to do this. So we’re here.

Right. But generally, you know, and I'm not saying that there's no such thing as a red flag in a query plan, because there are certainly some query plan patterns where, when I see them, I'm like, oh, that's probably it. But getting an actual execution plan is the most valuable thing that you can do, because that will tell you where the query is spending time.

No one is going to complain about the number of logical reads that a query is doing. People are going to complain about how slow a query is. What you can figure out when you get an actual execution plan is where things are slow.

If there were any wait stats that help you figure out why things were slow. If there are any big differences between estimated and actual numbers, maybe because those estimates were incorrect, SQL Server chose a not-so-great execution plan. You might look at whether the query had any spills.

And you might even look, like we did earlier with the eager index spool plan, at parallel thread distribution. Right. If all your rows are ending up on a single thread in a parallel plan, not a good sign, not a good sign at all. Logical reads are largely a vanity metric these days.

I say it's like SQL Server 2008 mentality. It's not what you want to focus on in the modern world, especially where people put such a high emphasis on reducing cloud spend, because reducing CPU is generally the best way to reduce cloud spend. Physical reads, maybe; logical reads, not so much.

So we’re going to pause here before we move on and we’re going to, well, stop because I’m going to do this in three parts. I’m trying to limit the parts to about 20 minutes a piece. So we’re going to stop here and we’re going to pick up here in the next video.

All right. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

A Little About Scalar UDFs and Read Committed Snapshot Isolation In SQL Server


Video Summary

In this video, I delve into an interesting quirk involving read committed snapshot isolation and non-inlineable scalar UDFs in SQL Server. Specifically, I explore how these UDFs can affect the row versions read by a query under read committed snapshot isolation, leading to seemingly inconsistent results. By walking through a demo, I illustrate why this happens and explain that it’s not an issue with snapshot isolation itself but rather a consequence of scalar UDFs executing once per row. If you’re curious about more details on SQL Server isolation levels or want to dive deeper into the topic, be sure to check out my playlist titled “Everything You Know About Isolation Levels is Wrong.”

Full Transcript

Erik Darling here with Darling Data. And in today's video, we're going to talk about a little oddity with, excuse me, read committed snapshot isolation and non-inlineable scalar UDFs. I don't think this goes for snapshot isolation.

This is only read committed snapshot isolation. And this could also just as easily go for a multi-statement table valued function. But the basic gist of it is that when you run a select query and you read row versions, like a normal select query, the row versions that you read are from, under snapshot isolation, when the transaction begins and you first access the row versions, or, under RCSI, when your query starts.

And the thing that scalar UDFs mess with is the query inside of them. Remember, we've talked about scalar UDFs a bunch of times here: the non-inlineable ones don't run once per query. They run once per row that the query has to process and return in some manner.

I've gone into detail on that in another video, so I'm not going to redo that here. But the main thing is that, in your select list, for every row that your UDF has to execute for, you're going to read data as of the point in time that the UDF starts reading data.

So your outer select will do its thing. But the UDF that executes once per row is going to start reading data each time it executes. So you could read different row versions. It's almost like going back to read committed. This is, again, something where, if you have questions about any of this, or you want to learn more, I have a whole playlist called Everything You Know About Isolation Levels is Wrong.

And that has a lot of background on this. But it's almost like with read committed, where when you start reading data, you get a whole bunch of different points in time as you read along an index. So that's kind of what the gist of the demo is.

So you'll see it in action in a moment. If you want to click links, I've got some links down in the video description. You can hire me for consulting. You can buy my training. You can become a member of the channel, which donates a tiny fraction of money, like the Office Space penny-stealing thing, every month, and says thank you for the high quality SQL Server content that I produce.

You can ask me a question for office hours. And of course, if you like this content enough to maybe spread the good word about Darling Data, you can like, subscribe, and of course tell a friend or two or ten. Once again, I will be at PASS Data Community Summit, Seattle, November 17th to 21st: two huge days of T-SQL pre-cons with Kendra Little.

It'll be the best T-SQL you've ever learned. So there's that. Anyway, let's get back to the database party. I love this picture. So spooky and database-y, Halloween-y. I'm almost not looking forward to changing it when Halloween's over. Maybe I'll leave my Halloween decorations up until Christmas.

Who knows? Who knows what'll happen? Anyway, let's make sure that read committed snapshot isolation is on, and we need to make sure that we can reload these tables here. So we're going to create a couple of tables, and you may have seen this demo before in other videos that I've done. If you have, I promise this one is a little bit different. But I am going to show the basic stuff first, and then we'll go back and look specifically at the function thing to see why it's different.

So if we run this query and look at what we get back, we get rows one through 10, and all the totals are a thousand. Great. Over in this window, I have an update statement that I'm just going to run in a loop to add one to the totals column. Where that's going to get interesting is with this query. So that update is not yet running. Let's do the read committed one first, so we have that fresh in our brains. Right now, read committed is telling us the same thing as before. But if we start running this update, and we come and run this query, the main thing to keep in mind is that for every row that comes back, we're going to get slightly different numbers. It's like 100, 200, three, four, five, six, seven, eight, nine, up until we move into a new one. And this will happen every time we run this, because, like I said, under read committed, as you're reading data (seriously, if you haven't watched the isolation levels playlist, go watch it if you're lost on that), data can change all around you. Read committed doesn't guarantee a point in time. Read committed guarantees many different points in time that you're going to read from. So the totals for that are all whack, right?

If we run this without the update running, we get a stable result back. So what we're going to do is comment out this READCOMMITTEDLOCK hint, and I'm just going to reset this table, right? And show you, you know, just start from sort of zero. Now we get all one thousands, right? So this is our starting place.
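The two sessions, roughly sketched (table and column names here are stand-ins for whatever the demo actually uses):

```sql
/* Session 1: an update loop that keeps adding one to the totals column. */
WHILE 1 = 1
BEGIN
    UPDATE t
    SET t.total = t.total + 1
    FROM dbo.Table1 AS t;
END;

/* Session 2, variant A: the READCOMMITTEDLOCK hint opts out of row
   versioning, so this behaves like locking read committed and can see
   many different points in time within a single scan. */
SELECT t.id, total = SUM(t.total)
FROM dbo.Table1 AS t WITH (READCOMMITTEDLOCK)
GROUP BY t.id;

/* Session 2, variant B: no hint. Under RCSI, this reads a consistent
   snapshot as of the start of the statement. */
SELECT t.id, total = SUM(t.total)
FROM dbo.Table1 AS t
GROUP BY t.id;
```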

And if I start running this update again, if I kick this off and we come over here without the locking hint, these numbers are all stable, right? Because under read committed snapshot isolation, we read a snapshot of the data. Every time we run this, we'll get a consistent result back, right? That wasn't true under read committed. All right, cool. So why does the UDF change things? Well, first let's create a simple UDF in here. And we're going to do things slightly differently for this one, just to make it clear what's happening.
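A hedged sketch of what such a non-inlineable scalar UDF could look like (GetTotal, Table1, and Table2 are made-up names; the INLINE = OFF option, available from SQL Server 2019, guarantees the function won't be inlined):

```sql
CREATE OR ALTER FUNCTION dbo.GetTotal()
RETURNS bigint
WITH INLINE = OFF /* guarantee the UDF is not inlined (SQL Server 2019+) */
AS
BEGIN
    DECLARE @total bigint;

    /* This query runs once per row of the calling statement, reading
       row versions as of each execution rather than as of the outer
       statement's start. */
    SELECT @total = SUM(t.total)
    FROM dbo.Table2 AS t;

    RETURN @total;
END;
GO

/* Outer query: the UDF fires once per row it has to process. */
SELECT t.id, total = MAX(dbo.GetTotal())
FROM dbo.Table1 AS t
GROUP BY t.id;
```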

Inside the UDF, we're actually getting the sum. So I'm going to change my select query a little bit: rather than sum a sum, I'm just going to get the max of a sum. So inside this column right here is where our UDF kicks off. If I show the estimated execution plan for this, we get, as usual for non-inlineable scalar UDFs, two execution plans back. We get the execution plan for the query that's running and doing stuff, right? That's our outer query. But then we get the plan back for the UDF in here. This is where we're grabbing the sum of stuff from table two. Now, we don't have any locking hints on this one, right? We're not using READCOMMITTEDLOCK. Because this query is executing without any locking hints, it's using row versions when it runs. But if I come over here and I start running this update again, we're almost going to see something as weird as when we were using read committed, where every single row in here is going to be slightly different.

It's like 200, 300, 500, 879... blah, blah, blah. Every time we run this, we're going to get what seem like inconsistent, non-snapshot results back from the query we're running in there. But it's really because that query executes once per row, right? And every time that query executes, it's reading the row versions from a different point in time.

So this can make things look all skewed. This is just another weird problem that you can run into with queries that have non-inlineable scalar UDFs in them. This is not the fault of read committed snapshot isolation. This does not mean read committed snapshot isolation is bad. This is just a byproduct of the sort of procedural, black-boxy, row-by-row things that happen when you use scalar UDFs in SQL Server. So that's fun, right? Great. It sucks.

Scalar UDFs: stop using them. Get away, right? Just run screaming. Go do something else. Write functions differently. Don't do it. It will cause nothing but problems and headaches for years to come. Your children will inherit these problems. They're a nightmare. So don't do it, right?

No scalar UDFs, because they ruin everything except demos for consultants who find weird things about scalar UDFs and isolation levels, which I suppose I should be somewhat thankful for. But, you know, it's a little tough to find gratitude for things that are so annoying. I don't know.

Maybe it made a good video. Maybe it didn’t. We’ll find out. We’ll see how many people give this a thumbs up, won’t we? All right. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something and I will see you in the next video. Au revoir and all that stuff.

Bye. Bye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SQL Server Performance Office Hours Episode 36



Questions:

* Back in the days of yore, when you worked for somebody else, did you ever become despondent with your work (or SQL)? If so, how did you re-motivate yourself?

* Why have I never heard you suggest ALTER DATABASE CURRENT SET READ_COMMITTED_SNAPSHOT ON WITH NO_WAIT?

* For a table-valued function, which would *generally* be the better return type: `RETURNS @tbldata TABLE(ID INT IDENTITY(1,1), col VARCHAR(MAX))` or `RETURNS @tbldata TABLE(ID INT NOT NULL IDENTITY(1,1), col VARCHAR(MAX), PRIMARY KEY CLUSTERED (ID) WITH (OPTIMIZE_FOR_SEQUENTIAL_KEY=ON))`

* If you had to do work in another database, what would it be?

* What do you think heaven, purgatory, and hell are?

To ask your questions, head over here.

Video Summary

In this video, I, Erik Darling from Darling Data, hosted a lively office hours session where I answered five community-submitted questions and shared my insights on various SQL Server topics. Whether it was re-motivating yourself in the face of work challenges or choosing between different table variable configurations for functions, you got direct access to my expertise. I also took the opportunity to discuss some of my personal preferences, like working with DuckDB, a database platform that has really caught my attention due to its innovative approach and impressive features. If you have any questions or want more detailed advice on SQL Server topics, feel free to ask in the comments or through the provided link.

Full Transcript

Erik Darling here, Darling Data. And of course, it is a day of the week that begins the work week, no matter what your language settings are, which means it’s time for office hours, where I answer five community submitted questions, and you get five Erik Darling submitted answers. It’s amazing. What a fine transaction that is. If you want to ask me a question, the link to do that is down in the video description. There are many other helpful links in the video description as well. You can hire me for consulting, you can buy my training, you can support this channel with money, and if you don’t feel like doing any of that stuff, but you still like the content, well, liking, subscribing, and telling a friend that this channel is the best, well, it’s a pretty good way to help me out a little bit. The only thing left on my calendar for the year: PASS Data Community Summit, Seattle, Washington, November 17th to 21st, where it’ll be two days of T-SQL pre-cons with me and Kendra Little, and a bunch of other days of other stuff. So you should come, and you should come to my pre-cons, and I don’t know, maybe I’ll dress like a pilgrim or something. Anyway, let’s do the office hours thing, because that’s what we’re here to do.

All right. Here we go. Back in the days of yore, when you worked for somebody else, did you ever become despondent with your work or SQL? If so, how did you re-motivate yourself? Well, I am in a weird place in life, because I love what I do. And you know what they say? When you love what you do, you’ll never take a day off in your life. You know, of course, there would be times when I would get frustrated with something I was working on, or, you know, maybe someone I was working with. I don’t mean like within a company. I mean like, you know, like externally. But, you know, I’ve always found it pretty easy to keep going and find something new and interesting that catches my eye or imagination with SQL Server. So I consider myself very fortunate in that regard. If you are having that problem, I would maybe suggest seeing if there is something else maybe that you could focus on a little bit. Maybe just, you know, change a pace, change a mindset, change a point of view is what you need. Other than that, I don’t know.

You know, there are certainly downsides to working for someone else, like always having to build something that someone else thought of. But, you know, there’s downsides to working for yourself too. Like, sometimes you have no idea what to build. So it’s all, the grass is not always greener. But, I don’t know. Really, I don’t know what you’re doing with SQL or with your life in general.

So, I can’t really give you more specific advice. But, you know, I consider myself fortunate that I have never really had that big of an issue. All right. Next question here. Why have I never heard you suggest alter database current set read committed snapshot on with no wait? Because it’s a stupid waste of time. Just use with rollback immediate and you’ll actually get what you want to happen.
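For reference, the shape of the command I'd actually use (a minimal sketch):

```sql
/* ROLLBACK IMMEDIATE kicks out any sessions that would block the
   change, instead of NO_WAIT's behavior of simply failing the
   statement if it can't take effect right away. */
ALTER DATABASE CURRENT
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE;
```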

NO_WAIT doesn't really help that much. All right. For a table valued function, which would generally be the better return type: RETURNS @tbldata TABLE (ID INT IDENTITY(1,1), col VARCHAR(MAX)), or RETURNS @tbldata TABLE (ID INT NOT NULL IDENTITY(1,1), col VARCHAR(MAX), PRIMARY KEY CLUSTERED (ID) WITH (OPTIMIZE_FOR_SEQUENTIAL_KEY = ON))? Why the hell would you turn on OPTIMIZE_FOR_SEQUENTIAL_KEY for a table variable?

Inside of a table valued function? Son, put the database down. OPTIMIZE_FOR_SEQUENTIAL_KEY does not help you with that.

And, I mean, I’m sorry. I’m hyper focused on how silly that is. So, other than that, you know, the primary key obviously can be useful in some cases if you have equality predicates or certain join types to the table variable.

Without knowing more about how you're using it, I can't tell you if the primary key would be helpful for anything. Of course, even indexes on table variables do not get any statistical information, and they have all the same limitations within a multi-statement table valued function.

It’s been discussed many times on the channel. So, I don’t really think that either one is going to solve a giant problem for you. But if I had to opt for one, it would probably be the one with the clustered primary key on it.

Because it doesn’t really hurt you in any way. And at least for some types of queries, it would be less painful or less annoying. It would still be equally as painful, just maybe less annoying.

All right. Next question. If you had to do work in another database, what would it be? I don’t know. It would still be work.

I get what you’re asking. If I had to work with another database platform, what database would I want to work with? Well, I’ve worked in a few other database platforms. And I got to tell you, I wasn’t that impressed.

You know, there are a couple of nice things that SQL Server doesn't have in some places. And there was some stuff where I was like, wow, I really wish SQL Server had... no, I really wish the other database had this SQL Server thing.

But I think the database that I am most enamored with currently is DuckDB. I think they are doing absolutely fantastic things. I think the way that they have narrowcast the purpose of their database is fantastic.

I love their focused efforts. And I love the stuff that they’re doing to extend SQL and their database. They can do all sorts of fun stuff.

Like they can read like Postgres and MySQL database files. They have a CSV import that is so simple and so good at what it does that the first time I ever tried it, it made me emotional. It was like, wow, you just did that.

I was like, I got a little misty looking at it. So if I had to do work in another database, it would definitely be DuckDB. I just don’t know how to make money with DuckDB. So that’s another thing.

All right. This is apparently going to be a quick one. All right. What do you think heaven, purgatory, and hell are? Well, I am not the most spiritually scholarly person in the world.

So I don't know that I have the best point of view on that. I can tell you that, for me, heaven would be smoking cigarettes in a French graveyard. Purgatory would probably be being on a Teams call for eternity, or however long purgatory lasts.

If I recall correctly, there are some rules and limits around how long you spend in purgatory, depending on your situation. And, well, of course, hell would be waiting for a squat rack for all eternity. Those would be my leanings there.

All right. I think that I’ve answered all those questions sufficiently. This is a really short one.

Anyway, I hope you enjoyed yourselves. I hope you learned something. Thank you for watching. And I will see you, well, definitely in the next office hours, but certainly in tomorrow’s video as well. So we have that to look forward to.

So perhaps we are not in heaven, purgatory or hell unless one or more of those is recording YouTube videos, in which case I am in whatever one that is. All right. Thank you for watching.

Thank you.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

A Little About Serializable Escalation In SQL Server


Video Summary

In this video, I delve into a fascinating aspect of SQL Server that often goes unnoticed: how certain actions can escalate to the serializable isolation level without explicitly setting it. I explain why you might not always see these escalations in standard monitoring tools like `sp_whoisactive` or deadlock XML reports and demonstrate with an example involving indexed views, which can trigger this behavior when updating referenced tables. By walking through the process of creating an indexed view and performing updates that cascade to multiple tables, I show how to identify these hints using my `What’s Up Locks` tool and trace flags, revealing the complexity behind SQL Server’s locking mechanisms.

Full Transcript

Erik Darling here with Darling Data. And today's video is kind of a fun one, because it's about something you'll hear a lot about but not actually be able to see very easily. And this is about how doing some things in SQL Server will escalate to the serializable isolation level (I mean, it is an escalation level too) but not really tell you. All the normal stuff that you would look at if you were trying to figure this out doesn't really show you, in a meaningful way, that the serializable escalation happened. If you're looking at sp_WhoIsActive, right? Well, there's blocking going on, but it can't tell you about this. If you look at the blocked process report or the deadlock XML report, they won't tell you about this either. And the reason why is that the serializable isolation level escalation happens as a hint. It's not setting the transaction isolation level for the whole thing to serializable, right? When you set the transaction isolation level to something, then you can see that in WhoIsActive. Then you can see that in the blocked process report.

Then you can see that in the XML deadlock report. If you just ask for something different using a query hint, it's not for the whole transaction, and so SQL Server, if you're using all the defaults, will just report read committed. All right. So we'll talk about that in this video. Anyway, if you think this stuff is interesting and you have interesting SQL Server problems, or even really boring ones that you just want someone else to work on, you can hire me for consulting. If you want to learn more about SQL Server stuff like this, you can buy my training. If you like this content enough to donate a bit to the channel, you can become a channel member. All of these things are down in the video description, along with the link to ask me office hours questions, which I try to answer five of every week. And of course, if you enjoy this content, do the kind thing: like, subscribe, tell a friend, all that good stuff. The only thing that I have left on my speaking schedule through the end of the year is PASS Data Community Summit, taking place in Seattle, November 17th to 21st, where I will be banging out two days of T-SQL pre-cons with Kendra Little. They are undoubtedly the finest T-SQL content ever produced. So I will see you over there.

Anyway, let's get this party started. Come over here. So, this can happen for two reasons. I'm going to show you one of them, because showing you both of them would be boring and redundant. I've created an indexed view, and this will happen if the indexed view has more than one table referenced in it. There's obviously a join between Users and Posts in here.

This will also happen if you have foreign keys with cascading updates or deletes. So there are two things, at least that I'm aware of, where this will happen, right? Modifying tables that are referenced by indexed views, assuming that there is more than one table in the indexed view (otherwise you just get regular X lock hints), and cascading foreign keys. So I'm going to show you the example with indexed views.
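A hedged sketch of the kind of setup being described, using Stack Overflow's Users and Posts tables (the actual view definition in the demo may differ; the transcript later refers to a view called UserPostScore):

```sql
/* An indexed view that references more than one table. Modifications
   to Users or Posts now have to maintain this view, which is where
   the internal serializable hints come from. */
CREATE OR ALTER VIEW dbo.UserPostScore
WITH SCHEMABINDING
AS
SELECT
    u.Id,
    TotalScore = SUM(p.Score),
    PostCount  = COUNT_BIG(*) /* required for indexed views with GROUP BY */
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id
GROUP BY u.Id;
GO

/* Materialize the view with a unique clustered index. */
CREATE UNIQUE CLUSTERED INDEX ux_UserPostScore
ON dbo.UserPostScore (Id);
```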

So this view is already created or altered, and this index is already created. When I run this, I'll get an error. Great. Now, one thing you should know: these types of locks don't always show up when you modify a table referenced in the indexed view. I don't have full details on that, but I can tell you that this update takes around four seconds.

And if we use my little helper thing called What's Up Locks, which is available at my GitHub repo (the short link for that is code.erikdarling.com; that's where you get all my other stored procedures, like sp_QuickieStore and sp_PressureDetector and stuff), then if we run that update, get the locks, and roll back the transaction within that, we can see RangeX-X locks.

So range locks like this are hints about serializable stuff going on. That's the first thing, right? Pretty obvious that we got some serializable locks that way.
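The pattern for catching those locks looks roughly like this (a sketch; the procedure name follows how it's referred to in the video, and I've omitted any parameters, so check the repo for actual usage):

```sql
BEGIN TRANSACTION;

    /* An update that has to maintain the indexed view. The table,
       column, and row here are illustrative. */
    UPDATE u
    SET u.Reputation = u.Reputation + 1
    FROM dbo.Users AS u
    WHERE u.Id = 22656;

    /* Inspect the locks the open transaction is holding; look for
       RangeX-X key-range locks, which imply serializable behavior. */
    EXECUTE dbo.sp_WhatsUpLocks;

ROLLBACK TRANSACTION;
```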

The only way to see the hints that SQL Server supplies, that I've found, is to run the update. You need a recompile hint so the stuff shows up, and then you use this sort of smattering of trace flags to get other information out.
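I won't claim this is the exact combination used in the video, but a commonly cited recipe for dumping the optimizer's output tree, where these internal hints appear, looks like this (3604 and 8607 are undocumented trace flags; use them on a test box):

```sql
UPDATE u
SET u.Reputation = u.Reputation + 1
FROM dbo.Users AS u
WHERE u.Id = 22656 /* illustrative row */
OPTION
(
    RECOMPILE,          /* make the hints show up on this execution */
    QUERYTRACEON 3604,  /* send trace output to the Messages tab */
    QUERYTRACEON 8607   /* print the optimizer's output tree */
);
```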

So if we run this, it will also take about four seconds, and down in the messages tab you'll have all this crazy stuff, right? This stuff goes on basically forever. But if we look through it enough, we'll see some hints in here, right?

So if we scroll down and find where we start looking at tables and things, we will see parts where SQL Server started applying hints. But it's really kind of a nightmare to find in all this stuff. You really have to go scrolling and looking, and it's not a lot of fun.

So I'm not going to make you sit through all that, but this is what part of the output looks like, right? You'll see something like PhyOp, which is a physical operation, and Range. And then we'll see the Posts table.

Remember, that was one of the tables that was in the indexed view. We'll see some information in here. And then, of course, we will see hints applied. Those hints will be serializable internal and detect snapshot conflict.

Detect snapshot conflict is in there in case you have the snapshot isolation level enabled. With snapshot isolation enabled, SQL Server needs to figure out if your writes are going to conflict with another session's. But the important thing is the serializable internal hint.

And then on the indexed view itself, UserPostScore, right? So that was on the Posts table, but on UserPostScore we get some different ones. We get a UPD lock along with serializable, because we actually have to maintain the indexed view.

We're not updating the Posts table in this. Remember, we were just updating the Users table. But the Posts table needs to be read from using the serializable isolation level for the indexed view itself.

We take a UPD lock because we have to maintain the indexed view, right? Because the column in the Users table we were updating is in the indexed view, we have to update the indexed view.

And of course we get the same serializable hint when reading from the indexed view to figure out which rows need to be maintained. And we of course get the detect snapshot conflict hint as well, just in case snapshot isolation is enabled. So, like I said before, you have to go pretty deep into SQL Server stuff to find where these hints get applied.

It takes a lot of trace flagging and other annoying stuff to get this information, and then looking at the trace flag output, which is also not fun. So, you know, there's that. But once you find it, you can prove to people that doing certain things in SQL Server, primarily creating an indexed view that spans more than one table...

...and then having to update one of the tables in that indexed view, will show this same stuff, and foreign keys with cascading updates or deletes will also show it. So it does happen. This is the only way I've found to really prove that it happens, but it does.

And that's my story, and I'm sticking to it. Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something, and I will see you next time.

Adios. I’m hitting the wrong button. There we go.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Why Partitioning Is Not A Performance Feature In SQL Server


Video Summary

In this video, I dive into why partitioning a table is not inherently a performance feature but rather a data management tool designed to facilitate quick swapping of partitions in and out. I demonstrate how setting up a table for partitioning can negatively impact query performance, especially when using aligned nonclustered indexes. Through practical examples with the Stack Overflow database, I show that queries on partitioned tables can significantly slow down compared to their non-partitioned counterparts, even when using clustered indexes. The video also explores the trade-offs between creating aligned and non-aligned indexes, highlighting scenarios where non-aligned indexes might be necessary but come at a cost in terms of performance.

Full Transcript

Erik Darling here with Darling Data, and today's video is sure to be a video. We are going to talk about why partitioning is not a performance feature. It is a data management feature. When you partition a table, the primary objective is to be able to quickly swap partitions in and out. Anything else that you do with partitioning is not that much fun. Splitting and merging partitions, managing partitions, all that other stuff: not fun, not performance.

But this video is specifically about how, when you set up a table to facilitate data management with partitioning, the performance of some queries can quickly go south. What I'm not talking about in this one is mixing clustered columnstore indexes with partitioning, which can in some cases help performance by getting you additional segment elimination, or row group elimination, however you prefer to talk about it, because most people are not doing anything quite that exotic with their tables. Most people are in the world where they somehow got this idea fixed in their heads that if they partition the table, performance will get better. Wrong.

If you look down in the video description, you'll see all sorts of helpful links. You can hire me for consulting, buy my training, or become a paid-up member of the channel if you want to support this high quality SQL Server content that I put out here. You can also ask questions for office hours. Those are free, and I answer five of them every week. And of course, if you enjoy this content, I do ask that you like, subscribe, and tell a friend, maybe even 10, 20, 30, 100. If you have another thousand friends or something, just send them all the links.

Spam them. Spam them. Screw them. The only conference that I have left on my schedule is PASS Data Community Summit, taking place in Seattle, Washington, November 17th to 21st, where me and Kendra Little are going to do two days of probably the most spectacular T-SQL pre-cons you've ever seen in your life. So I hope you're prepared. Anyway, it is Halloween month, so we are having a Halloween database party in here. Let's get cooking with this partitioning stuff, why don't we. So, what I've done in the Stack Overflow database: really the only table in Stack Overflow that lends itself to partitioning well, because it's long and skinny, is the Votes table. And I partitioned by a column in the Votes table called CreationDate.

Because CreationDate gives me sort of the most even spread of partition data. Even though it's not perfectly symmetrical, it is far more symmetrical than if I used VoteTypeId or UserId, because lots of people have only voted once, stuff like that. Even with PostId, some things get no votes, some things get one vote, so it doesn't make a lot of sense. CreationDate makes the most sense. The partitioning setup itself doesn't matter too much. It's a partition range, right? One for every year that's in the table, so 2008 to 2013. So if I say, give me the min CreationDate from the partitioned Votes table...

This is very quick, right? This turns out just fine. I have no complaints with this. This is okay. Now, I've created some indexes below that we'll talk about in a minute. But first I just want to show you a quick difference.
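For context, the yearly setup described above would look something like this (boundary values, names, and the exact column list are illustrative):

```sql
/* One boundary per year in the table; RANGE RIGHT puts each boundary
   value at the start of its partition. */
CREATE PARTITION FUNCTION pf_votes (datetime)
AS RANGE RIGHT FOR VALUES
    ('20090101', '20100101', '20110101', '20120101', '20130101');

/* Map every partition to the same filegroup, for simplicity. */
CREATE PARTITION SCHEME ps_votes
AS PARTITION pf_votes
ALL TO ([PRIMARY]);

/* The partitioned copy of Votes, clustered on CreationDate and Id
   on the partition scheme, and page compressed like the original. */
CREATE TABLE dbo.Votes_Partitioned
(
    Id int NOT NULL,
    PostId int NOT NULL,
    UserId int NULL,
    VoteTypeId int NOT NULL,
    CreationDate datetime NOT NULL,
    CONSTRAINT pk_votes_partitioned
        PRIMARY KEY CLUSTERED (CreationDate, Id)
        ON ps_votes (CreationDate)
) WITH (DATA_COMPRESSION = PAGE);
```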

Now we're not going to use the partitioning column anymore, right? But I want to force SQL Server to use the clustered primary key on both the partitioned Votes table and the regular Votes table here. Okay, so we're doing the same query. We're just hitting the partitioned table versus the non-partitioned table, and using the clustered index for both of these queries. When I run these two, things start to get a little wonky. They both return the same row, of course, but the partitioned table takes nearly two and a half seconds to do the exact same thing that the non-partitioned table did.

All right. And again, these tables are identical aside from the partitioning. They're both clustered on the... I mean, this one is only clustered on the Id column, and this one is, of course, clustered on CreationDate and Id, right? The clustered primary key. But we're selecting VoteTypeId, so that's a little immaterial here. They're both page compressed, so there's nothing weird there.

Just immediately aggregating VoteTypeId from the partitioned table takes about two and a half times as long, right? We go from one second to two and a half seconds. So already we're at a little bit of a detriment. Now, the nonclustered indexes I'm creating to show you some differences: we're going to create one on VoteTypeId on the Votes table, right? So that one is not partitioned. We're going to create one on the partitioned Votes table that is aligned to the partitioning scheme, right?

So this one is aligned to the partitioning scheme on CreationDate. And then we're going to create a non-aligned index on the partitioned Votes table. This one is on PRIMARY rather than on the partitioning scheme. And the reason I want to show you this is because once you create a non-aligned index on a partitioned table, you lose the ability to swap things in and out. Having aligned nonclustered indexes means you can quickly swap partitions in and out of the table. Having non-aligned indexes means you'll get an error if you try to do that.
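The three nonclustered indexes, sketched out (index names are illustrative):

```sql
/* On the regular, non-partitioned table. */
CREATE INDEX v_vote_type
ON dbo.Votes (VoteTypeId);

/* Aligned: created on the partition scheme. SQL Server adds the
   partitioning column (CreationDate) to the index automatically.
   Partition switching still works. */
CREATE INDEX vp_vote_type_aligned
ON dbo.Votes_Partitioned (VoteTypeId)
ON ps_votes (CreationDate);

/* Non-aligned: created on a filegroup instead of the partition
   scheme. Faster for some queries, but partition switching now
   errors out until you drop it. */
CREATE INDEX vp_vote_type_nonaligned
ON dbo.Votes_Partitioned (VoteTypeId)
ON [PRIMARY];
```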

Now, if it's palatable to your workflows and environment, you could, of course, create non-aligned indexes to help queries. And then, when you need to switch data in and out, you could drop them, switch the data, and recreate them. That still might be faster than whatever you were doing before. I don't know. It's all about whether it's tolerable to your workflows or not.

I can't tell you that unless you hire me. Don't worry, my rates are reasonable. Anyway, we're going to compare a few different queries: first against the non-partitioned table, and then using the aligned and non-aligned indexes on the partitioned table. So for the first one, we're going to do the same thing as before, but now we're going to tell SQL Server which index to use. This is really just for demo clarity.

And also because sometimes the optimizer will choose the non-aligned index anyway, because it's like, oh, yeah, I'd rather use this thing. So I'm forcing SQL Server to use these just to show you what things might look like if you only have aligned indexes on your partitioned table. So we're going to run this one. This is against the non-partitioned Votes table, right?

And we're telling it to use the VoteTypeId index. This all finishes very quickly. Zero milliseconds, yada yada. Fine. If we use the non-aligned index on the partitioned table now, right, the non-aligned nonclustered index, this also goes just fine. SQL Server very quickly finds that. Once we go and use the aligned index on the partitioned table...

So, this is the one that we created that is aligned to the partitioning scheme, and it is not so instantaneous anymore. This takes nearly two seconds, which almost exactly reflects the scanning we did when we used the clustered index for this query, which took two and a half seconds. So adding a nonclustered index shaves about half a second off this. Let's be fair and say, well, 1.898, that's almost 1.9.

So let's just call it 600 milliseconds. We save 600 milliseconds by creating an index. Not great. Right. Not moving the needle for me. So we can see that having aligned indexes on partitioned tables is immediately slowing some types of queries down. I'm going to show you another type of query that slows things down. Let's just say that we wanted to get the top five rows. Right.

So we're going to get the top five from the regular Votes table, and we're going to tell it to use the VoteTypeId index for this, right? Just for demo clarity, we're forcing the index here. This is very quick. The execution plan is just what you would expect: we get five rows, we loop join, we key lookup, we get the top five. It takes zero seconds across the board.

If we use the non-aligned index on the partitioned table: also very quick, does the same thing. We have a Compute Scalar in here now, but the whole thing takes zero milliseconds anyway. Okay. Well, what if we do this with the aligned index? I could run this here. I could. But we'd be waiting a little while.

So I've run this ahead of time. You're welcome; I try to save you some time, because Microsoft won't. If we get the top five from the partitioned table using the aligned index, it takes a full minute and six seconds. This looks a little bit different, doesn't it? Right. If we look at what this does, we scan the whole index. We go from six seconds to almost 13 seconds, so that takes about seven seconds. And then we spend the rest of the time, well, 45 seconds, doing a 52 million row key lookup.

And, you know, a nested loops join to facilitate the key lookup. Right. So one row comes out of here, we go look it up down here, and we do that just about 53 million times. Right. And then, you know, we’re at a minute and three seconds there, and then another three or so seconds getting out to the end of the plan. So that plan looks a whole lot different from the plan that we got using the non-aligned index. The non-aligned index just goes and finds the top five rows, looks up the extra columns we need, and we’re done.

We’re not doing all that other stuff. We are not able to get down to just the five rows when we use the aligned index. Again, the query plan for that is over here. And we are going to end the video with this nicely framed up, because that’s a good way to end this video. So to summarize: partitioning is a data management feature. It’s there for you to quickly swap partitions in and out of your partitioned table.

If you combine partitioning with clustered columnstore, you may see some performance improvements around segment or row group elimination. We didn’t dig into that in this video. We’ll do that in another video, probably. But there are lots of queries that SQL Server might have a very hard time with in your workload if you decide to partition a table and not index things any differently. You might need to create non-aligned indexes to make those queries faster.

And you may need to drop and recreate those indexes around your partition swapping, assuming that you’re going to do that sort of thing. If you’re not going to do that sort of thing, there is absolutely no sense in you partitioning a table and then creating a bunch of non-aligned, nonclustered indexes on it anyway. All right. I think that’s good enough here.

Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. I hope that you won’t partition your tables. And I will see you in the next video, where we will talk about something equally as interesting. All right. Thank you.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

A Lot About Multi-Statement Table-Valued Functions In SQL Server

A Lot About Multi-Statement Table-Valued Functions In SQL Server


Video Summary

In this video, I delve into the intricacies of multi-statement table-valued functions in SQL Server, exploring their unique behaviors and performance implications. Starting with a practical example, I demonstrate how these functions can behave differently based on whether they receive literal values or columns as parameters, highlighting the impact on execution plans and performance. By walking through various scenarios, including interleaved execution, rebinds, and option recompile, I aim to provide insights that can help you make informed decisions when working with such functions in your database projects. Whether you’re a seasoned SQL Server professional or just starting out, there’s plenty of valuable information here to enhance your understanding of these often-overlooked features.

Full Transcript

Erik Darling here with Darling Data. And in today’s video, we’re going to talk about a multitude of interesting things around multi-statement table-valued functions. That’s about as much as I can give you here, because if I tried to explain it all here, the mind would boggle, you would implode, you would just have a bad day. You’d be like a fried egg in your head. So thank you, PowerPoint, for finally responding to me. If you look down in the video description, round about here, there are all sorts of helpful links. You can hire me for consulting, buy my training, become a paid subscribing channel member, ask me office hours questions, and of course, one of the most valuable things that you can do if you enjoy this content is to like, subscribe, and of course, tell a friend. Keeping it all to yourself is of course very selfish. So we want to avoid selfishness here at Darling Data. While you’re watching this, I will actually be in Utrecht. Amazing, isn’t it? I’m not going to be here where I’m recording this. I’m going to be way over there. I got on one of these things and flew. So that means the only thing left for me to talk about is the Pass Data Community Summit taking place in Seattle, November 17th to 21st, where I will have two days of delicious T-SQL pre-cons with Kendra Little. And then I’m home for a bit until someone else decides to pluck me from my domicile.

And send me spiraling out into the world to blather on about SQL Server stuff. Anyway, with that out of the way, let’s talk about multi-statement table-valued functions and all of the interesting things that can happen. So I’ve got this function here, which is a multi-statement table-valued function. You can tell because it returns a table variable, right? And I’m not saying that this is a good idea. Usually I prescribe against doing this sort of thing. But there are some interesting things to talk about if you ever want to know why you shouldn’t do this sort of thing. So we’ve got that going for us. There are, of course, times when they’re fine and actually useful and stuff like that. But usually that’s in the hands of quite a skilled T-SQL practitioner. And if you have not elevated yourself to that level, you should probably leave the dangerous stuff on the ground. Don’t pick it up.

This particular function has a thing that we’ll talk about later. We can call this some ominous foreshadowing here: I’ve sort of treated the two parameters that this thing accepts as optional parameters. This is also something that I generally prescribe against. So don’t look at this and aspire to this code. This is not what you should be doing. Okay, so leave this out, right? Like, say, Erik said, probably don’t do this unless you’re a skilled T-SQL practitioner who understands the full ramifications and is making an informed choice about writing bad code for a demo in Minecraft.
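For readers following along at home, here’s a plausible reconstruction of the shape of such a function, based on how it’s described later in the video; the name, parameters, and grouped select from Posts are assumptions, and the (@parameter IS NULL OR column = @parameter) predicates are the “optional parameter” anti-pattern being warned about:

```sql
CREATE OR ALTER FUNCTION dbo.multi_optional
(
    @CreationDate datetime,
    @LastAccessDate datetime
)
RETURNS @Out TABLE
(
    OwnerUserId int,
    PostTypeId int,
    TotalScore bigint
)
AS
BEGIN
    /* "Optional" parameters: any NULL argument disables its predicate */
    INSERT @Out (OwnerUserId, PostTypeId, TotalScore)
    SELECT
        p.OwnerUserId,
        p.PostTypeId,
        SUM(p.Score)
    FROM dbo.Posts AS p
    WHERE (p.CreationDate >= @CreationDate OR @CreationDate IS NULL)
    AND   (p.LastActivityDate <= @LastAccessDate OR @LastAccessDate IS NULL)
    GROUP BY
        p.OwnerUserId,
        p.PostTypeId
    HAVING SUM(p.Score) > 0;

    RETURN;
END;
```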

So what’s interesting here is... actually, we should explain a little bit. First, I need to do one thing just to make sure that we don’t end up with anything funny. I don’t want to spoil any surprises. So we’re going to make sure I do that. What I’m doing is I am essentially selecting from the Users table, and I’m getting everyone with a reputation over 800,000. Right. And let me just add a column to this. Say c equals COUNT_BIG. That makes things a little bit clearer here.

So the reason why I’m passing in the literal values that you see here is because of the data that I get back from this query. Right. So if I look at the users who have a reputation over 800,000, I want to know what their min and max creation date and last access date are. And just for you later, I want to show you how many of them there are.

So the min and max for those are 2008-09-13 and 2018-09-02. And we get four of those rows back. Four. Remember the number four from this query. And now what I’m going to do is I’m going to execute this one, and we’re going to look at what happens.
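Assuming the hypothetical dbo.multi_optional sketch above, the calling query looks something like this, with the literal dates matching the min and max just described:

```sql
SELECT
    u.Id,
    u.DisplayName,
    m.*
FROM dbo.Users AS u
CROSS APPLY dbo.multi_optional
(
    '2008-09-13', /* min creation date */
    '2018-09-02'  /* max last access date */
) AS m
WHERE u.Reputation > 800000
AND   m.OwnerUserId = u.Id; /* correlation stays outside the function */
```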

This takes about seven seconds. We did it. And we get eight rows back. But the eight rows that we get back are essentially for four users. Right. So these are the four users that we get back rows for. I know that it’s eight rows, but it’s four users that we get rows back for.

So each of the four rows that we produced from the Users table produces two rows from the function. Right. Four times two: eight. Cool. Let’s go look at the query plan. And what you might think looking at this query plan is: Erik, this took seven seconds, but I don’t see seven seconds of time anywhere in here.

And you would be absolutely correct. There is not seven seconds of time displayed anywhere in here. This all looks like it took 256 milliseconds. Well, if I were me and I were looking at this, I might think, well, did that thing just take a long time to compile?

But no, the compilation on that was very low, four milliseconds. All right. That’s not it. So what happened? Well, this function has a particular thing applied to it.

The particular thing applied to it is called interleaved execution. And we’re going to see that in the properties over here. I clicked on the wrong thing. Sorry about that.

So IsInterleavedExecuted is true. If it were not interleaved executed, we would simply not have this property. It would not say IsInterleavedExecuted False. It would just not be there.

Love how Microsoft does that. It makes you feel crazy. So this uses an optimizer feature released in SQL Server 2017 called interleaved execution. And what that means is SQL Server ran the function and then kind of paused, and then tried to do some cardinality estimation stuff based on the function.

The way that it does that is by using this thing called a sequence operator. The sequence operator ensures that things happen in a sequence. It has sequenced the events.

So the sequence is: it executes up here and tries to do some interleaved execution stuff, where it makes a cardinality estimate based on what ends up in the table variable inside of the multi-statement table-valued function. Of course, the guesses aren’t really all that great, but we’re using a table variable. So you get what you get and you get what you deserve.

Okay, cool. Well, why is all this interesting? Well, let’s run this again. Let’s run this a second time.

And I know, I apologize for making you wait another seven seconds. It’s not my fault. This is Microsoft’s fault. So if you would like to blame anyone or invoice anyone for your seven seconds, make sure it’s Microsoft. On a second execution of this query plan, once it’s in the cache, now we see seven seconds.

Look, it showed up in here. This is no longer 200 something milliseconds from the stuff down here. Now we see the 7.1 seconds here.

So when you have a multi-statement table-valued function that receives interleaved execution, only the second execution shows the time spent in the function. Isn’t that fun? Isn’t that just great?

It’s wonderful. Doesn’t drive you nuts at all, does it? But what is interesting here, aside from that, is because we have passed in two literal static values here, we have done something. What is the thing that we have done?

Well, we have made sure that this function will only execute once. You can tell how many times this function has executed by the rebinds. All right?

A rebind is something that happens primarily with spools, where you might see rebinds and rewinds. A rebind means that you have gone out and fetched data into something. A rewind means you have reused the data in that something.

So primarily a spool. If you see a lazy spool or something, you’ll see rebinds and rewinds. The rebinds tell you how many times you went and put data into that spool, like every time you truncate the spool and put new data into it.

And the rewinds are every time you reuse the data in this spool. So it’s sort of like cache hit versus cache miss. Great.

What’s next? Well, now we’re going to run this a little bit differently: rather than the two literal values that we put in there, we’re going to use the CreationDate and LastAccessDate columns from the Users table.

We’re going to do everything the same except that. This is going to take a little bit longer. And by a little bit longer, I mean 30 seconds.
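The only change, still assuming the hypothetical sketch from earlier, is swapping the literals for columns, which is what forces the function to execute once per row from Users:

```sql
SELECT
    u.Id,
    u.DisplayName,
    m.*
FROM dbo.Users AS u
CROSS APPLY dbo.multi_optional
(
    u.CreationDate,   /* columns now, not literals */
    u.LastAccessDate
) AS m
WHERE u.Reputation > 800000
AND   m.OwnerUserId = u.Id;
```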

And what are we going to do for 30 seconds? I don’t know. I could ask you how your day was. Ask you how school was.

What did you get up to, kid? Stuff like that. You can bond a little bit. A little father-whatever-you-are bonding while this query executes. And, you know, you could answer me politely and say, I don’t know, whatever.

Whatever. School was school. But 30 seconds later, our chat will be over. Our bonding time will be done. And we’ll have a query plan to look at.

We get the same rows back, right? There are the same, you know, four people producing two rows apiece that we got back before. All well and good. But our execution plan is much different now.

There has been no sequence of events. And we get all the execution time up front. If we look at the properties of the multi-statement table-valued function now, we have four rebinds, right? So we went from one rebind to four rebinds.

Because every time we correlated to this function, we had to pass in two new values. The two new values were whatever the CreationDate and LastAccessDate were from the Users table. So not a good time here.

And another thing that’s not a good time here is we have this filter operator. And this filter operator is saying where the OwnerUserId column coming from the function equals the Id column coming from the Users table. So this isn’t a great strategy in general.

Well, you know, if you’re going to do this sort of correlation outside, you should probably think about pushing this correlation inside, right? It seems like a reasonable thing. So we’ll add a parameter to a new version of the function called multi_optional_pushed, right?

And we’re going to add OwnerUserId. And we’re going to expand our bad-idea where clause to include OwnerUserId here, all right? And again, more ominous foreshadowing: the OPTION (RECOMPILE) here.
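Sticking with the hypothetical function shape from earlier, the pushed version could look like this, with the correlation arriving as a third optional parameter:

```sql
CREATE OR ALTER FUNCTION dbo.multi_optional_pushed
(
    @CreationDate datetime,
    @LastAccessDate datetime,
    @OwnerUserId int /* the correlation, pushed inside */
)
RETURNS @Out TABLE
(
    OwnerUserId int,
    PostTypeId int,
    TotalScore bigint
)
AS
BEGIN
    INSERT @Out (OwnerUserId, PostTypeId, TotalScore)
    SELECT
        p.OwnerUserId,
        p.PostTypeId,
        SUM(p.Score)
    FROM dbo.Posts AS p
    WHERE (p.CreationDate >= @CreationDate OR @CreationDate IS NULL)
    AND   (p.LastActivityDate <= @LastAccessDate OR @LastAccessDate IS NULL)
    AND   (p.OwnerUserId = @OwnerUserId OR @OwnerUserId IS NULL)
    GROUP BY
        p.OwnerUserId,
        p.PostTypeId
    HAVING SUM(p.Score) > 0;

    RETURN;
END;

/* Called with the correlation as an argument now, e.g.:
   CROSS APPLY dbo.multi_optional_pushed(u.CreationDate, u.LastAccessDate, u.Id) AS m */
```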

So let’s create or alter this function. And now when we run this, right? So now we take the correlation out of here and we use the ID column here.

We’re going to start by passing in two literal values. Because what I want to show you is that as soon as you start passing in a column here, we have to start executing and rebinding the function more than once. This is where the extra time comes in, right?

So like when we had a situation earlier where, you know, the function took seven seconds, now we can sort of get that back by pushing the correlation in. And granted, we lose the multi-statement table-valued function interleaved execution thing. We don’t have that sequence operator anymore.

But we do still rebind this thing four times. Fine. Okay. Now we’re going to see the exact same thing here, right? And the annoying thing is not that this still takes seven seconds.

The annoying thing is that pushing that predicate in is not improving life for us, right? So we want to improve this in a way. What we’re messing up here is something you can’t see in here, right?

Because when you look at an actual execution plan for a multi-statement table-valued function, you don’t see what happens inside the multi-statement table-valued function. Like, up here, like, we can see what’s going on in the function. We select from posts.

We, you know, group by these things. We get a sum of this. We filter on the sum. Like, we can see what’s going on, right? If we get the estimated execution plan for this query, now we get an additional query plan back, right? The additional query plan that we get back is for the function.

We don’t get this back with the actual execution plan, because then we’d see, like, four execution plans for the function, right? One for each row that comes out of the Users table. If we’d returned a lot of rows from the Users table, like 1,000 or 2,000, we’d kill SSMS.

But if we look at the scan over here, we can see that we’re doing the typical thing that happens when we use this sort of optional parameter query syntax, where we have all of these predicates expressed out fully, like OwnerUserId equals this or is null, CreationDate is greater than that or is null, CreationDate is less than that or is null. So this is why you don’t want to write queries that way. What’s particularly annoying is that if you add OPTION (RECOMPILE) outside here, we still get the same problem, right? So this is where things get a bit sticky and tricky, right?

We still get the same seven seconds here, and we still get the same index scan in here, where SQL Server does not get the parameter embedding optimization involved inside of the function. Even though we’re like, hey, query, option recompile, when we don’t have that inside of the function, things break, right?

Or rather, things don’t go all the way we planned. So I’m going to add OPTION (RECOMPILE) inside of the function, right? And we’re going to recreate this thing.
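In the hypothetical sketch, that change amounts to hanging a statement-level hint off the insert inside the function body:

```sql
    /* Inside dbo.multi_optional_pushed: same statement as before,
       plus a statement-level recompile so the optional-parameter
       predicates get the parameter embedding optimization. */
    INSERT @Out (OwnerUserId, PostTypeId, TotalScore)
    SELECT
        p.OwnerUserId,
        p.PostTypeId,
        SUM(p.Score)
    FROM dbo.Posts AS p
    WHERE (p.CreationDate >= @CreationDate OR @CreationDate IS NULL)
    AND   (p.LastActivityDate <= @LastAccessDate OR @LastAccessDate IS NULL)
    AND   (p.OwnerUserId = @OwnerUserId OR @OwnerUserId IS NULL)
    GROUP BY
        p.OwnerUserId,
        p.PostTypeId
    HAVING SUM(p.Score) > 0
    OPTION (RECOMPILE);
```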

And now, when we run this, it’s very fast, right? And now, even when we put in all of the columns, it’s very fast, right? So, now, rather than seven seconds, this takes 200 milliseconds.

The cost is, of course, recompiling the function every time that it runs. But when you look at the execution plan in here, rather than doing that big index scan with all of the IS NULL stuff on there, we do get seek predicates on the literal values in here.

So there are ways you can work around this. Think of it two ways: if you’re not allowed to fully rewrite a multi-statement table-valued function, or if your multi-statement table-valued function is simply too long and involved, with too many if-then-else branches, for you to turn it into a single inline table-valued function, there are some things that you can do under certain circumstances to get your multi-statement table-valued function faster.

One of those things is, of course, OPTION (RECOMPILE), if you have that sort of IS NULL optional parameter thing going on. And the other is to start pushing predicates that you would normally have outside of the function inside of the function.

Those are two ways of getting around that stuff. So, like I said, this was a whole bunch of stuff about multi-statement table-valued functions all at once. We talked about the rebinds and the rewinds.

We talked about interleaved execution. We talked about sequence operators. We talked about all sorts of things that were interesting and fun. But now I’m out of things that are interesting and fun to talk about. And so it’s time to go.

Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I will see you in the next video. And don’t forget to look in the video description for helpful links. All right.

Should I say that every time? Probably not. There’s no one over there. Don’t worry. I’m talking to myself. All right. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

A Little About Optimizer Nudging In SQL Server

A Little About Optimizer Nudging In SQL Server


Video Summary

In this video, I delve into optimizer nudging—a technique where we rewrite queries to guide SQL Server’s query optimizer towards the execution plan that best suits our needs. We explore various methods, such as using `TOP` in subqueries and leveraging primary keys for faster performance, while also touching on how index alignment can significantly impact query efficiency. By separating relational elements from informational ones, you can craft more optimized queries. Additionally, I provide links to valuable resources in the video description—ranging from consulting services to training opportunities—that can help you master this technique further. Whether you’re looking for professional assistance or want to support my content, there are plenty of ways to get involved and improve your SQL skills.

Full Transcript

Hello again. Welcome. I’m glad you’re here, because we’ve got a couple interesting things to talk about. We are going to discuss optimizer nudging. And of course, by optimizer nudging, I mean the process of rewriting queries in order to get the optimizer to do what we want. So this is just a brief video about that. We cannot go into depth about that, because then you’d have to pay me, and then you wouldn’t be watching this for free. So we’re just going to do a little bit about that. If you look down in the video description, right somewhere in here, you’ll see all sorts of very helpful links. If you need help nudging the optimizer, well, you can hire me for consulting. If you want to learn how to nudge the optimizer further, you can buy my training. If you just want to support my continuing to provide free help with nudging the optimizer, you can become a channel member and contribute money to the channel. You can also ask me office hours questions. And of course, if you like this content, one of the most helpful things you can do is like, subscribe, and tell a friend, because if you keep it all to yourself, it’s very selfish. Pass On Tour: this will be one of the last opportunities for you to purchase a ticket for Pass On Tour Utrecht, October 1st and 2nd.

I forget which day I’m there this week. I have to stop saying that, but I’ll figure it out before I mess anything up. And of course, the Pass Data Community Summit, Seattle, November 17th to 21st: two days of T-SQL pre-cons with Kendra Little. That’s me and Kendra Little. We will be there teaching you amazing things about T-SQL. And then I’ll be home for a bit until, I don’t know, someone else decides to pluck me from my domicile and throw me out into the world and expose me to all of you people again. So anyway, let us begin by nudging the optimizer in various and sundry ways, or something like that. I’ve got an index created here on the Posts table, on CreationDate and then Score. And, you know, we’ll talk more about this later, but for now, just stick with me. We’ve also got this query, and we’re saying: give me the top thousand rows where CreationDate is between these two dates. Or betwixt, really. Not exactly between, because between would be less than or equal to, too.

Right. Between is inclusive in that way, whereas this is exclusive. And if we run this query, we’ll get what I would call a sort of ugly query plan. Right. We start by seeking into our index, and we get kind of a lot of rows out of there. And then we do a key lookup, which is a row-by-row operation, because it is a nested loops join. And then we end up sorting the data over here.

So we created an index, but we still have to sort data, which is a bit upsetting. And then at the very end, we top out with our thousand rows. This all takes about 750 milliseconds. Now, the situation can of course get worse if we hit upon one of SQL Server’s many tipping points.

For example, if we say select the top 1000 and we increment this date by one, to 2013-03-18 (remember, this was 2013-03-17 up here), SQL Server will neglect to use our nonclustered index entirely. And this will all take about 900 milliseconds after scanning the Posts table and all that other stuff.
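Here’s a sketch of the setup as described: the index, the original TOP query, and the date bump that trips the tipping point. The start of the date range and the exact ORDER BY are assumptions:

```sql
CREATE INDEX cd_score
ON dbo.Posts (CreationDate, Score); /* both keys ascending by default */

SELECT TOP (1000)
    p.*
FROM dbo.Posts AS p
WHERE p.CreationDate > '2013-03-16' /* assumed range start */
AND   p.CreationDate < '2013-03-17' /* bump to 2013-03-18 for the tipping point */
ORDER BY
    p.CreationDate,
    p.Score DESC;
```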

So let’s say that we’re unhappy with this and we want to nudge the optimizer in, again, various and sundry ways. We might decide to rewrite our query like this, where rather than allowing SQL Server to choose between doing a key lookup and a clustered index scan, we say: select the top thousand rows from Posts where p.Id is in yada, yada.

And, you know, we do all that same stuff. Well, we can get much faster performance doing this, because we’re taking advantage of some stuff that we know about SQL Server. One: it digs primary keys. Primary keys make things very easy. So we just say select star from the Posts table where the primary key, which is the clustered primary key, which is the Id column, is in these top 1000 rows.
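The rewrite is shaped roughly like this, reusing the assumed dates and ordering from the sketch above; the subquery isolates the thousand ids we care about, and the outer query fetches the wide columns by clustered primary key:

```sql
SELECT
    p.*
FROM dbo.Posts AS p
WHERE p.Id IN
(
    SELECT TOP (1000)
        p2.Id /* narrow select list: just the clustering key */
    FROM dbo.Posts AS p2
    WHERE p2.CreationDate > '2013-03-16'
    AND   p2.CreationDate < '2013-03-17'
    ORDER BY
        p2.CreationDate,
        p2.Score DESC
)
ORDER BY
    p.CreationDate,
    p.Score DESC;
```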

Well, SQL Server doesn’t really have much to worry about here, because this is a very narrow select list for the top 1000, right? And even if we were to bump this up to that 03-18 date, where there was a tipping point before, well, SQL Server is still keen on doing what we want.

Right? So this is all a much better arrangement than just writing the query like this. Now, of course, when you are first writing queries, you should arrange those queries logically, in a way that makes sense, right? A simple and logical way that attempts to get data the way that you want it.

And then, if you are unhappy with the performance of those queries at the outset, you might experiment with different syntax. And this is kind of what this is all about. So, adding the select top 1000 to a subquery, we’re still almost doing a key lookup.

We’re just doing it a little differently. Right? Because we still have to look those columns up. We’re just writing the lookup in a manual way, right?

Cause we’re saying select star from posts up here. And then we’re saying, but you know, just where the ID is in this. And that’s what SQL Server does. It gets the thousand rows that we care about in here.

We still have to sort data, which we’ll talk about in a minute. Uh, and then we can see that this is where the top isolates those thousand rows we care about. And then this is where we go and do the lookup, right?

Because we still have the nested loops join. We do a lookup based on the thousand rows that we got from the stuff we cared about up there, and we seek into the index down there to get the columns we care about, just for those thousand rows. A very, very handy thing.

And this comes, this comes down to something that I’ve talked about a bunch of times in other videos, which is sort of separating relational elements from informational elements. Informational elements are things that you’d want to show to people. And relational elements are things that you need to filter on and order by and other stuff like that.

So separating those two in your mind, and then manually writing the query with those separations in mind, can do very good things. Of course, one thing is a little annoying, and I only say a little annoying because, you know, it’s just how SQL works.

You couldn’t use EXISTS for this. All right. So, in a lot of videos, I’m like: hey, EXISTS and NOT EXISTS, they’re pretty great. The problem is you can’t put TOP in an EXISTS like this, right?

So in the query plan for this query up here, we very clearly have a top operator limiting the rows to 1000. But down here in this EXISTS query... oh, scrolled down a little too far, blew the whole gag. If we just hit Ctrl+L, notice we don’t have a top here.

SQL Server completely throws the top away. Right? SQL Server is like: no, you don’t get a top. The top isn’t limiting rows in the EXISTS.

Top is gone. All right. So we couldn’t do that here. We could use an outer top with exists. So top 1000 out here.

And then this in here. The thing is, I’m on compat level 160 for this, so I get batch mode on rowstore. With batch mode on rowstore, of course, the query plan looks like this.

It takes about 600 milliseconds, and I get an adaptive join, and all these other batchy things happen in here, which, you know, is fine, but it’s still not the plan that we’re really after. Right? Because we want to limit those thousand rows before we do anything else.

Right? So the adaptive join and batch mode here are just kind of like: oh, look, we saved the day. If we were using an older compat level, we might see a plan that looks like this, with a loop join. Right?

And, you know, we’re kind of just back to the original plan we had, the one that took about 750 milliseconds or whatever, and the 20 millisecond difference here isn’t interesting. But it’s almost like we’re still doing the manual key lookup, right?

It’s just not called a key lookup here, because we’re just joining the two tables together. Right? That’s what EXISTS is doing.

It’s just joining, essentially, or semi-joining, depending on how you look at it. So we’re getting the top thousand rows from Posts and saying p.* where exists, but this doesn’t... I don’t know. The other way of writing this that I showed you before, with the IN, where we could put the top 1000 in the IN clause and just get those thousand rows?

We ended up way better off. It was like a hundred something milliseconds versus, you know, 700 something milliseconds. Now, one thing that is worth talking about is you could also consider this to be an index problem, right?

So a very important concept when you’re performance tuning queries is aligning queries to indexes, right? Remember, the index that I created up here was on CreationDate and then Score, and by default, both of these are in ascending order. But all of the queries that we’re writing order Score in descending order, right?

So that’s why we still had to sort data across all of these. If we were to think about this as a query-index alignment problem and took the descending sort off of Score in the query, so we just said Score ascending, then we would end up with a query that’s pretty fast naturally, on its own, right?

Like, this takes two milliseconds. Well, that’s even better, right? That’s pretty great. We don’t have to worry about getting more in-depth with things, rewriting queries in funny ways with IN, and pondering why EXISTS doesn’t allow TOP and all that other stuff.

So we might consider this just an index alignment problem. And if we create an index on CreationDate with Score descending, right? So now Score is explicitly in descending order in this index, and we go and run that original query.
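That index-side fix is a one-liner, with the usual caveat that the demo’s actual index name may differ:

```sql
/* Key order now matches the query's ORDER BY, so no Sort operator */
CREATE INDEX cd_score_desc
ON dbo.Posts (CreationDate, Score DESC);
```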

Well, guess what? Our original query doesn’t need rewriting either. So, if you’re in a situation where you can’t really change indexes, because maybe the tables are too big, or you’re on Standard Edition and, you know, the blocking, sometimes you have to think about rewriting queries in a way that takes better advantage of the indexes that you do have.

If you are allowed to change indexes but not queries, which is a frequent thing for a lot of the sort of third-party apps that I see people using SQL Server with, then, you know, doing something like this can make queries way better. Of course, if you’re allowed to change both, then it’s really chef’s choice as to what temperature that steak gets cooked at. You can choose to either rewrite the query in a neat way, hopefully with some comments, or you could add or change an index that better conforms to the way that you have logically written your query.

But these are both forms of optimizer nudging, where we either have our queries conform to the indexes or have our indexes conform to the query. In either case, we end up with better performance. All right. I think that’s probably about good here.

Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And do remember to visit the video description down below for all of the helpful links that I can possibly provide for you. All right. Cool. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SQL Server Performance Office Hours Episode 35

SQL Server Performance Office Hours Episode 35



Questions:

* I have a large table where 60% of the rows are defunct, the table is updated nightly by an ETL process that takes several hours. To improve query performance during the day, would you recommend using filtered indexes, or an archive table?

* Hello there! If a table that has a computed (not persisted column) that uses GETDATE(), is used in a scalar function, that function is never going to be inlineable no matter what, regardless if that column is not utilized. A covering index will not work either.

* What features would you like to see added to constraints in SQL Server?

* I often performance tune reasonably simple parameter sensitive queries that search date ranges. Before I was touched by your magnitude, I would approach this by dumping the sub-queries into a temp table. Now, I use dynamic SQL. Do you ever use temp tables for simple parameter sensitive queries? Why/why not?

* Why do so many people hate business logic in databases, and even stored procedures?

To ask your questions, head over here.

Video Summary

In this video, I dive into a series of community-submitted questions during my Office Hours session, providing insights and advice on topics ranging from managing large tables with defunct rows to the intricacies of computed columns in SQL Server. I also discuss the potential benefits of using temp tables for parameter-sensitive queries and address common misconceptions about business logic in databases. Whether you’re looking to optimize your database performance or just curious about best practices, there’s something here for everyone. If you have a question of your own that wasn’t covered, make sure to check out the video description where you can find links to submit your Office Hours questions directly.

Full Transcript

Well, through the magic of time travel, when you’re watching this, I will be in Europe. I will be at the Red Gate Netherlands stuff. So that’s fun to think about. All right. Next week at this time, where will I be? Anyway, it is time for Office Hours, in which I answer five community-submitted questions. Wonderful, right? Good stuff. Look at me. If you want to submit a community question, look down in the video description. You’ll see all sorts of helpful links. One of them is to ask me Office Hours questions, and you can go get your question answered on this thing, if you have one that you deem worthy of my time. Of course, just about anything is worthy of my time, so don’t set the bar too high. The bar is somewhere in hell, probably. With some damn soul twisting on it. So submit your questions. Thank you. There are also other helpful links in there, if you want to hire me or buy my training. I recently dropped my performance engineering course; there is an on-sale link down there if you’re interested in that. And, you know, some other stuff too. Like I said, when you’re watching this, I will be in Europe for the Utrecht event.

So what you should do, if you’re in the Utrecht area, is check that out; maybe this is your last chance to buy a ticket. And of course, after Utrecht, I will be home for a long time, relatively speaking, and then going to Seattle, November 17th to 21st, for the Pass Data Community Summit, where Kendra Little and I will be delivering, like a fastball, two days of T-SQL pre-cons. And then I will be home, I don’t know, doing things. What will I be doing? I don’t know. I’ll figure it out when I get there. Anyway, let’s answer these questions, because that is what we do for office hours.

And, you know, try to do our best. Anyway, first question here is, do, do, do, do, do, do: I have a large table where 60% of the rows are defunct. What a nice word, defunct. The table is updated nightly by an ETL process that takes several hours.

Well, I mean, my rates are reasonable. Just saying. To improve query performance during the day... all right, so it’s a little mixed bag of information here, isn’t it? At night, this process takes several hours. During the day, though, would you recommend using filtered indexes or an archive table?

Well, if 60% of the rows are defunct, it seems like they’re just taking up useful space in this table. You could be doing other stuff with it. I mean, sure, filtered indexes are nice. They’re smaller, right? They’re more compact. And if your queries meet all of the sort of prerequisites for filtered index matching, then they could be quite useful to screen out 60% of the rows there.

But, generally, if I were you, I would probably be working on an archive process, because, you know, there are all sorts of rules for filtered indexes: what kind of columns you can make them on, determinism, whether queries match to them appropriately, stuff like that. So I would probably go with an archive process.

If you want to read about a really neat trick for that, I actually have a post on my site. So if you search on erikdarling.com for “simplifying archival processes,” you’ll see a neat piece of nested DML query that will make archiving data a lot easier for you. If you’re too lazy, then sure, filtered indexes might help, depending on stuff.
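The general shape of a nested (composable) DML archive looks something like the sketch below, though it’s not the exact query from the referenced post, and the table and column names are hypothetical. The delete and the archive insert happen in a single statement; note that composable DML carries restrictions, for example around triggers and foreign keys on the tables involved:

```sql
INSERT dbo.BigTable_Archive
(
    Id,
    SomeDate,
    SomeValue
)
SELECT
    d.Id,
    d.SomeDate,
    d.SomeValue
FROM
(
    /* the inner delete feeds the outer insert via OUTPUT */
    DELETE dbo.BigTable
    OUTPUT
        Deleted.Id,
        Deleted.SomeDate,
        Deleted.SomeValue
    WHERE Defunct = 1 /* hypothetical flag column */
) AS d;
```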

But, you know, generally, if I had more than 60% of the rows in a table not being useful to queries, I would probably just want to get rid of them. All right. Do, do, do, do, do, do. I don’t see a question mark here, but that’s okay. Hello there. Hello back.

If a table that has a computed, not persisted column (well, it can’t be persisted if it uses GETDATE, because that’s not deterministic) is used in a scalar function, that function is never going to be inlineable, no matter what, regardless of whether that column is utilized. A covering index will not work either. Yes. One of the restrictions on UDF inlining is that you can’t have non-deterministic functions in there.

Things like GETDATE, SYSDATETIME, NEWID, RAND, things like that. Those make for non-deterministic results, because they are decided at runtime, not stored forever and ever in a way that would make them deterministic. So, since you can’t persist that computed column, look into enabling trace flag 176. Paul White has a wonderful post called Properly Persisted Computed Columns, in which he discusses how to get around what you’re talking about, but GETDATE there is going to screw it up in the persistence department.

So maybe don’t do that. If you need other options, my rates are reasonable. Here we go. Oh, this one has a question mark. What features would you like to see added to constraints in SQL Server?

I’m not that guy. I realize that there are all sorts of constraints that other databases have that SQL Server doesn’t. I just don’t spend a lot of time creating them. So, you know, I realize check constraints have some deficiencies when compared to other databases, particularly Postgres.

There are no real domain constraints, things like that. And really complex constraints often don’t work out well. There are even weird little shortcomings, like foreign key constraints can’t be filtered, or something like that. You know, there are all sorts of things that might be nice to have, but I question the amount of effort that would go into adding them to the product versus how many people would actually use them, since I don’t see a lot of people generally using constraints all that well or all that wisely now. So what would I like to see added? I don’t know.

There might be some neat stuff in there, but in general, I don’t really see people using them anyway. Some temporal constraints might be interesting, like a really easy way to figure out the most recent active row, or something like that. But, you know, people are strange. People are strange. Sorry. I don’t have a more explosive answer for you there.

I just don’t get excited about constrained data. It makes me think of tight underwear. Next: I often performance tune reasonably simple parameter sensitive queries that search date ranges. Before I was touched by your magnitude... oh, well, speaking of tight underwear... I would approach this by dumping the subqueries into a temp table. Now I use dynamic SQL. Do you ever use temp tables for simple parameter sensitive queries? Why, why not? Yes. The answer is absolutely yes. Where temp tables make for a good fix for a parameter sniffing sensitivity issue is, let’s say, just to make things simple, that you have a stored procedure that accepts one parameter, that parameter is a sensitive one, and the table that you hit, depending on how many rows that parameter produces, could be a nightmare.

Right. And let’s say that we’re not just selecting from that table; let’s say there are some, I don’t know, joins or exists or not exists or other stuff like that. It can absolutely help to use a temp table to dump the results of the select from that table into there, and then do the more complicated part of the query, the part that doesn’t have any parameters touching it, with the results of that temp table. Indexing that temp table could also be useful, depending on what is going on elsewhere in the query, of course, but I can’t tell you that based on your question. The only thing you have to really be careful of is how sensitive that parameter is. If it’s a difference between, like, a few hundred to a few thousand to even maybe a couple million rows, then a temp table can be perfectly fine. But if the magnitude of your sensitivity is in the tens of millions or hundreds of millions of rows, then you probably don’t want to dump that into a temp table. So be a little bit cautious there. Know your data, or something like that.
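A minimal sketch of that pattern, with an entirely invented schema: the parameter-sensitive select lands in a temp table first, and the more complicated, parameter-free part of the query runs against the temp table afterwards:

```sql
CREATE OR ALTER PROCEDURE dbo.sensitive_search
    @CustomerId int
AS
BEGIN
    SET NOCOUNT ON;

    /* The sniffed, parameter-sensitive part goes in first */
    SELECT
        o.OrderId,
        o.OrderDate
    INTO #t
    FROM dbo.Orders AS o
    WHERE o.CustomerId = @CustomerId;

    /* Optional: index the temp table if the later joins benefit */
    CREATE CLUSTERED INDEX c ON #t (OrderId);

    /* The complicated, parameter-free part runs against #t */
    SELECT
        t.OrderId,
        t.OrderDate,
        li.Quantity
    FROM #t AS t
    JOIN dbo.LineItems AS li
      ON li.OrderId = t.OrderId;
END;
```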

Love your data. Let’s see here. Okay, this is question number... let’s make sure: one, two, three, four, five. So this is five questions. I did not screw that up this week. Lucky me. Why do so many people hate business logic in databases, and even stored procedures? Well, people tend to hate what they’re bad at. That’s it. I hate PowerShell.

I’m bad at PowerShell. I hate using it. I hate typing it. I even hate using an LLM to generate it, because I just hate the way it looks. I hate dealing with it. I am not good at figuring out what PowerShell is doing, and I have no interest in getting better at it. I frankly think it’s kind of a crappy language. So I hate PowerShell. So I don’t want to use PowerShell. So I don’t use PowerShell, and I hate PowerShell. So when people talk about things like business logic or stored procedures being bad, in a very generic sense, it’s probably because they’re bad at them. Right. And if I were terrible at SQL and I was very good at C# or some other programming language, Java, maybe, I don’t know, I could code in crayon with Python or something.

I’d probably be like: oh, screw that database. Oh, it’s hard to do all that stuff. I can do all this stuff in application code that I know well. So, you know, they’re probably just people who are bad at SQL. They’re bad at designing things within the database. That’s about the end of it. I’ve never seen a valid argument for it. Again, sorry, I’ve never seen a valid argument against business logic in databases or stored procedures that didn’t also express what would seem like willful ignorance about SQL as a language, or the possibilities, or rather the capabilities, of SQL as a language. I’ve just never seen one. They’re all just like: oh, I did this one thing once and it was bad, it was slow. Or like long rambling things, and you’re like, well, if you were any good at this, you would have done it differently. Right. You tried this one thing, and it sucked, and you were like: never again.

Right. It’s like, I don’t know. It’s a weird take. It’s a very weird take. Like: don’t have your database do anything with data. Don’t put logic where the data is. Okay. Okay. Well, I guess a little bit further to that: if you’re that type of person, you could very easily find teams that agree with your sentiment and go work there. But I think if you are managing that kind of team and you start making rules like that, that’s a bad idea. You should allow your team to work where they are most comfortable and most proficient. Do not set arbitrary rules about where certain logic should live, or which APIs (and stored procedures are just another API) should be used to interrogate the data in a database, because you’re just hamstringing them. If you say: no, you have to use an ORM now and you have to put all the logic in some other piece of code, then they’re not going to do their job as well. So don’t be stupid. Anyway, that is five questions. I think we’re done here. Thank you for watching. I hope you enjoyed yourselves.

I hope you learned something. And once again, if you would like to ask a question here, please look down in the video description, where the link to ask questions is, because that’s the way to ask them. Otherwise, I cannot accept psychic questions here.

All right. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Introducing My SQL Server Performance Engineering Course

Introducing My SQL Server Performance Engineering Course


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

A Little About Table Variable Deferred Compilation Depth In SQL Server

A Little About Table Variable Deferred Compilation Depth In SQL Server


Video Summary

In this video, I delve into the fascinating world of table variable deferred compilation in SQL Server, specifically exploring how multiple inserts affect its behavior. I walk you through a stored procedure that demonstrates inserting rows into a table variable and then querying it to see if SQL Server defers compilation after each insert. The results are quite enlightening, revealing that while initial executions show promising deferred compilation, subsequent runs start behaving more like parameter sniffing, retaining cardinality estimates from previous queries. This video aims to shed light on this lesser-known aspect of table variables and might help you troubleshoot strange behaviors in your own SQL Server environments. Whether you’re using SQL Server 2019 or higher compatibility levels, understanding these nuances can be crucial for optimizing query performance.

Full Transcript

Erik Darling here with Darling Data. We’ve spent some time this week talking about cardinality estimates for things that don’t really get great cardinality estimates. And one of the things that I talked about earlier in the week, I think it was on Tuesday, was that up to a certain point, table variables behaved a lot like local variables, in that they would sort of compile with the batch but would not get cardinality estimates with a lot of information, until table variable deferred compilation came along. And this made me think a little bit, because, you know, in the past I’ve definitely tested stored procedures just to be like: okay, if I run this once and the table variable gets a thousand rows in it, and the queries that touch it now can see that there are 1,000 rows in the table variable, and then I rerun this and I put, like, a hundred thousand rows in the table variable, what happens? And it turns out that the behavior of table variables turns into what was essentially the behavior of table-valued parameters in the past, which was that they would sort of act like parameters, and the cardinality of the table-valued parameter would get sniffed and reused over the course of a query plan.

So like to me that was sort of interesting, but then it made me wonder like, well, what if I insert rows into a table variable multiple times? Will SQL Server defer compilation for multiple inserts? And so that is the question we are going to answer today. We are going to judge the depth of table variable deferred compilation. We are going to judge its depth like we are at a powerlifting meet and we want to make sure that it is squatted deep enough to get all white lights. We want to see the crease of the hip. We want to see the butt a little below parallel. We don’t want anyone cheating on their squats saying, oh, I squatted 1300 pounds.

Your butt didn’t even break parallel, pal. Anyway, if you look down in the video description, you will see many helpful links. Most of the help that those links will provide is in the form of giving me money. Hire me for consulting, buy my training, become a supporting channel member.

And if you want to ask me questions for my office hours episodes, you can do that. Otherwise, please do like, subscribe, tell a friend. Assuming that your friends are physically existent in the world and capable of subscribing to a YouTube channel.

But if your AI girlfriend has a YouTube account, I’m willing to accept AI girlfriends. They remain your girlfriends. They remain your girlfriends, but I’m willing to accept subscribers in the form of AI girlfriends. All right. Married man. Don’t mess with me.

All right. So leaving the house, of course, I get to go to Utrecht. I don’t know. I made what I thought was a funny joke, but I don’t know if anyone laughed at it. It was, I trekked, you trekked. We all trekked to Utrecht. All right. That was funny.

I don’t know. I’m a little deflated on that one. Sorry. Delivery was lacking a little life. October 1st to 2nd, I will be there with the nice folks from Red Gate who are bringing Pass on Tour. Well, they brought me to New York City. That was very nice. Dallas and now Utrecht.

And of course, I will be in Seattle for the Pass Data Community Summit, November 17th to 21st, with two rock’em sock’em days of pre-cons with Ms. Kendra Little about T-SQL and all of the wonderful things that happen when you use T-SQL.

All right. The life changer that is T-SQL. There we go, PowerPoint. Only took you five seconds to catch up with that. Click. Great. Wonderful. Let’s party. Let’s look at table variable deferred compilation depth.

So what I wanted to test with this stored procedure is, like I said, multiple inserts. Here we have a stored procedure that accepts one user input, right? We are not using a table-valued parameter here.

We are just going to use a regular old table variable. But the table variable we are using will acquire the table variable deferred compilation intelligent query processing feature. And we will get a guess on initial compilation, right?

SQL Server will give us table cardinality for that. So if I run this and, you know, run all the code, we should see something interesting, shouldn’t we? So here we have user IDs, right?

We’re going to insert into that table. We’re going to select all of the user IDs that match our user ID first from the users table. And then do a count. So we got some cardinality to estimate.

Then we’ll do everything from the badges table. And then we’ll do everything from the comments table, right? So we got that here.

Then we’ll do everything from the Posts table. Oh, Posts, that’s a big one. And since there are two columns in the Posts table that have user IDs in them, I decided to do both of them, right? So we’ll do LastEditorUserId too.

And then finally, we will do votes. And what I want to see here is if after each insert, SQL Server actually defers compilation of the count query until all the rows are inserted, right? So that’s a good one.
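A condensed sketch of the kind of procedure being described, with fewer inserts than the demo, assuming the Stack Overflow schema:

```sql
CREATE OR ALTER PROCEDURE dbo.table_variable_depth
    @UserId int
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @UserIds table (UserId int NOT NULL);

    INSERT @UserIds (UserId)
    SELECT u.Id
    FROM dbo.Users AS u
    WHERE u.Id = @UserId;

    SELECT c = COUNT_BIG(*) FROM @UserIds AS u; /* estimate check #1 */

    INSERT @UserIds (UserId)
    SELECT b.UserId
    FROM dbo.Badges AS b
    WHERE b.UserId = @UserId;

    SELECT c = COUNT_BIG(*) FROM @UserIds AS u; /* estimate check #2 */

    INSERT @UserIds (UserId)
    SELECT c.UserId
    FROM dbo.Comments AS c
    WHERE c.UserId = @UserId;

    SELECT c = COUNT_BIG(*) FROM @UserIds AS u; /* estimate check #3 */
END;
```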

So let’s run this. We have actual execution plans enabled. And when we run this, it will take a couple of seconds, because we don’t have any good indexes in place. That’s fine, though.

We got time to kill, right? You’re not busy. Every other query will be interesting to us. So let’s look at how these pair up. We have the clustered index insert from Users, right, into our table variable.

That inserts one row. And we get a one row estimate when it’s selecting from the table variable. That is correct so far. That is not incorrect.

One might see a one row estimate from a table variable and think, my goodness, but the insert actually did only insert one row. So as we get down further and we insert user IDs, say, from now the badges table, we add another 9,363. And now look, our clustered index scan says 9,364.

This is great news. We have actually deferred compilation of the second select until later. Right?

This is great for us. Now when we hit comments, we add 46,7… Can you imagine leaving 46,737 comments? My Lord.

What are you doing all day? But this is interesting though because now our clustered index scan says 56,101. Which, you know, I’m not a math person, but I’m fairly sure that if you added that to that, you would get that.

So that makes sense to me. And if SSMS reframes this one more damn time, I am going to lose my entire mind. So now, let’s see.

Now we’re inserting into user IDs, selecting from Posts where OwnerUserId matches the user up there. And we add 27,901. So this person has left almost 47,000 comments and posted almost 28,000 questions and answers. That is insane to me.

But when we add the 27,000 to what we had before, look at our estimate. It holds up very nicely. This is great. And if we scroll all the way down here, we will see this pattern repeat where every time we add rows, we get the right sort of guess from our select from the table variable.

All right. Well, that’s good for a first execution, right? But now let’s do a second execution.

My favorite user in the Stack Overflow database is someone named Eggs McLaren. They’re a dummy account. And let’s see what happens now. So let’s do this.

We’ll enjoy ourselves here a little bit. And if we look at this execution plan now, well, this is still good, right? One of one, one of one. But if you notice down here, we seem to be retaining cardinality estimates from the past execution.

And so we’re not really deferring the compilation anymore, are we? We have sort of given up on deferring compilation because now we’re getting guesses of like 69 of 9,300. And if we scroll down a little further and we click here, so hopefully SSMS stops moving, then we’ll start getting, you know, like we don’t get a good guess here.

I mean, I realize this is the insert, but like we’re retaining cardinality estimates from like not only the insert queries, but also the select queries. We can only defer compilation so much.

So now we’re getting 91 of 56,000 and 65 of 27,901. And if we scroll down further, we will see this pattern repeat itself. So at least on the initial execution of a stored procedure, we do okay, right?

SQL Server is like, well, every time I do an insert, I’m going to defer compilation of the select from the table variable immediately afterwards until after that insert completes and we get table level cardinality. Golly and gosh, that’s wonderful.

I’ve never, never been so happy in my life. The problem becomes that on further executions, the table variable acts like a parameter. And now we sort of have table variable sniffing, which I don’t know if that’s good or bad, but it seems, it just seems like a fun new problem to be aware of and have to go solve for people.

So maybe this video will arm you to go do that. I don’t know. I don’t know if you’re using SQL Server 2019. I don’t know if you’re using compat level 150 or higher.

I don’t know if you have disabled the table variable deferred compilation database scope configuration. And moreover, I don’t know if you’re using table variables, but if you are and things are behaving strangely, this might just answer some questions for you.

Anyway, thank you for watching. Hope you enjoyed yourselves. I hope you learned something and I will see you in tomorrow’s video where, I don’t know. I don’t know what we’re going to do yet.

I believe it’s going to be a Friday, though. So it will be necessarily amusing, right? Because Fridays have to be fun. If Fridays aren’t fun, we might as well all just…

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.