SQL Server Performance Office Hours Episode 23

Can you dumb down what SORT_IN_TEMPDB does? I’ve read the documentation many times, but I don’t get it. Index building uses tempdb even when I turn it off.

My devs use ORMs very heavily. They let them write WHERE IN @Foo and the ORM will convert it to IN (@Foo1, Foo2, …). Is it worth bullying them into using table-valued parameters instead or am I just walking a path that ends in temp tables?

In Query Store I have one query that has two planss. It’s just one Nested Loops join, but one is really fast and one is really slow. What should I look for? They seem identical.

I still don’t get RCSI vs SI. Do I need both? What’s really different?

I want to buy your training but my boss thinks I can learn everything from ChatGPT. What should I tell him?

Video Summary

In this video, I dive into some user-submitted questions on SQL Server topics ranging from index building and TempDB usage to query optimization and isolation levels. Whether you’re dealing with ORM-generated queries or trying to understand the nuances of read committed snapshot isolation (RCSI) versus snapshot isolation (SI), there’s something for everyone. I also address the ongoing debate about whether to invest in my training or rely on AI tools like ChatGPT, highlighting the importance of human validation and technical depth that AI currently struggles to provide. If you have any questions you’d like answered live, feel free to submit them through the provided Google spreadsheet. Whether you’re looking for performance tuning tips or just want to chat about databases in general, I’m here to help!

Full Transcript

Erik Darling here with Darling Data and it is brutally blistering hot outside here in New York. It is like a hundred and something degrees. So if I seem brutalized by external sources, you ain’t wrong. Anyway, we are going to do office hours right now. This is where I go. I answer five. Count them five. That’s all the fingers God gave me. User submitted questions to my little Google spreadsheet. If you want to ask a question for one of these office hours episodes, you can go to this link and you can read some stuff and you can ask a question. If you feel like supporting the absolute insane amount of incredibly high quality SQL Server content that I produce and provide, you can sign up for a membership, you can sign up for a membership. All of all of these things that we’re talking about here are also down in the video description. So if you don’t feel like transcribing from your screen or just guessing how to do something, you can, you can always look to the video description for helpful clickable links. If you need, if you need help with your SQL Server and you’re like, boy, this Erik Darling fellow sure seems like a cool cat. Maybe a cat in a hot tin.

I’m not a cat in a hot tin roof today, but I, of course, am a SQL Server consultant. That is where I make the majority of my money from. I do all this stuff and do it better than anyone else. So we got that. If you would like to buy my performance tuning content, you can do that too. You can get all 24 hours of it for about 150 US buckaroos with that discount code there. Again, full link helpfully assembled for you down in the old video description. Another thing that I have available that you can spend money on in some, some, you know, fun way is my new T-SQL course. Learn T-SQL with Eric. That’s me. All 23 hours of the beginner content is currently published. The advanced stuff is coming after the summer. When it cools down, my brain stops frying. This heat has got right out of hand, as a wise woman once said. If you are coming to pass and attending the T-SQL pre-cons that Kendra Little and I are putting on, you will, of course, get access to all of this companion content for with, with your admission to the pre-cons.

Just a reminder, this is the pre-sale price of 250 bucks that will be going up to 500 bucks when everything is done. So get it while it’s hot or get it while it’s cheap, as they say. Man, do I wish I was out of the house right now? Maybe not in any of these places. I think everywhere is hot. I am going to be on the Red Gate Roadshow Tour this summer. That’s the Pass On Tour dates. New York City, Dallas, and Utrecht throughout the summer and I guess a little bit into the fall months.

All leading up to, of course, the big old Pass Data Community Summit taking place in Seattle this November. So I highly suggest you, if you are in the areas of any of these things, that you show up and we hang out and talk about stuff you’re into. It doesn’t even have to be databases. You can just talk to me about whatever.

Quite frankly, I’d rather talk about anything other than databases. Anyway, let’s answer some questions. And let’s zoom on in here.

And here’s our first lucky person. Oh, you lucky, lucky people. Can you dumb down what Sort and TempDB does? I’ve read the documentation many times, but I don’t get it.

Index building uses TempDB even when I turn it off. So rather than try to dumb down what Sort and TempDB does, because it’s hard to imagine anything dumber than the documentation, let’s focus on why your index building might still use TempDB.

Probably the most obvious one is that you’re using online equals on. That would be the first thing that requires TempDB. The second one would be, if you ever look at the query plan for the index that you’re building, you will most likely see a Sort operator in there.

And if that Sort spills, that Sort will spill to TempDB. So the Sort and TempDB is not talking necessarily about the physical Sort and the query plan, which you can’t help but Sort spills to TempDB regardless of what you choose for a TempDB option.

There might also be some stuff in there if you have lob data involved. But aside from that, I think the online index build is probably the most reasonable explanation for why you are still seeing TempDB usage, despite, I mean, Sort and TempDB being turned off your index build options.

All right. My devs use ORMs very heavily. My rates are reasonable.

I’m just going to say that up front. I’m here. Like, this is all me. So they let them write where in at foo, and the ORM will convert it to in at foo, foo to.

Is it worth bullying them into using table-valued parameters instead, or am I just walking a path that ends in temp tables? So it really depends on what your problem is.

Is there a problem with performance of the end clauses? Is the problem that they generate varying end clauses that all generate different query plans? Are you fighting a plan cache issue here?

If you’re fighting a query performance issue, I sort of doubt. If you’re fighting a plan cache issue, then the table-valued parameter thing will be useful. But if you’re fighting query performance issues, then the table-valued parameter would probably not get you anything all that much better.

It’s just like the lack of column-level histograms on table variables, which back table-valued parameters, would probably lead to some oddities in there. I suppose the nice thing about table-valued parameters is that the table cardinality does get sniffed a bit like a parameter would, which can be good or bad depending on just how many rows are going in there.

But it could be a case, too, where your data distributions are not so rough that using a table variable would hurt you. But, you know, again, without knowing what exact issue you’re facing, it’s a little hard to tell you if it’s worth bullying them into using them.

Of course, temp tables are the solution to many, many paths in SQL Server. So it would not surprise me in either case if you ended up using temp tables instead anyway. So maybe just get them into temp tables.

I don’t know. This might be a thing there. Let’s see what we got here. Oh, I love this question. In query store, I have one query that has two plans.

It’s just one nested loops join, but one is really fast and one is really slow. What should I look for? They seem identical.

Ah, so what you should look at is the nested loops join operator. Just take your MIS and right-click on the nested loops join and hit properties. Over in the properties tab that opens up in Management Studio, you’ll probably see one to two differences in there.

You might see that one nested loop says optimized true and one might say false. That might be one thing that you would see. But the important thing that you will see, the big thing that I find in these cases is that one of them will be using some form of prefetch, either ordered or unordered, and the other will not.

The prefetching optimization happens when SQL Server estimates that 25 or more rows might go into the loop join. And if you get a case where fewer than 25 rows, so like if it’s a, like the easiest way to think about it is maybe a parameter sensitivity thing.

It could also be a table like statistics thing. Or, you know, it could be a table variable. Well, I guess not.

Well, maybe table variable depending on if you’re on 2019 and all that table variable deferred compilation stuff. But, yeah, most likely one is going to have a prefetch optimization applied to it and the other is not. So that would be what I would look for there.

Depending on what you find or depending on what is driving the cardinality estimation there, if it’s a parameter sensitivity thing, you might try optimizing for a value that always gets you 25 or more rows so that you get the prefetch optimization.

Because most likely it’s going to be that the prefetch plan is the faster one. If it is a temporary object, you know, you could always try using some slightly out of warranty stats updates commands to change the row count on your temp table or something.

But, you know, again, my rates are reasonable. All right. I still don’t get RCSI versus SI for the folks at home. That’s read committed snapshot isolation versus snapshot isolation.

Do I need both? What’s really different? Do you need both? No. But you might find a use for both depending on if you’re asking this question, you probably don’t need both. If you’re asking this question, what you probably want is to just turn on read committed snapshot isolation and start removing all those pesky no lock ins.

The real difference is in statement versus transaction level consistency. And the easiest way to think to use hand gestures to tell you what that means is let’s say that this is a begin transaction and a commit, right? The other begin transaction and your commit up here.

And let’s say that there are two select queries within this begin transaction and commit. If you are using read committed snapshot isolation, this is the biggest difference to people who query it. This is not the only difference.

It’s like snapshot can obviously do stuff with updates and there’s conflicts and stuff that can happen. But the most common thing that people who write queries are going to be concerned about is going to be select query stuff. So we have this begin transaction up here and this commit transaction down here.

And we have two select queries. Under read committed snapshot isolation, this first select query will return any versioned rows that it needs to as of the time when that select starts. The second select query will return any versioned rows that are required when that select query starts.

That’s a little bit different under snapshot isolation where, like, you know, again, same deal, like begin transaction, commit, and two selects. This select will run and return any under snapshot isolation. This select will run and start returning any required versioned rows here.

And then the second select will actually use the timing of when that first select accessed versioned rows to figure out when it should return versioned rows from. So it’s statement level consistency, which is our CSI, right, where each statement is consistent. And then transaction level consistency, which is snapshot isolation, which is consistent with when the first query within the transaction accessed versioned rows.

So that’s the that’s the big difference for most people who write queries. All right. The fifth and final question, a humdinger.

Wow. Oh, wow. I want to buy your training, but my boss thinks I can learn everything from chat GPT. What should I tell him? Well, I think the easiest way for me to describe it is that I am a person who produces training.

Right. And when when LL was, you know, there I mean, when LLMs came around and up until today, I have tried to use them at various points to, like, you know, bounce some ideas around because, you know, it’s just me sitting. I am a solo individual person.

And I don’t have anyone to, like, talk to about stuff that I want to do, really, except myself. And so, you know, I’ll talk to the robots and I’ll be like, hey, I want to I want to I want to make some new training. Like, you know, I want to bounce some ideas around.

And I can’t even get chat GPT or like any of the other LLMs to, like, do a good job of, like, outlining good training concepts. They just come up with the worst crap, like like the nonsense gobbledygook that you see people like say over and over again on like LinkedIn, like most prolifically. Like, that’s all they come up with.

They don’t come up with like anything like good or like in depth. They don’t come up with anything that like beyond like surface scratching. And like, you know, they’ll say, you know, 10 things and three of them will be wrong. But like they’re just mixed in with all the other stuff that’s like, you know, either mostly or kind of right or just like situationally right.

So like even for me, like I go through it. I’m like, oh, well, I can’t use this. I can’t use this.

This is wrong. Like skip this. Like so like as a person who creates training content and I like who knows kind of like what good is the stuff that they come up with when I’m trying to get them to like like listen to me about things is not good. So like youth like for a person who needs the training, you would get you would get quite a miseducation in a lot of ways if you were to purely use an LLM unless you have the time to not only like like pester the LLM about stuff, but then go and like validate and verify everything that it’s saying.

So often like the value of the training is that, you know, it’s like mine, like like like I like I write it. I validate the hell out of it, like with all the stuff that I do. And also I get it tech reviewed.

So like I have Paul White like tech review my training so that I make sure that stuff is extra correct. So like like it’s like I’ve already done unpackaged and like like validated and votes for everything that I can like in the stuff that I produce. A normal person doing that would spend way, way, way, way longer getting a lot of stuff wrong and having to fix it.

So I, you know, if you’re if your boss thinks that the chat GPT knows everything, I mean, rude awakening when when that happens. You know, one thing that I tell people is that, you know, if you are if you think if you’re that confident in it, then you should have a conversation with an LLM with something that you. You know, deeply that you like like understand and you you have like a like a great grasp on and see how often it’s it’s you have to you.

You go, oh, no. Oh, that’s not that’s not it either. So, like, you know, that’s pretty much what it comes down to for me.

So, no, like I’m sure that you can, you know, get some stuff out of it. But as far as like, you know, the the depth and the the validation and all the like like the technical correctness of it, that that still does require a human touch. Anyway, thank you for watching.

Hope you enjoyed yourselves. I hope you learned something and I will see you in the next video and of course the next office hours. All right. Thank you for watching. Thank you.

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.