A Follow Up On Fixing Parallel Plan Row Skew In SQL Server

Video Summary

In this video, I delve into the issue of parallel skew and share a practical solution to address it in SQL Server queries. Parallel skew can be a tricky problem, especially when dealing with highly skewed data distributions that lead to inefficient thread utilization. I demonstrate how creating an artificial distribution by using a `VALUES` clause and joining it back to the main table can help achieve better thread distribution while also making the query more batch-friendly. This approach not only improves performance but also showcases the importance of considering parallelism in query optimization, particularly when natural solutions are insufficient. Join me for the next video where we’ll explore other aspects of SQL Server query optimization!

Full Transcript

Erik Darling here with Darling Data, and in today’s video, we are going to talk about how to fix parallel skew, or rather how I fixed parallel skew for a particular query. All right. I can’t pretend this is going to be the thing that you do every single time you have this problem, because this is hopefully a rare problem, at least in today’s day and age, but you can still run into it plenty of times. Lord knows I do. And I find it interesting, so I’m going to talk about it for a bit. If you find this interesting, well, stay tuned. Down in the video description, all sorts of helpful links. Perhaps you have lots of parallel skew, and you would like a young, handsome consultant with reasonable rates to come fix it for you. You can find all sorts of helpful links to do that. You can also buy my training. You can also support this channel with money, not just with various forms of internet applause. You can ask me office hours questions. That is a totally free endeavor for you. And I will answer five of those every Monday. And of course, if you enjoy this content, please do like, subscribe, tell a friend or two, tell your AI girlfriend who hopefully wasn’t too terribly affected by the recent AWS outage. We pray for her well-being.

I will be at PASS Data Community Summit in Seattle, November 17th to the 21st, giving two days of the best T-SQL pre-connery that has ever existed with Ms. Kendra Little. And then, I don’t know — hopefully I can talk about my other upcoming stuff soon. It’s just, people are very slow to announce things. Apparently, they just like to wait for excitement to boil over. Anyway, let’s talk about this parallel skew stuff. I’m going to make sure I go to the right copy of Management Studio here. So, let’s start off on this tab, because this tab is going to show us why we have such skewy problems with these.

So, what I’ve done with the votes table is something kind of mean. To give a little background on this: when you have very skewed data, that can lead to very skewed parallelism. And in the public version of the Stack Overflow database — swallowed a bug there — most of the voting history has been anonymized, of course, because voting is a private endeavor. It should not be made public.

And so, what I’ve done is I’ve made a copy of the votes table, and I’ve set the user ID column equal to whatever the vote type ID was. Vote type ID is a number from 1 to 8 or so that covers the various types of voting that people can do on questions, comments, and answers and whatnot — well, I guess not comments in this table. So what I’ve done is effectively made a very skewed user ID column.
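If you want to follow along, here’s a minimal sketch of that setup, assuming the Stack Overflow schema; the copy table and index names are mine, not from the video:

```sql
/* Minimal sketch of the doctored Votes copy described above.
   Table and index names are illustrative. */
SELECT
    v.Id,
    v.PostId,
    v.VoteTypeId,
    UserId = v.VoteTypeId, /* make UserId exactly as skewed as VoteTypeId */
    v.CreationDate
INTO dbo.VotesSkewed
FROM dbo.Votes AS v;

CREATE INDEX vs_UserId
    ON dbo.VotesSkewed (UserId)
    INCLUDE (PostId, VoteTypeId);
```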

And I’ve created an index and all this other good stuff. But, this is the way that the user ID column breaks out. So, for vote type IDs between 1 and 4, these are the most common vote types. Well, actually, three of them are very common. One of them is not terribly common.

But, vote type of 2 is an upvote. Vote type of 1 is accepting an answer as the answer, the little green checkbox. And a vote type of 3 is a downvote. So, sorry, upvotes, accepted answer, downvote if I have my voting history correct here.

And 4 is like offensive or something. So, not too many offensive flags. Because Stack Overflow is a place for friends. It’s also going the way of MySpace, a place for friends at this point.

But, at some point, I’m going to be giving demos on Stack Overflow. And someone’s going to be like, what is Stack Overflow? I’ve never heard of it. It’s going to be funny.

Anyway. So, this is the way that these integers spread out, and these are the numbers for those, because we have a GROUP BY and a COUNT and whatnot.
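Something like this, as a hedged sketch against the doctored table from earlier:

```sql
/* Hedged sketch of the GROUP BY/COUNT distribution query. */
SELECT
    vs.UserId,
    records = COUNT_BIG(*)
FROM dbo.VotesSkewed AS vs
WHERE vs.UserId BETWEEN 1 AND 4
GROUP BY vs.UserId
ORDER BY records DESC;
```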

So: 37 million, 3.7 million, 3.5 million, and 733. All right. So, we run a query like this, one that does this outer apply and whatnot, and we look at the query plan.

You may remember this from the original video that I did. And we come over and we look. I mean, not there. The tooltip is utterly useless for discerning what our problems are here. If we look at the way that rows were distributed across our DOP 8 plus 1 threads, you’ll see that they all ended up on thread 3.

Right? That number there would be the total of the 37 million, plus the two counts around 3.5 million, plus the 733. They all ended up there.

So, the parallel page supplier runs this hashing function. And depending on how things hash out, that’s how the rows get distributed to threads. It’s dealing out all these rows. And unfortunately, our hashing function kept hashing these things out this way.

Now, sometimes when you hit this, you can set DOP to different numbers — especially odd numbers — in order to get different thread distributions. But I tried that starting at DOP 2, which just got me a serial plan all over again, so in reality it was every DOP between 3 and 16, trying to get a different spread here.
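The experiment itself is just a hint on the problem query; the query body below is a stand-in, since the original isn’t shown here:

```sql
/* Hedged sketch: re-run the problem query at different DOPs to fish
   for a luckier hash distribution across threads. */
SELECT
    vs.UserId,
    records = COUNT_BIG(*)
FROM dbo.VotesSkewed AS vs
WHERE vs.UserId BETWEEN 1 AND 4
GROUP BY vs.UserId
OPTION (MAXDOP 5); /* odd DOPs sometimes deal the rows out differently */
```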

But they all ended up, I mean, not always on thread 3, but always all on a single thread. And you can sort of visualize how this happens a little bit differently if you rewrite the query to turn this from one seek into, I guess, four different seeks. So, if you do one select for vote type ID 1, one select for vote type ID 2, and one for 3, and one for 4, and so on.
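In hedged sketch form, that rewrite looks something like this — the DENSE_RANK mentioned next is included, and the exact columns are assumptions:

```sql
/* Hedged sketch of the "four separate seeks" rewrite. */
SELECT
    x.UserId,
    x.PostId,
    dr = DENSE_RANK() OVER (PARTITION BY x.UserId ORDER BY x.PostId DESC)
FROM
(
    SELECT vs.UserId, vs.PostId FROM dbo.VotesSkewed AS vs WHERE vs.UserId = 1
    UNION ALL
    SELECT vs.UserId, vs.PostId FROM dbo.VotesSkewed AS vs WHERE vs.UserId = 2
    UNION ALL
    SELECT vs.UserId, vs.PostId FROM dbo.VotesSkewed AS vs WHERE vs.UserId = 3
    UNION ALL
    SELECT vs.UserId, vs.PostId FROM dbo.VotesSkewed AS vs WHERE vs.UserId = 4
) AS x;
```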

And then you get the dense rank over those. Then you get a slightly, well, you get a very different execution plan, but it helps you to sort of visualize things a little bit, right? So, if you look at these, we get one, two, right?

You can tell that’s 37 million, or sorry, I think that’s, yeah, that’s 3.7 million. That’s 37 million, probably. That’s 3.5 million.

And there’s our 733, right? So, we can see all four of the individual seeks when we do this. And if we go look, again, not at the tooltip, which keeps really making its presence felt here. We’ll see all of the rows end up on thread 1 here, right?

Which is, you know, not great still. And if we look down here, we’ll see all 37 million rows, well, they all ended up on thread 1. And then down here, there’s 3.5 million rows all on thread 1.

And over here, there’s 733 rows all on thread 1. So, all of these ended up hashing out to the same thread. Right?

Just in different seeks. And then if we come over about here and look, we’ll see thread 1 gets all 44 million-odd rows — all on one thread. So, things do not end up better when we do this.

Unfortunately, SQL Server does not spread those out better or do anything else smarter. It all just sort of ends up a complete washout. Now, the one way that I was able to rewrite this query in order to get better thread distribution was to do this.

So, rather than correlate on the four vote type IDs that I cared about, remember, because in the original query, what I had done was written where vote type ID is between 1 and 4 here. Right? So, we kind of narrowed it down.

We kind of forced ourselves to have a crappy parallel distribution on this one. But it’s a lot easier to do that intentionally than it is to get it to happen by accident. So, we kind of forced the issue on that one.

But what I was able to do to get this one worked out was to sort of nest things a little bit and put the numbers 1 through 4 into a values clause. Right? And call that vote type.

Alias that column that comes from these as vote type ID. And then left join the select here and join that on vote type ID there and say where user ID equals user ID out here. Right?
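Here’s a hedged sketch of that shape — the derived table’s columns and the windowing details are assumptions, but the VALUES-then-join structure matches the description:

```sql
/* Hedged sketch of the VALUES rewrite: an artificial four-row
   "distribution" joined back to the skewed table from the outside. */
SELECT
    vt.VoteTypeId,
    x.PostId,
    x.dr
FROM (VALUES (1), (2), (3), (4)) AS vt (VoteTypeId)
LEFT JOIN
(
    SELECT
        vs.UserId,
        vs.PostId,
        dr = DENSE_RANK() OVER (PARTITION BY vs.UserId ORDER BY vs.PostId DESC)
    FROM dbo.VotesSkewed AS vs
) AS x
    ON x.UserId = vt.VoteTypeId; /* correlation happens out here, not inside */
```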

So it’s still a logically correct query. And when we look at the query plan for this one, this is kind of at the intersection of fixing parallel skew and being batch mode friendly, because what you’ll notice in this plan is that it’s different.

So, like, just to come back to these a little bit. You see we have a parallel nested loops here. And for this one, we have a parallel nested loops here as well, if I’m remembering correctly. Ah, I was.

Wonderful. Yeah. So, parallel nested loops. Good, good, good. Everything correct there. Now, we get something different. We get hash joins. Right? We have two hash joins instead. And if we look, we have this constant scan.

So, this constant scan is going to be the four rows, the numbers one through four that I put into that values clause. And if we look, I mean, there’s obvious signs that we got batch mode, like the window aggregate here. The important thing is this query finishes in about four seconds.

And the other queries over here: this one finished in 29 seconds, and this one finished in 59 seconds. So, I guess writing the one seek out as four seeks was at least somewhat helpful, by about 30 seconds.

But probably not where we want things to be anyway. But this query, if we come over and we look at the index scan over here, again, not the tooltip. We’ll see that the rows ended up hashing out very nicely on this one.

Right? I mean, there’s a little outlier here. Like, this is maybe not perfect, but way better than all 44 million rows ending up on a single thread. So, this is sort of at the intersection of fixing parallel skew and writing batch mode friendly queries, right?

Somewhere in there. But, yeah. So, you know, obviously, this is happening in batch mode. If you look up there, this sort will be happening in batch mode.

And this sort operator is a child operator of the batch mode window aggregate. There we go. Batch mode sort, blah, blah, blah.

Then the window aggregate can read from the batch mode sort using parallel threads. Usually, the producing side of a batch mode sort will be single threaded, but not when there’s a window aggregate involved. And so, we get batch mode pretty much across this whole thing.

And this is exactly what we want. It’s just batch, batch, batch all the way down. And we get a much faster query with much better thread distribution. So, when you’re dealing with parallel skew — particularly the kind that you cannot resolve naturally, maybe by forcing a different DOP.

Again, particularly odd number dops tend to fix stuff like this. But it didn’t happen for me here. So, I was forced to rewrite the query in sort of a weird way.

But, you know, I’ve used the values trick in other contexts before in order to get different query transformations from things. Usually, you would take values and cross-apply to them. But doing cross-apply here, or it would be outer-apply because it’s a left join.

Using outer-apply here produced the same skewed parallel nested loops plan as before, which didn’t buy me anything. But doing the join, that means you can’t correlate inside of this. You have to do it outside of this.

So, you actually force SQL Server to do all this stuff and then join onto it here and then correlate to the user ID that gets produced here after that. It was a much more efficient use of threads and time. And, of course, getting batch mode involved was a very big help in making this thing go much, much faster.

Because we are processing a lot of rows, right? 44 million rows. You want batch mode for that many rows. That’s a good number of rows to batch up and not deal with on a row-by-row basis.

Because internally, that’s what row mode is: one row at a time. And batch mode is, of course, multiple rows, SIMD instructions, CPUs, and all that. So, this was just a fun way to rewrite this query to, A, fix the parallel skew, and, B, get batch mode involved as a happy side effect.

A very happy side effect. There are no fiascos, just happy screw-ups. I’m going to skip the alliteration on that one.

But this is just a happy accident that batch mode also got involved when I changed the format of the query. Anyway, thank you for watching. I hope you enjoyed yourselves.

I hope you learned something. And I will see you over in tomorrow’s video where we will abandon all parallel skewness. All right. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

How Many Plan Variants Can You Get With The Parameter Sensitive Plan Optimization In SQL Server?

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Faster Better Wronger

Video Summary

In this video, I delve into an often-overlooked aspect of query tuning: ensuring that the results are correct and sensible. I highlight how optimizing a query for speed can sometimes lead to incorrect or nonsensical results if the underlying logic is flawed, and I demonstrate this with a real-world example, showing how a poorly written query can produce absurdly large numbers due to incorrect joins, leading to misleading data. The video walks through fixing these issues by rewriting the query and emphasizes the importance of validating both the correctness and sensibility of the results, especially when working on complex queries that touch multiple tables.

Full Transcript

Erik Darling here with Darling Data. And we’re going to talk today about an important aspect of query tuning that does not get enough attention: making sure that the results of the query we made faster are correct — maybe by comparing them to the results of the original query — but also making sure that the results of the original query make any sense whatsoever. Because despite all of the good intentions that exist in the world, people may write queries with incorrect logic. They might write queries in a way that does not return sensible data. We all need sensible data in our lives. I’m not talking, of course, about NOLOCK hints, which only dangerous and irresponsible people use in their production queries. We’re talking about just general logic and correctness. So we’re going to talk about that a little bit today.

Down in the video description, you will see all sorts of helpful links. You can hire me for consulting, buy my training, become a supporting member of the channel — be an official Darling Data supporter. And you can also find links to ask me office hours questions, which are, of course, free. Yay! And then, the usual: if you enjoy this content in any way, shape, or form, however minor, please do like, subscribe, tell a friend, so I can get a bigger YouTube or something.

Okay. Seattle. I’ll be in it. November. With Kendra Little doing two days of the most fantastical T-SQL pre-cons that the world has ever known. I’ll have all sorts of the swaggerific new stuff there, stickers and t-shirts and whatnot.

And I will also be giving all attendees access to all of the T-SQL training. So you’ll have that going for you, too. So think of all the wonderful things you’ll get for free just by showing, by having your employer pay for you to show up. Right? It’s a good exchange. This is absolutely bribery.

So let’s go talk about query correctness stuff. Now, that’s the wrong thing. Cool. All right. Let’s pretend that didn’t happen. And let’s go over here and let’s look at this query. This query is slow. Right? If we look at the execution plan for this query, it ran for one minute and 23 seconds.

Not a good time, that. It took a long time to run. We can see all sorts of strange things happen in this query that we don’t like — all sorts of things that we would never want to see. We have bad estimates, and we have long running nested loops joins because of those bad estimates.

And everything just goes to hell in a handbasket. But what really goes to hell in a handbasket are the results. Someone wrote this query here. Now, we could do a bigger top here, but we would be here for a long time.

This took a minute and 23 seconds for one user. If we had more users, well, by God, by the time this thing finished, I’d probably be packing it up to go to Seattle, which is in about a month now. Actually, exactly one month now. I should probably book that travel.

But someone wrote this, and, you know, we get the top one user, and we select from that CTE, and we join to posts, and then we join to comments. But the way that these joins occur, we have this sort of incorrect multiplicative effect on things. Right? Look at the numbers we get back. Jon Skeet, despite being an absolutely prolific human being, does not have 236 million total posts and comments.
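The shape under discussion, in hedged sketch form — the column choices are illustrative, but the two one-to-many joins in the same FROM clause are the point:

```sql
/* Hedged sketch of the problem query: posts and comments joined in the
   same FROM clause, so every post row pairs with every comment row. */
WITH u AS
(
    SELECT TOP (1) u.Id, u.DisplayName
    FROM dbo.Users AS u
    ORDER BY u.Reputation DESC
)
SELECT
    u.DisplayName,
    PostScore = SUM(p.Score),
    CommentScore = SUM(c.Score),
    TotalPostsAndComments = COUNT_BIG(*)
FROM u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id
JOIN dbo.Comments AS c
    ON c.UserId = u.Id
GROUP BY u.DisplayName;
```

With roughly 14.5 thousand posts and 16 thousand comments for that user, the cross-multiplication lands in the neighborhood of 236 million joined rows, which is where the absurd totals come from.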

And these posts and comments scores are also very large numbers. These are like almost foreign phone number long. Right? They are crazy. So how did this happen? Well, let’s say that we wanted to rewrite this query in a way to make it faster. Right?

That would also sort of show us why we have this bad multiplicative effect on things, and why we got such large numbers. So if we rewrite the query like this, we’re going to do the same thing here, but we’re going to separate the join to posts out. We’re going to pre-aggregate that.

And then we’re going to cross join to comments, and we’re going to correlate the cross join to the users CTE down here. So let’s run this query — and, well, first we’re going to note that it is much faster. That finishes instantly, not in a minute and 23 seconds. Good for us. And we get the same results back.

But the only way that we can get the same results back here is by applying, again, the multiplicative effect. We would have to multiply the post score by the total number of comments, and we would have to multiply the comment score by the total number of posts.

And we would have to multiply the comment count times the post count in order to get the numbers to match. And that’s probably not right, because we know Jon Skeet does not have 236 million posts and comments.

And we can validate that. If we go back to the source data and we say, hey, Jon Skeet, how many posts and comments do you have? Well, these are the numbers that live in the actual tables. Right?
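The validation queries are simple; a hedged sketch, where 22656 is, as I recall, Jon Skeet’s user id in the public dump:

```sql
/* Hedged validation sketch: per-table numbers straight from the source. */
SELECT PostScore = SUM(p.Score), PostCount = COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656; /* Jon Skeet's id, as I recall */

SELECT CommentScore = SUM(c.Score), CommentCount = COUNT_BIG(*)
FROM dbo.Comments AS c
WHERE c.UserId = 22656;
```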

We have about 331,000 post score. We have about 14 and a half thousand posts. We have the comment score of 46,000 and we have a comment count of about 16,000. So like when you look at all this, you might wonder, well, how did we end up with those crazy numbers? Right?

And it’s really because of the way that the two inner joins work. They’re both one-to-many relationships, and when we join the one-to-many relationships together, we get a many-to-many relationship. Right? Like, if we come back to the original query — we joined, where was it? Oh, there — oh, those are indexes.

There we are. We joined that CTE to posts, on owner user ID to ID — which, going the other direction, is a many-to-one relationship. But then joining to comments, that’s also a many-to-one relationship. So we really get sort of — I don’t want to use the word Cartesian product, but I do want to say it sort of explodes the results in an incorrect way.

Joining posts and comments on these things in the same set of joins blows the results up and gives us incorrect numbers. These sums and counts here were just way, way, way out of proportion. So if we wanted to rewrite the query in a way that gives us just what we wanted back, we could rewrite it like this. Right?

We could just stick the select top one here, join to the pre-aggregation on posts here, and then join to the aggregation on comments here. And this would not explode the results. We would still have to add post count and comment count together in order to get the total posts and comments, but that’s not so big a deal. And this number is correct. And these numbers match what we had in the source table for that one user.
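In sketch form, assuming the same illustrative columns as before:

```sql
/* Hedged sketch of the corrected rewrite: pre-aggregate each side,
   then join once per side so nothing cross-multiplies. */
WITH u AS
(
    SELECT TOP (1) u.Id, u.DisplayName
    FROM dbo.Users AS u
    ORDER BY u.Reputation DESC
)
SELECT
    u.DisplayName,
    p.PostScore,
    p.PostCount,
    c.CommentScore,
    c.CommentCount,
    TotalPostsAndComments = p.PostCount + c.CommentCount
FROM u
JOIN
(
    SELECT p.OwnerUserId, PostScore = SUM(p.Score), PostCount = COUNT_BIG(*)
    FROM dbo.Posts AS p
    GROUP BY p.OwnerUserId
) AS p
    ON p.OwnerUserId = u.Id
JOIN
(
    SELECT c.UserId, CommentScore = SUM(c.Score), CommentCount = COUNT_BIG(*)
    FROM dbo.Comments AS c
    GROUP BY c.UserId
) AS c
    ON c.UserId = u.Id;
```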

We could also just have taken our original query and not blown things up with the multiplicative effect that we saw from the two joins that exploded our results earlier. And we would get, of course, correct results back from this. So when you’re writing queries like this — for me, as a consultant, it’s hard.

And this is why I prefer to do things over Zoom with people who have more domain knowledge. When we’re reviewing results together, we can sanity check stuff, and I can say, hey, I don’t know this data very well. You probably know it better than I do. Is there any way that these numbers are accurate?

Because this looks funny to me. You know, usually on a sales call, someone will be like, we’re a blah-blah dollars a year company. And I’m like, cool. Want to pay me more? Oh, so you have money. Great.

But then you might see some reports come back, and you’ll see one customer has spent, like, $10 billion in the last month. And you’re like, well, are you sure? Because that sounds a little wrong. These are important things for you to spot check and validate as you are making queries faster: not only that the results are correct, in that they match whatever came before, but also that the results make sense.

Right. Cause you don’t want to give people data that is not sensible back. Right. Doesn’t do you any good to write a faster query that still gives you crappy results. Anyway, that was all I had to say today.

All right. Mind your data. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I think this is dropping on a Friday. So I will see you next week with the office hours, Monday video.

All right. Thank you.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SQL Server Performance Office Hours Episode 39


Questions:

  • What are your favourite alternatives to alerts for Page Life Expectancy? I know that I have memory issues, but alerting on high I/O waits seems silly.
  • You mentioned you often prefer select into and adding the index later for parallel inserts. Not having real control over the nullability is causing me issues with creating primary keys after select into. Is a unique clustered basically the same thing?
  • You look like you’re bulking up again. I thought you were dying. Are you on steroids? What is your routine?
  • What are your lest favorite things to see in a query plan?
  • I forgot to ask how much protein you ate with steroids

To ask your questions, head over here.

Video Summary

In this video, I dive into answering five user-submitted questions for free during a live office hours session. We tackle topics ranging from alternative methods to monitor page life expectancy when dealing with memory issues, to the pros and cons of using `SELECT INTO` for parallel inserts and adding indexes later. Additionally, we explore what I consider less favorable elements in query plans, such as eager index spools, unnecessary scans, and parameter values embedded directly into the query text without being reflected in the plan properties. The session was packed with valuable insights and practical advice, making it a great watch for anyone looking to improve their SQL Server skills. If you have any questions or topics you’d like me to cover next, feel free to drop them in the comments below!

Full Transcript

Erik Darling here with Darling Data, and today we’re going to do office hours, in which I answer five — 1, 2, 3, 4, 5, that’s this many fingers — user submitted questions for free, right? That’s the whole point, I guess. Down in the video description, if you would like to ask a question, there’s a link down below, where the fingers are pointing. Just look down — my links are down here, pal. And you can also find links there if you would like to hire me for consulting, buy my training, or become a member of the channel in a paid way to support my endeavors here. And of course, the usual: if you like this, blah, blah, blah — like, subscribe, tell a friend, you know, that. Seattle: I’ll be in it this November with Kendra Little, teaching two days of the finest T-SQL pre-cons you have ever seen. I will have all sorts of new swag-er-ific items for you: T-shirts, stickers, training materials. So there is no reason for you not to show up, is there? No, you can’t give me a good reason for that. So, let’s answer these questions, because I guess I promised you I would. What are your favorite alternatives to alerts for page life expectancy? I know that I have memory issues, but alerting on high I/O waits seems silly. Well, if you have memory issues, you are certainly going to have high I/O waits, so why is that silly? Alerts? I don’t know — I am not in a situation where I receive alerts from servers, so I don’t think a lot about what alerts I would send myself in the course of the day. For me, it’s more general monitoring. You could look for spikes in PAGEIOLATCH_SH or PAGEIOLATCH_EX waits. Those would be worth monitoring.
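A hedged sketch of what that kind of monitoring query might look like against the wait stats DMV:

```sql
/* Hedged sketch: waits that tend to indicate memory pressure better
   than PLE does. Thresholds and baselining are up to your monitoring. */
SELECT
    ws.wait_type,
    ws.waiting_tasks_count,
    ws.wait_time_ms
FROM sys.dm_os_wait_stats AS ws
WHERE ws.wait_type LIKE N'PAGEIOLATCH%'
   OR ws.wait_type LIKE N'RESOURCE_SEMAPHORE%'
ORDER BY ws.wait_time_ms DESC;
```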

I think for me, if I was sure that I had memory issues, I also may want to alert on when a significant amount of memory goes towards query memory grants — perhaps even when RESOURCE_SEMAPHORE waits happen. Those are things that I would rather alert on than PLE, because PLE is stupid and dumb. All right. Oh, I should have highlighted that question. There we go. Yes. Anyway, these are usually how memory issues manifest themselves, so that is what I would keep an eye on. Here we go. You mentioned you often prefer select into and adding the index later for parallel inserts. Not having real control over the nullability is causing me issues with creating primary keys after select into.

Is a unique clustered index basically the same thing? So, to hopefully remedy your first situation: if you wrap whatever column you wish to not be null in the ISNULL function in your select list, and give it an appropriate replacement value, the resulting column will be NOT NULL. So use ISNULL. You can’t use COALESCE for this — COALESCE doesn’t work, but ISNULL does. If you wrap your column in that, the result will be a non-nullable column. Is a unique clustered index basically the same thing? Yes, aside from the fact that a unique clustered index will allow one null value in there.
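A minimal sketch of the ISNULL trick, with illustrative table names:

```sql
/* Hedged sketch: ISNULL (not COALESCE) forces the SELECT INTO target
   column to be created NOT NULL, so a primary key can be added after. */
DROP TABLE IF EXISTS dbo.UsersCopy;

SELECT
    Id = ISNULL(u.Id, 0), /* resulting column is NOT NULL */
    u.DisplayName,
    u.Reputation
INTO dbo.UsersCopy
FROM dbo.Users AS u;

ALTER TABLE dbo.UsersCopy
    ADD CONSTRAINT pk_UsersCopy_Id PRIMARY KEY CLUSTERED (Id);
```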

So, depending on how you prefer to approach this, you could add a unique clustered index if you’re sure there are no nulls in there, or you could also wrap the column in ISNULL and have a better time. Okay. Come on, cursor. Where are you? Oh, you’re over there. Ah, there we are. All right. You look like you’re bulking up again. I thought you were dying. Are you on steroids? What is your routine?

All right. Well, I guess thanks for noticing. I have been — well, actually, I’ve always been good about the gym, but I have been better about, I guess, more progressive training. I am not on steroids. If I were on steroids and I still looked like this, I would get my money back. My routine is generally basic barbell exercises: squats, deadlifts, bench press, overhead press, depending on what I’ve done.

You know, deadlifts, typically one set of five; squats, either three sets of three or three sets of five, somewhere in there. If I’m feeling real wiped out, or my legs are feeling particularly burned out, I might do a lighter weight on squats with three sets of eight or three sets of 10, just to get some extra work in. Overhead press, I do one day of 10 singles at a pretty heavy weight.

The last one I had was 200 for the 10 singles. And then I’ll also do a slightly lighter day on overhead press with three sets of five. Also in the mix, I’ll usually make room for some rows and some chin-ups in there.

Occasionally, I guess, RDLs — those are nice hamstring work if you get them heavy enough. And, yeah, that’s about it. I don’t have, like, a bodybuilding routine with lots of hypertrophy and whatnot mixed in.

What are your lest favorite things to see in a query plan? All right. Lest, lest we forget. I don’t know. Okay, let’s think of some stuff here. I hate when I see spools in a query plan, especially eager index spools.

Lazy table spools are also not something that I usually love seeing. I don’t like when I see really big lines going into a sort operator. That usually tells me we have an issue.

Let’s see. I dislike seeing a top above a scan. That’s another one that I don’t like to see. And I don’t like when I see the sort of constant scan, concatenation stuff, sort, merge interval, and then a nested loops join.

Because that usually tells me that we either have a join with an OR clause, or that we have a mismatched data type. Quite frequently, if you have, let’s say, a datetime2 column, and you have a date or a datetime parameter, SQL Server has to do some extra work to work things out. So those are things that I typically dislike seeing.
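If you want to see that pattern for yourself, a hedged sketch — the table and column are illustrative, and the mismatch can run in either direction depending on type precedence:

```sql
/* Hedged sketch: a type mismatch between the indexed column and the
   parameter can make the optimizer build the constant scan / merge
   interval scaffolding around the seek (a "dynamic seek"). */
DECLARE @since datetime2(7) = '20240101';

SELECT c = COUNT_BIG(*)
FROM dbo.Posts AS p          /* CreationDate is datetime in this table */
WHERE p.CreationDate >= @since;
```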

Um, of course, you know, the usual stuff, um, unnecessary scans, right? Like you have a, you have like a scan where there’s a predicate on it that’s not a bitmap. You’re like, oh, maybe I should create an index there.

Sometimes you’ll get the helpful little green text, sometimes not. But, um, you know, those are things that I typically dislike seeing. If I have an actual execution plan, uh, that’s a little bit different because then I have operator times to sort of guide my tuning efforts in a, in a smarter way. Uh, estimated plans, you know, you can spot some red flags in those, but typically, um, you know, you’re like, eh, uh, it takes a little bit more effort to figure things out.

Table variables — I dislike table variables in plans. One thing that I really dislike seeing in an execution plan is when there are parameter values as part of the query text, but then you go and get the properties of the root operator in the plan, and there are no parameter values in there.

Because — unless there’s a recompile hint, where the parameter embedding optimization kicks in and you get literal values in the query plan — that usually tells me that someone is using local variables, and I think, hmm, I shake the fist at you. Not necessarily because the local variables are causing a problem, but just because I can’t get the values for local variables.

Because SQL Server does not sniff and use them like it does with a parameter. So those are things I dislike seeing. There are probably others if I thought about it longer. I forgot to ask how much protein you ate with steroids.

All right. So once again, I’m clearly not on steroids. It’s not my thing.

You know, life is hard enough without tren making me emotional. But as far as protein goes, I drink two protein shakes a day that are 50 grams of protein a piece: unflavored Isopure, powdered Pedialyte, some psyllium husk to keep things orderly.

And I have the scoopable creatine, so I’ll do 10 grams of creatine a day across the two shakes. And then, of course, you must eat real food, lest you become a skeleton.

And I usually try to get around another hundred or so grams of protein from actual meat sources — chicken, steak, stuff like that. Not a big fan of pork.

Not a religious thing. It’s not that it’s bad; I just like other stuff better. So I think that about does it.

Interesting mix today. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something, and I will see you in the next video, whenever that may be. I guess tomorrow.

All right. Thank you. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Why DISKANN Indexes Are So Slow To Create In SQL Server

Video Summary

In this video, I delve into the intricate process of creating vector indexes in SQL Server, specifically focusing on the background tasks and query hints involved. By examining the detailed query plan generated during index creation, I uncover a series of unique trace flags, parallel plan preferences, and other hints that are not typically available to users. This exploration reveals how these elements contribute to the complexity—and current slowness—of vector index creation, while also hinting at potential optimizations for future versions of SQL Server. Through this analysis, viewers gain insight into the inner workings of SQL Server’s query optimizer and the advanced techniques used during index builds that could impact performance and usability in various scenarios.

Full Transcript

Erik Darling here with Darling Data. And in today’s video, we are going to endeavor to answer an important question — hopefully a temporarily important question. Why are DiskANN indexes so slow to build? These are, of course, the new vector search indexes that Microsoft has given us in SQL Server 2025. And of course, the usual spiel does apply here: this is all pertinent to RC1, release candidate one of SQL Server 2025. I believe this will be the last one until we get an RTM, since in about a month both Microsoft Ignite and PASS Data Community Summit will be going on, and usually that’s about when an RTM build drops. So things could change by then, but at least for the next three or so weeks, this is what happens in the background. Before we get into that: down in the video description, all sorts of helpful links. You can hire me for consulting, buy my training, support this channel. You can also ask me office hours questions — that is totally free, and I answer five of those every Monday. And if you enjoy this content, you should probably like, subscribe, and tell a friend. Maybe don’t pass this one on to anyone who works at Microsoft for now. Speaking of PASS Data Community Summit, like I said, that’s happening in beautiful Seattle, lovely this time of year, November 17th to 21st. I will be delivering two days of T-SQL pre-cons with Ms. Kendra Little. They are going to be the best T-SQL pre-cons that have ever taken place in all of human existence. So you have that to look forward to. Anyway, let’s go.

Look at this. Now, I am officially in two-SSMS-spawns territory with how busy I am with things, but that’s a good thing, right? Busy is good. So I want to go over a couple of things and then show you the real spicy stuff. To start things way back: my dear friend, my bosom buddy Joe Sack has been doing a lot of work in the AI space. He’s a bit more keen on it than I am, but he’s been messing around with the vector stuff in SQL Server, and so all credit for this database existing goes to Joe Sack. I’ll have a link both to his Substack, where he talks about all this interesting stuff, and to the GitHub repo where he keeps this database, if you want to play along, play around, or whatever form of play you’re into. I’m not sure.

But Joe had a post — actually yesterday, as I’m recording this, though temporal elements get all screwy when we start publishing things online — about a TOP-something query with vector search. It wasn’t slow. It was just that you had to ask for a much bigger TOP than you thought in order to get results back.

But anyway, I saw the query and I was like, wow, that’s messed up. There’s got to be a way to fix that. And it turns out there wasn’t, because of the way the vector search thing works, but it all sort of led to this. So what I’m going to do is — Joe has a table in his database called properties.

And I forgot an S there, didn’t I? There we go. All good. Brand new script, you know, not even saved with a file name yet, so we’re all a little dicey there. But I’m just going to create a 20,000 row table based on Joe’s table that has stuff in it.

Right. So let’s drop and do that. So, 20,000 rows. And now what I’m going to do is create the vector index. One thing I want to point out here that’s sort of interesting is that I have actual execution plans enabled.

So when I run these two queries, I of course get a query plan for creating the table and selecting data into the empty table — whatever SELECT INTO really does — and then adding the primary key to it. Those are the first two things that I do there, because you need a primary key if you want to create a vector index.

So now I’m going to create the vector index. And keep in mind, this is 20,000 rows. And while this is not the biggest, beefiest VM that’s ever existed — it has about 10 gigs of memory and four cores, something like that — this is 20,000 rows. Right.
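For reference, a hedged sketch of the creation statement, based on the RC-era syntax as I understand it; the table and column names follow the demo, and the option spelling may shift by RTM:

```sql
/* Hedged sketch of the vector index creation described above.
   Requires a clustered primary key on the table first. */
CREATE VECTOR INDEX tiny_properties_vec
    ON dbo.TinyProperties (description_vector)
    WITH (METRIC = 'cosine', TYPE = 'diskann');
```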

And then we’re waiting, and we’re at 10 seconds. I mean, if you look — sorry, the elbow there — you can see, like, 14, 15, 16, 20 seconds. 20,000 rows is pretty slow. And I did get to wondering why 20,000 rows was so slow, and I’m going to show you in a moment what’s going on while a vector index builds.

But what I wanted to show you before that are maybe some other things that you won’t be so thrilled about, and maybe something you will be thrilled about, with vector indexes. This all took about 30 seconds for 20,000 rows, which is not a good time, to be honest with you. But of course — RC1, all this stuff gets better, changes; just currently, things are not so hot.

Just like columnstore indexes when they first came out, adding a vector index to your table does not just make the column that’s indexed read only — it makes the entire table read only. So if I try to update a column in the table — it’s called bedrooms, right? And remember, I created the index on this description vector column, and I guess SSMS still hasn’t quite worked out all the new stuff going on in there.

But at least SSMS 21 has it; maybe SSMS 22 does too — I should give that a shot someday. It makes the entire table read only, right? So the data modification statement failed because tiny properties has a vector index on it. The entire table is read only — not just the vector index column — which is, I don’t know, maybe not great if you have a transactional system.

But two nice things that I do want to say about vector indexes: they do not support rebuild or reorganize. You cannot rebuild or reorganize a vector index. So I suppose Ola will have to get to work making sure to exclude vector indexes, at least as they currently live.

But anyway, what I really wanted to show you today — the real meat-and-potatoes stuff — is what’s going on in the background while a vector index builds. Okay. So we’re going to start creating this, and we’re going to run a little query over here that shows us what’s happening while the vector index creates.
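The watcher query is roughly this kind of thing — a hedged sketch using the standard request DMVs, not the exact script from the video:

```sql
/* Hedged sketch: peek at what's running (and its plan) while the
   vector index build is in flight, from another session. */
SELECT
    r.session_id,
    r.status,
    r.command,
    query_text = st.text,
    query_plan = qp.query_plan
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS st
OUTER APPLY sys.dm_exec_query_plan(r.plan_handle) AS qp
WHERE r.session_id <> @@SPID;
```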

sp_WhoIsActive strangely did not pick up on this, but, you know, whatever. So we get a query plan and some query text back. The query plan actually shows us a query plan for what’s going on.

When I created the thing over there, there was no query plan when I said, hey, make this index — the query plan was missing. So this is that query plan.

And if we zoom around, we’ll see there’s tiny properties, and here’s the vector index seek. We’re doing something over there, and, well, there’s tiny properties again. And then if we scroll over this way a little bit, we will have a clustered index seek into this graph edge table, 120-something-or-other.

And apparently we’re missing some statistics on this table. I don’t know, Microsoft, you ought to create some statistics on that one, I think. But anyway, the query that’s running while all this happens is not a typical create index query.

No, no, no. It is this whole wacky wild adventure. So while that index creates, right.

I’m just going to format this stuff a little bit so it fits on screen a little better. These are some, I guess, parameters that get passed into whatever we’re doing. We declare dop, which is an integer and equals four.

I think that might be related to my system, particularly since I have four CPUs in there. And then we start running this query. So we have: if random start ID equals zero, we do this, and we get some stuff in here.

And this, to me, is interesting. The query that’s running in the background uses the ENABLE_PARALLEL_PLAN_PREFERENCE hint. This hint has up to now been completely unsupported, almost disavowed by Microsoft.

It does not show up in the sys.dm_exec_valid_use_hints DMV. So maybe this is a sign that this hint is now safe for all of us to use in production. Right.

That’s exciting stuff, right? That’s big news there. Uh, and then we, uh, well, let’s see, we select a top one thing here that Microsoft might need to get itself a copy of SQL prompt because some of this formatting is an atrocity. Uh, I can, I can, I’m willing to share my license with you, Microsoft, if you’re, if you’re really feeling, I know there’s been a lot of layoffs lately.

So if you’re really feeling a budgetary pinch, we can talk later. You just let me know. So we get the property ID, and we’ve got a TABLESAMPLE here with REPEATABLE (1).

Hmm. All right. And then this is where, I guess, things start getting interesting. This is that graph edge table — at least one of the ones that we saw in the query plan.

And we insert some stuff into there, right? We select some properties and — ooh — cast that as binary(192). Very exciting stuff.

And then we do this rows-per-page thing, and oh boy, we are using %%physloc%% in here. My goodness. Huh? There we are. We are physloc-ing things. That’s a fun one.

All right. And then, apparently, down here — the query plan we saw was most likely the stuff that’s coming up down here, because there is a bit of a loop or something that goes on, I think. Or at least in RC0, there was a loop.

I forget what they do in this one. Oh yeah, there’s a batch start and batch end. Anyway, there is some interesting stuff that goes on in here. What was it? Maybe it’s down here.

There’s a function that we can’t use: DiskANN prune. This has got a whole WITHIN GROUP thing, so it’s probably some kind of internal windowing function that Microsoft has access to from a system context that we don’t have access to.

I haven’t tried any of that yet. And then down here we have a query with a bunch of hints on it. We’ve got a MERGE UNION hint — very exciting stuff. We have our parallel plan preference hint again.

I don’t know, this is feeling very production ready to me now. And then we’ve got some trace flags. Some of these trace flags I am familiar with: there’s stuff for disabling nested loops batch sort, there’s stuff for disabling spools.

There’s stuff for disabling nested loops prefetch. But this one, 7454 — I don’t know what that one does, to be honest with you. I just don’t. I don’t know every single trace flag. But then we exec an sp vector index build update stats procedure.

We update stats after doing that. And then we have a batch start, where we select the max property ID and some batch size stuff in here, right? All good things.

And then — I don’t know — more batch size stuff. And then there was something else in here that was very interesting to me.

Where was it? It’s not this thing. I mean, this is sort of the index creation stuff that goes on in the background, right? When you create a vector index, you tell SQL Server whether you want it to be cosine or one of the other mathy things that they have. But then there are some options that — at least currently — I don’t see a way to specify. Stuff like this shows up in the sys-dot-whatever vector indexes DMV there is, but it doesn’t show up as a creation option right now.

But then this one was actually interesting as well. So: use hint, disallow batch mode. Again, we have enable parallel plan preference. But then there were some other funny ones, and I doubt these are going to be available to us mere mortals out in SQL Server world: parallel DML hints — disable batch mode, enable parallel plan preference, enable parallel DML for vector index build, and parallel DML using independent transactions.

These ones I have not seen before. I doubt these are valid use hints for us, but I might go and try them. And then we have the same QUERYTRACEON stuff that we saw before — I’m not going to explain that one again — but then we are optimizing for certain values for the batch start and end.

And then there was some other interesting stuff in here, like this pack int function. Do we get to use that? I don’t know.

I’m just seeing all this for the first time, so I want to figure that out too. And, good Lord — come on, put CROSS APPLY on a new line. All right.

So then, well, there’s unpack int. So we get to pack an int and then unpack an int. That’s a fun pairing of things. And down here there was — well, there was DiskANN prune, but then there was — oh, where was it?

I don’t know. This stuff all gets rather difficult to read. But, you know, when people tell you to be careful with query hints — my Lord, Microsoft. My Lord.

We have some of the same things that we saw before: disabling batch mode, enabling parallel plan preference. I apparently deleted something there that I shouldn’t have. There we go.

Line that back up. And then the parallel DML stuff that we saw before. And here’s a fun one: Microsoft turns off eager index spools in this query. They’re like, no eager index spools for you.

SelToIndexOnTheFly — that is the optimizer rule that would build an eager index spool in your query plan. So Microsoft is apparently anti eager index spool as well. High five, Microsoft.

We share a mutual loathing for the eager index spool. And then at the end, of course, we exec the sp vector index build update stats thing again.

So right now, since this is all very loopy stuff going on, I would imagine that this kind of thing is why building vector — DiskANN — indexes is so slow currently.

I would also imagine that this is why creating one currently makes your whole table read only. Because if you were to run a command that changed stuff in the table, and you had to run all this stuff over again to rebuild or update your vector index during regular transactional processing, that would be kind of a nightmare.

So I understand why we do not currently have writeable DiskANN indexes, because that wouldn’t be fun. But I really do hope that this stuff gets sorted out — if not by RTM in three weeks (no pressure, guys), then hopefully in the near future. Because it would be a terribly uncompetitive thing for SQL Server to not have writeable vector indexes.

And, again, trade-offs and stuff, right? Oracle has writeable vector indexes. Postgres has them.

Elastic has them. I forget about Mongo, but lots of other database engines have writeable vector indexes. I don’t know what trade-offs they made in order to accomplish that. Perhaps there were certain trade-offs Microsoft was not willing to make in doing that, or perhaps there are other limitations involved that we, who do not have the source code, are unaware of.

So anyway, if you’ve heard people talk about DiskANN indexes being slow to create, hopefully this answers your question as to why.

I think it might be worth looking at this, too. So this is the full properties table — or a fuller properties table. There were some problems earlier.

I don’t want to talk about that, but there are 400,000 rows in the properties table — I just reinserted some extra ones. The one that you download from Joe only has 100,000 in it.

So you have to beef the table up a little bit. But when I tried to create a vector index on the full properties table, the query looked a little bit different. I think it was doing more to batch it, but it might not be terribly interesting to go find more of that in here at this point.

But I don’t know — that looks kind of the same to me. Anyway, at least now we have some answers as to why building a DiskANN index is not terribly fast as of RC1.

Okay — as of RC1, disclaimers and all that. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something, and I will see you in the next video, where we will — I don’t know yet, actually. I don’t have anything else planned.

So it’ll, it’ll be a surprise to you and me. All right. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

A Weird Thing With NOEXPAND

Video Summary

In this video, I share a humorous and intriguing experience I encountered while writing training material on indexed views and the NOEXPAND hint in SQL Server. I discuss how using the NOEXPAND hint can sometimes prevent parallel execution plans when you might expect it to allow them, highlighting an interesting quirk of the optimizer’s behavior. This video aims to provide insights into why this happens and explore potential reasons behind it, while also emphasizing that indexed views and the NOEXPAND hint remain valuable for many scenarios. If you’re interested in diving deeper into SQL Server optimization or need help with your database performance, consider supporting my work through donations or by subscribing to my channel to stay updated on future content.

Full Transcript

Erik Darling here with Darling Data. And today’s video is about a little funny thing that I ran into. I was writing some training material about indexed views and the NOEXPAND hint, and I ran into a funny thing. Now, there are many good reasons why you may want to use a NOEXPAND hint when dealing with indexed views, particularly in Standard Edition. Perhaps because you don’t want SQL Server’s optimizer to attempt to expand the indexed view to the underlying definition and, you know, maybe get worse performance. You might also want statistics automatically created on your indexed view. Good reasons to use the NOEXPAND hint. But I ran into something funny with it, where when I used the NOEXPAND hint, I was not getting the parallel plan that I wanted — but when I dropped the NOEXPAND hint, all of a sudden: happy, fun, parallel query. So we’re going to talk about that today. Down in the video description, you will find all sorts of links that are helpful to both of us, right? It’s a symbiotic relationship. You can give me money and I can make your SQL Server faster, or you can give me money and receive training from me, or you can give me money and continue to receive YouTube content — which, I mean, I don’t get paid much by the hour for the YouTube content, to be honest with you. So, you know, that’d be nice. But apparently I’m like 10 million subscribers away from that being lucrative, so we’ll just have to punch harder and get those numbers.

You can also ask me office hours questions for free; I answer five of those every Monday. And of course, if this content is up your alley, you should like and subscribe. And if you know someone whose alley this might also be up, you should tell them to watch it, and hopefully like and subscribe as well — because how else am I going to hit 10 million YouTube members, so I can finally stop doing all the other stuff and just stream all day long? Wouldn’t that be great? This November, I will be in Seattle with Ms. Kendra Little, hand delivering two days of the best T-SQL pre-connery you will ever see in your life. So you should come to Seattle on the appropriate dates, come to the pre-con, and get some stuff from me and Kendra. It would be great for everyone, right? We all win.


The point of the stored procedure is to allow me to inject hints and options easily without having to rerun the whole query that hits the indexed view. And I’ll show you why here. So let’s first run this stored procedure with the NOEXPAND hint. Thank you, SQL Prompt, for making yourself known.

And let’s look at the messages tab. In the messages tab, we’re going to see the query that hits the indexed view. This is why I don’t want to have to keep highlighting the whole thing; this is much more compact. But you’ll see that I have the NOEXPAND hint supplied here, just like I have in the hints parameter up here. And the execution plan for this irks me. Why?

Well, it’s a serial plan. And I do not want a serial plan. I want a parallel plan. The problem is, in this case, the NOEXPAND hint does something kind of funny. It doesn’t prevent parallelism in all cases, of course, right? That’s not my point. I’ve certainly seen parallel queries with the NOEXPAND hint applied to them. That’s not what I’m getting at here. This is just a curious case, just a weird thing. Now, if I rerun the stored procedure, right?

We execute this and we say no hints and no options. You’ll notice that that was a little bit faster. And of course, we got a parallel plan. If you go over to the messages tab and scroll down to where the table gets hit, we now have a blank spot where there was once a hint supplied. So like I said, the NOEXPAND hint does not prevent parallelism. If I run the query with the NOEXPAND hint, and I also use the ENABLE_PARALLEL_PLAN_PREFERENCE use hint, well, SQL Server comes up with a... actually, let me show you one thing first.
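
For reference, here’s a minimal sketch of the three shapes in play. The view name is a placeholder, since the demo’s actual objects aren’t shown in the transcript; NOEXPAND and the use hint are real syntax.

```sql
/* 1) With NOEXPAND: serial plan in this demo */
SELECT COUNT_BIG(*) AS records
FROM dbo.SomeIndexedView AS siv WITH (NOEXPAND);

/* 2) Without NOEXPAND: the optimizer may expand the view,
      and here it finds a cheaper parallel plan */
SELECT COUNT_BIG(*) AS records
FROM dbo.SomeIndexedView AS siv;

/* 3) NOEXPAND plus a nudge toward parallelism:
      the cheapest plan of the three in this demo */
SELECT COUNT_BIG(*) AS records
FROM dbo.SomeIndexedView AS siv WITH (NOEXPAND)
OPTION (USE HINT ('ENABLE_PARALLEL_PLAN_PREFERENCE'));
```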

If I run this, oh, wait, let’s come back to the original one. This is sort of a funny thing about it. If I run this, when we get the serial execution plan, it has a cost of 22.6722 query bucks, right? So the serial plan was not expensive enough; the estimated cost of the plan did not break my cost threshold for parallelism setting. Which is weird, because when we run it without the NOEXPAND hint, we get a parallel plan, right? So why would the NOEXPAND hint end up here? Did SQL Server just not go find some more stuff? It’s a good question. We’re going to answer that. Now, if we look at the cost of the parallel plan that we get without the NOEXPAND hint, this, of course, has a cost of 9.3 something query bucks. So SQL Server did find a cheaper parallel plan. At some point, without the NOEXPAND hint, the optimizer did some more stuff. What more stuff did it do?

Maybe we’ll be able to answer that. But what SQL Server did is it went out and found a more expensive serial plan, and then found the cheaper parallel plan and said, oh, I’m going to go with the cheaper parallel plan. So the exploration space is there. For some reason, we just don’t hit it when we use NOEXPAND. Now, like I said, it is possible to get a parallel plan with NOEXPAND hints. I’ve seen plenty of times where it happened completely naturally, but in this case, we don’t get it. So I’m going to go here and I’m going to run it with NOEXPAND and the ENABLE_PARALLEL_PLAN_PREFERENCE use hint, so that we get a parallel plan even with NOEXPAND there. And you’ll see that it is possible. It does happen for us. Sort of curious. And this is the very curious thing: if we look at the cost of the parallel plan with that hint supplied, it is 5.2 query bucks. So that’s even cheaper than when it found the parallel plan naturally without the NOEXPAND hint. So what happened in there? Well, I was able to get some answers, not full answers. I’m going to show off some rocket science-y trace flags here. The first one is 8675, which will show us how many tasks the optimizer applied to a query while it was optimizing it. So this is without the NOEXPAND hint, and without the NOEXPAND hint, we get the parallel plan. We go over to the messages tab.
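
The shape of that test looks roughly like the sketch below, with a placeholder query standing in for the demo’s. Both trace flags are undocumented, so treat this as a curiosity for a test box, not production tooling.

```sql
/* TF 3604 routes trace output to the messages tab.
   TF 8675 prints optimization phases, task counts,
   and end-of-search costs for the statement. */
SELECT COUNT_BIG(*) AS records
FROM dbo.SomeIndexedView AS siv
OPTION (QUERYTRACEON 3604, QUERYTRACEON 8675);

/* TF 8615 dumps the final memo instead; far more
   output, and much harder to read. */
SELECT COUNT_BIG(*) AS records
FROM dbo.SomeIndexedView AS siv
OPTION (QUERYTRACEON 3604, QUERYTRACEON 8615);
```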

This is where SQL Server will show us the stuff that it did, right? So, the number of tasks, which if we scroll down a little bit, we’ll eventually get to. Actually, that was important: at the end of search 1, we had a cost of 136.73 query bucks, right? That was the cost of the serial plan. And then if we keep scrolling down, we’ll have more tasks. Look at that, 7,200 tasks. And if we keep scrolling down, we’ll have this end exploration, right? And this is where SQL Server found the parallel plan that costs 9.3-whatever query bucks. So keep that number in mind, 7,239. Okay. Now, if we run this with the NOEXPAND hint and those same trace flags, what we’ll see is something a little bit different. We get the serial plan again. If we scroll through messages, we’ll have 1,900 tasks at the end of search 1, with a cost of 22 query bucks. So SQL Server did fewer tasks here; the other one had like 7,200, and this is only 1,900 at the end of search 1. Remember, there were two end of search 1s. The last run ended search 1 with a cost of 130-something query bucks. This one ended search 1 with a cost of 22 query bucks. So we just didn’t go into that additional exploration space for some reason with the NOEXPAND hint applied. There was one other trace flag that I did try, to look through a bit and figure out where things were going weird in here. But the output of 8615 is not very easy to manage. You’ll notice that the scroll bar over here is quite a small little rectangle scrolly thing, and there is quite a bit of information in here to get through. I just couldn’t make enough sense of it to figure out where the actual change was, where parallelism came in, or rather where the same search space wasn’t explored. So, a little bit of an oddity there, and a little bit of how I looked into it further. That’s about it. I’m not telling you not to use indexed views, and I’m not telling you not to use the NOEXPAND hint, but if you ever run into this situation, maybe this is the reason why. I don’t know if NOEXPAND in general limits the search space more than not using the NOEXPAND hint. Maybe it does. I don’t have that much insight into the NOEXPAND hint’s interactions with the optimizer, but it might be something interesting to delve further into, as time allows. But anyway, thank you for watching. I hope you enjoyed yourselves.

I hope you learned something, and I will see you in tomorrow’s video. I believe we’ll talk about query correctness or something like that. I don’t know, whatever. Anyway, thank you.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

A Bug In Batch Mode Adaptive Join Plan Timings

A Bug In Batch Mode Adaptive Join Plan Timings


Video Summary

In this video, I delve into a fascinating issue related to adaptive join plans and operator times in SQL Server. I highlight discrepancies between the actual execution time of queries and what is displayed in the query plan when using adaptive joins. Specifically, I demonstrate how the time spent on certain operators is not accurately reflected, leading to confusion and potential frustration for database administrators and developers. To address this issue, I’ve opened a User Voice item urging Microsoft to rectify these inaccuracies. If you’re interested in supporting this effort or learning more about SQL Server performance tuning, be sure to check out the helpful links provided in the video description, including options to hire me for consulting, buy my training materials, and become a paid member of the channel.

Full Transcript

Erik Darling here with Darling Data. And this is just a quick video to promote a User Voice item that I opened up about, what do you call it there? Bugs with operator times in adaptive join plans. That’s, I guess, what that says, isn’t it? Anyway, down in the video description, all sorts of helpful links. You can hire me, you can buy my training, you can become a supporting paid member of the channel. You can ask me questions for free for my Office Hours episodes, and of course, as always, if you enjoy this content, even in the most modest increments, please do like, subscribe, tell a friend, and all that stuff. I am still speaking, as far as I know, hopefully, at PASS Data Community Summit coming up in Seattle, November 17th to 21st, where Ms. Kendra Little and I will be delivering the two most fantastic days of T-SQL pre-cons that you have ever seen in your entire life. So, hope to see you there. And as just a little teaser note on that, I sent the order for my new line of promotional T-shirts out to the printers yesterday. So, I will have brand new, very cool, very funny T-shirts to give to attendees of the pre-cons. So, we all have that to look forward to. Anyway, oh, I clicked one too many times. Let’s take a look at a couple things.

So, these are the two queries that I’m going to start with, right? They’re both the same query. The only difference is one of them has this USE HINT (‘DISABLE_BATCH_MODE_ADAPTIVE_JOINS’) thing. If we look at the operator times for these two queries, we’ll see that they both take, I mean, close enough to the same amount of time: 38.7 and 38.57 seconds. So, a hundred-and-change milliseconds difference, but that’s not really the issue.
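
The transcript doesn’t show the query text, so here’s a stand-in sketch against the Stack Overflow tables the video mentions; only the hint differs between the two statements.

```sql
/* Query 1: adaptive joins allowed (the default in batch mode) */
SELECT p.Id, COUNT_BIG(*) AS records
FROM dbo.Posts AS p
JOIN dbo.Comments AS c
    ON c.PostId = p.Id
GROUP BY p.Id
ORDER BY records DESC;

/* Query 2: adaptive joins disabled; a plain batch mode
   hash join shows up in the plan instead */
SELECT p.Id, COUNT_BIG(*) AS records
FROM dbo.Posts AS p
JOIN dbo.Comments AS c
    ON c.PostId = p.Id
GROUP BY p.Id
ORDER BY records DESC
OPTION (USE HINT ('DISABLE_BATCH_MODE_ADAPTIVE_JOINS'));
```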

The thing is that in the second plan, you’ll notice that the hash match inner join, like, so this is a batch mode plan, right? Mostly. We didn’t get a batch mode adaptive join, but we did get a batch mode hash join, which I’ll show you in a second. But you’ll notice that this thing correctly accounts for about 30 seconds of time spent in the batch mode hash join, right?

If we hover over this thing, you’ll see that it is batch mode. And because it’s batch mode, that means that it is only accounting for the time spent in that operator. And if we sort of trundle along through the query plan a little bit further, like we have, you know, the milliseconds that we spent reading from the tables, the 29 seconds in here, about two seconds here, about seven and a half seconds here.

So, it all pretty much adds up close enough to 38.574 once you account for, like, the little, like, eight milliseconds and stuff in other places. And the adaptive join plan, that’s a lot different, right? Because if we look over here, we have 400 milliseconds and 700 milliseconds.

And, of course, because this is the unexplored branch of the adaptive join, zero milliseconds, right? We have zero rows that came out of this thing. All of the row action came out of this part of the plan where we hit the comments table, not this one.

But the adaptive join says that it only ran for about 900 milliseconds, which is a little bit funny. Because if we look at the rest of the operator times, they line up pretty closely from one plan to another. It’s 1.6 versus 1.9, 7.2 versus 7.5, and then seven milliseconds versus eight milliseconds.

So, the time in this plan is completely unaccounted for. Now, of course, I checked all sorts of things like latches and spin locks, thinking, like, I must be crazy because, you know, Microsoft would never mess up this badly. But the other thing that I looked at is, sorry, I’m holding back a sneeze at the moment.

We might have to break recording or something. I don’t know. But if you look at the wait stats, the top one for both of them is CXSYNC_PORT, and it’s got just about the same amount of time spent in both of the plans. There’s really nothing else interesting in the other waits in these plans that would account for the amount of time spent, nothing that says, hey, 38 seconds, right?

So, the reason I’m bringing this up is because both of these plans end with a batch mode sort, right? Batch mode sort here and batch mode sort here. And a while back, I had posted something, and Paul White wrote an article called, what’s it called up here?

Impossible Execution Plan Timings, where he actually referenced me by name, which was, you know, of course, an emotional moment in a young man’s life. But in the article, Paul goes into detail about the batch mode sort and the threads and all the other stuff that happens in there. But he also brought up some workarounds.

And the reason why I bring this up is because I also tried the workarounds, and I’m going to show you those in a moment. So, we had one which was putting data into a temporary table, right? So, like, you know, this stuff here. The other, which was an eager index spool, right?

And you get that by doing this TOP (@@SPID - @@SPID + 100) PERCENT query. Another was reducing parallelism, so MAXDOP 2. And then a row mode sort, using trace flag 9347 to get a row mode sort instead of a batch mode sort.
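
Sketched out, the workaround shapes from Paul’s post look something like this; the tables and predicates are placeholders, since the demo’s exact queries aren’t shown.

```sql
/* 1) Dump one side of the join into a temp table first */
SELECT c.PostId
INTO   #c
FROM   dbo.Comments AS c;

/* 2) TOP (@@SPID - @@SPID + 100) PERCENT: evaluates to TOP (100) PERCENT,
      but as a runtime expression the optimizer won't simplify it away */
SELECT TOP (@@SPID - @@SPID + 100) PERCENT
       p.Id
FROM   dbo.Posts AS p
ORDER BY p.Id;

/* 3) Reduce parallelism */
SELECT p.Id
FROM   dbo.Posts AS p
OPTION (MAXDOP 2);

/* 4) Row mode sort via trace flag 9347, which disables batch mode sorts */
SELECT p.Id
FROM   dbo.Posts AS p
ORDER BY p.Id
OPTION (QUERYTRACEON 9347);
```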

So, and this isn’t to say that Paul was wrong about any of this, because when is Paul ever wrong? This is just to say, I tried this stuff. This isn’t what’s happening, okay?

So, this is just to say, I’m not crazy. Paul’s still right. I’m just not, I’m not the one with the problem, right? So, if we come back over to Management Studio, we’re going to see all of the things that I mentioned in Paul’s post that I have here. So, one of them is putting stuff into a temp table.

One of them is saying TOP (@@SPID - @@SPID + 100) PERCENT. Another is running the query at MAXDOP 2. And then finally, using trace flag 9347 to get rid of the batch mode sort. So, for all of these query plans, you’ll notice that there’s still 30-some-odd seconds in each of them, right?

Some of these, like, you’re going to see the operator time show up differently because of the different plans that we got. But none of them that use an adaptive join will have the time correctly assigned to them. So, like, the first query where we dump data into a temp table, at least I think that’s what we did.

Oh, come on, zoom it. You’ll still see about 30 seconds represented in that regular hash join, right? So, SQL Server didn’t choose an adaptive join for this one.

So, this one actually does still look okay. For the next one down, this is that funny top spid minus spid plus 100% query. And, you know, the operator time does show up earlier because we have the gather streams here, right?

So, this is where the batch mode stuff pretty much ends. None of the parallel exchanges support batch mode. So, this is where we go into row mode.

So, we see, like, 40 seconds in here, but the adaptive join still shows the wrong amount of time here, 938 milliseconds, which, strangely, is exactly what the one over here showed, too. Or, actually, am I on crack? Yeah, I’m on a little bit of crack.

Okay, 894. All right, never mind. Not exactly. Close, though. If we scroll down, this is the plan where we reduced MAXDOP to 2. And this runs for a lot longer because we have a much lower DOP, right?

The other query ran at DOP 8. This is at DOP 2. But we still don’t see the correct operator time show up across here. The adaptive join shows a little more time because it ran at a lower DOP, but it’s only 2.5 seconds, right?

So, this thing ran for two full minutes. We see 22 seconds spent here. We see 5 seconds spent here.

We see 2.5 seconds spent here. But we don’t see anything that fully accounts for the two minutes anywhere, right? We can pretty much guess that it’s going to be happening at this point. And if we scroll down, finally, we’ll have one more query that has an adaptive join.

This was the one that disabled the batch mode sort. So, the sort at the end is in row mode, right? It is in row mode.

This did happen in row mode. This is not a batch mode sort. But still, the adaptive join shows about 940 or so milliseconds of time. And the full query runs for just about the same 38, 39 seconds. So, I think this is a bug.

And because I think this is a bug, I have opened up a User Voice item to say, hey, Microsoft, I think something is wrong here. The link to that will be down in the video description. So, on top of all the other helpful links in there that we talked about before, you also have that link to click on.

And you can upvote this item and hopefully get this fixed. Because, like, you know, I do a lot of query tuning work. And so, to me, when I saw this, I said, this smells fishy. And I dug into it.

But if you were doing this kind of work, you might just, like, give up and, like, run away. Like, I don’t know. Just, like, not be so psyched on things. Anyway, go vote for it. This is a problem.

It should be fixed. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I will see you in tomorrow’s video. All right. I think that’s good here. Yeah?

All right. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

A Little About Default Constraint UDFs in SQL Server

A Little About Default Constraint UDFs in SQL Server


Video Summary

In this video, I delve into the fascinating world of default constraint user-defined functions (UDFs) in SQL Server, exploring how they behave differently from other UDFs when used as default constraints. I walk through creating and testing these functions to understand their performance implications, especially in scenarios where parallelism is involved. By examining both a simple processing function and a more complex badge check function, I highlight the nuances of how SQL Server handles these UDFs under various conditions. Whether you’re looking for insights into default constraints or just curious about the intricacies of T-SQL functions, this video offers plenty to explore and ponder.

Full Transcript

All right, Erik Darling here with Darling Data, your pal, your bosom buddy, your friend till the end. And today’s video, we’re going to talk about, there we go, thank you, PowerPoint, default constraint UDFs. Because I got an office hours question about these at some point in recent memory, and I was not prepared for the level of sickness that exists in this world.

So, I had to look at this more, and I did, and I regret every moment of it. So, if you look down in the video description, you’ll see all sorts of helpful links, where you can hire me for consulting, buy my training, become a paying, supporting member of the channel, like PBS or something. You can, for free, ask me office hours questions, and if you like this content, please do like, subscribe, and tell a friend, because, well, what’s the point of making these if no one watches them, I guess?

Good question, right? So, I’ve mentioned before, I do have a few announcements that I will be making, hopefully in short order, about upcoming speaking events. But the only one that is publicly available at the moment is PASS Data Community Summit, taking place in Seattle, November 17th to 21st, where I have two days of the best T-SQL material with Kendra Little that you will ever see in your life.

And, you know, you should just be there, right? So, that’s fun. Anyway, it is still Rocktober, so we are going to continue to be spooky.

Anyway, we’re going to look at two different kinds of functions here today. We have this first function, which doesn’t do anything. We’re going to call this loosely processing in memory.

I have another one that performs data access, which is a little bit more interesting. But we’ve got this function here called serializer. It takes an input here and then spits that input back out here. Well, very simple stuff.

And what we’re going to look at is, you know, kind of how that works. Now, when I first started looking at this, I was like, man, like this could get real bad, right? Because like, you know, like a lot of the UDFs I have in the Stack Overflow database are ones that like, you know, you put in like a user ID or a post ID and you go get information about that user ID or that post ID, right?

So, I was like, man, if I can pass a column to a default constraint UDF, we’re in big trouble. Thankfully, we can’t do that, right? Thankfully, Microsoft was like, nah, right?

So, we’re not allowed to put a column in there. We can only put, let’s see, expressions, constants, constant expressions, and in some contexts, variables. But for us, we’re going to not do most of that stuff.

We’re going to, whatever, anyway. And that is not a name resolution thing, either. So if I create the table like this without anything, and then I try to add that constraint, right?

If I say, hey, why don’t we make that thing in here? SQL Server is still like, nah. All right.

So, same error there. Good. That’s actually a load off my mind immediately. So, the best we can put in there is a literal value. If we just say the number one, which means this is going to just spit the number one out no matter what, and we create the table, we are now successfully able to do that. We’re all set, right?
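
Reconstructed from the description, the setup looks roughly like this; the object and column names are my guesses, not the demo’s actual ones.

```sql
/* A do-nothing scalar UDF: takes an input, spits it back out */
CREATE OR ALTER FUNCTION dbo.serializer (@input int)
RETURNS int
AS
BEGIN
    RETURN @input;
END;
GO

/* A UDF in a default constraint can only take literals or
   constant expressions as arguments: columns are rejected */
CREATE TABLE dbo.default_udf_test
(
    id int IDENTITY PRIMARY KEY,
    SomeDate datetime NOT NULL,
    BadDefault int NOT NULL
        CONSTRAINT df_bad_default
        DEFAULT (dbo.serializer(1))
);
```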

SQL Server is happy. We’re happy. And so on and so forth. Anyway, we have a default constraint, right? And obviously the column is not nullable.

So, we can’t like put a null in there. But, if we look at this and we actually supply the column with a thing to put in there, what happens is kind of neat. SQL Server actually doesn’t deal with the UDF.

Now, this will only be obvious when we look at the other plan, but usually when there’s a UDF, let’s say in a query, whether it’s a non-inlineable UDF or a UDF in a computed column or check constraint, and now a default constraint, holy cow, we would get a non-parallel plan warning somewhere in here. We don’t have that here, but we will in a minute.

So, when you supply a value for the column, SQL Server does not run the UDF because it doesn’t have to produce a value, right? So, if we look at sys.dm_exec_function_stats, a brand new DMV in SQL Server 2016. Wow.

We will see that there are no executions of our UDF in there. But if we rely on the default constraint, so we’re just inserting into this SomeDate column, we’re not inserting directly into the column that has the default constraint UDF on it.

If we do this, we’re going to have something slightly different happen. Looking at the query plan, we now have this “T-SQL user-defined function is not parallelizable” thing in it.

It’s telling us SQL Server cannot generate a parallel execution plan. Not that we need one to insert a single GETDATE() row, but you see my point, right? We can tell the UDF was invoked there.

And if we look in our wonderful brand new 2016 DMV, we will see that we have one execution of the UDF. Okay? All well and good.
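
Continuing the hypothetical setup from above, the checks look something like this:

```sql
/* Supplying a value for the column: the UDF is never invoked */
INSERT dbo.default_udf_test (SomeDate, BadDefault)
VALUES (GETDATE(), 1);

/* Relying on the default constraint: the UDF has to run */
INSERT dbo.default_udf_test (SomeDate)
VALUES (GETDATE());

/* Check whether, and how often, the UDF actually executed */
SELECT OBJECT_NAME(fs.object_id) AS function_name,
       fs.execution_count
FROM sys.dm_exec_function_stats AS fs;
```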

Now, the same pattern is going to follow if we say insert a bunch of rows, right? So let’s put a hundred rows into the table. And let’s look in here, right?

So since we supplied a value for bad default, we’re not going to have any additional executions of the UDF after doing the insert, right? So even though we put a hundred rows into the table, we had a value for every row. So SQL Server was like, we can bypass invoking the UDF for those rows.

If we put more rows in, but we rely on the default constraint, right? We don’t have that bad default column in here, so we’re relying on SQL Server to do its thing.

And we run this, we put a hundred rows into the table. But now we look at this. We only have two executions, right? So SQL Server really only invoked that UDF once more. Since we’re supplying a literal value to that thing, and the value that it produces is basically always a static value,

SQL Server is like, well, I just need to figure out what this is once, and I can do all the inserts with that, right? So even though we put a hundred rows into the table again, right?

We did that. SQL Server only executed the function once, when we didn’t reference the column in the insert list and supply a value. Now, normally, if there’s a UDF in a computed column or a check constraint, there’s weird stuff that happens around parallelism.

For a computed column UDF, unless the column is persisted and you use trace flag 176, you’ll get a fully serial execution plan no matter what. If it’s in a check constraint, you get a forced serial plan if you reference the column that has the check constraint, right? So for these, it’s kind of funny, because all of these queries, whether I reference the whole table or just the bad default column in any way, SQL Server is allowed to use parallel plans for all of them.

All of these plans have all the visible indicators of parallelism, whether it’s parallel exchanges happening in here like these, or whether it’s the little, you know, icons that say, hey, you went really fast, right? So all of these queries are allowed to go parallel despite the UDF default constraint. Now, if we run this and look just to make sure, we still only have the two executions of the UDF from the inserts that we did.

Good stuff. All right. Let’s look at a different type of function. This one is called badge check.

And what this one does is a little bit different. This was actually kind of tricky to come up with, because like I said, a lot of the UDFs that I have in Stack Overflow accept something like a user ID or a post ID, and then they go out and find some information about that user or that post.

So it was a little hard to figure out something that I might be inserting into a table that I could also go out and look up, like a value that exists in the Stack Overflow database. So what this thing does is: you supply a badge name to the UDF, and SQL Server goes and figures out if that badge exists in the badges table.

And if it exists, then you put the badge name in. If it doesn’t exist, then you say there’s no badge, right? So pretty simple stuff there, but this is the best I could come up with.
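
A reconstructed sketch of that function’s shape; the exact body isn’t shown in the video, so the names and fallback value here are guesses:

```sql
/* A scalar UDF that performs data access: for every call, it
   checks the Badges table for the supplied name */
CREATE OR ALTER FUNCTION dbo.badge_check (@name nvarchar(40))
RETURNS nvarchar(40)
AS
BEGIN
    IF EXISTS
    (
        SELECT 1/0
        FROM dbo.Badges AS b
        WHERE b.Name = @name
    )
        RETURN @name;

    RETURN N'no badge';
END;
```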

I intentionally do not have an index on the name column of the badges table. So if I run this query, it’s going to take around 500 milliseconds.

It’s not going to return anything, which is fine because that’s, that’s what I expected. But I wanted to do this to sort of like exacerbate the issue with the UDF. So let’s clear up the plan cache and let’s redo our bad default constraint table with this.

Right. So we’re still supplying just a static value in here. This isn’t going to do something different for every row; it’s always going to be nope, right.

So for every row that goes into the table, we’re going to go look at the badges table and say, hey, does a badge called nope exist? If not, we insert the fallback value instead. So it’s a little bit different from the last one.

And almost the same thing happens, at least at first. So if we insert into the table here and we look at the query plan. Oh, thank you, tooltip.

I definitely wanted you to show up and not let me click on properties. We won’t get the non-parallel plan warning, right? So, all the same so far. If we look at the function stats DMV, we see no executions. And then we insert, but we make the default constraint do some work.

Ah, you notice that took about 500 milliseconds, right? And you might notice that there’s a compute scalar right here that took about 500 milliseconds, because we had to go look at the badges table.

Now we’re going to see that we could not generate a valid parallel plan, because the function is not parallelizable, right? So, good, good, good. Just about the same thing. And if we look here, we’re going to see that we have one execution.

And now we’re going to insert 100 rows, but we’re going to supply a value for the bad default column, right? So just the same thing as before. You do this, and it happens instantly, right? Zero milliseconds. And we will not have the... go away, tooltip, I hate you.

We don’t have that non-parallel plan warning over here. It doesn’t exist. So all kind of good stuff there, right? And if we go look at our function stats DMV again, we still only have the one execution from when we did the single row insert.

But you might notice that we’re not putting a hundred rows in here, because 500 milliseconds times a hundred is like 50 seconds, right? We don’t want to sit here for that long proving these things. So we’re just going to insert 20 rows here. And this is a little different, right?

Because now it’s 500 milliseconds times 20 rows, which I forget how long that is; maybe 10 seconds, maybe 20 seconds. No, 10 seconds. All right, cool. So now we have our execution plan. And if we right-click here and look at the properties, we have our non-parallel plan warning over here.

But far more interestingly, we have a compute scalar that takes up almost 10 and a half seconds of time, right? There’s no extra time here in any of these other operators. Oh, there’s a millisecond added there, and there’s no time added there. So basically everything except one millisecond of execution time is in this compute scalar.

And of course, this compute scalar. Oh, thank you, tooltip. You keep showing up. You’re really helpful. If we dig in a little bit through the Defined Values, oops, that’s the wrong one. That’s also the wrong one. We want this one. We’ll see our scalar string for badge check with nope, right.

So this compute scalar was responsible for invoking our UDF, right? And if we look in here and say, hey, what happened? Now we have 21 executions. So that UDF executed once per row to go look stuff up.

So depending on what your scalar UDF does, if it just processes data in memory, say, where you take an input and return some output based on it without going out and touching user tables, it’s all fine. But if your scalar UDF goes out and looks at another table, right?

Then you’re going to get this really crappy once-per-row execution, and maybe even very, very slow execution if your function body is slow. And then for the parallelism stuff, well, let’s go look at that. Do, do, do, do.

We get three plans and all of them are allowed to go parallel. It’s all the same visual indicators of parallelism, whether it’s the parallel exchanges like gather streams or repartition streams, or whether it’s the little racing stripes on all of our little operators here. Right.

So, all of these things. At least it doesn’t hurt SELECT query performance. But if you write a really monstrous T-SQL UDF that goes out and touches user tables, well, you might be in for a bad time in other ways. And of course, if we go look at this, we should still have just the 21 executions of the function:

from the single row insert, and the 20 row insert where we did not supply a value for the column that has the default constraint on it. So anyway, I thought this was all very fun and interesting. I hope you also thought it was fun and interesting, or maybe just fun, or interesting.

Maybe one of the two, maybe you just found it incredibly sad and depressing, which is also a good outcome because now you won’t do this. Right. Or maybe you never would have. Maybe you’re just that much smarter than whoever asked me the question. I don’t know. But anyway, that’s good here.

Thank you for watching. I hope you enjoyed yourselves. I hope you learned something and I will see you in tomorrow’s video where I think we’re going to talk about a bug in adaptive join plans with query execution time. So fun things to look at there. Anyway, thanks. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SQL Server Performance Office Hours Episode 38

SQL Server Performance Office Hours Episode 38



Questions:

  • 3. What are your thoughts on using AI (non-Microsoft) for query optimization and index maintenance? I’ve had 50-70% success tuning queries/indexes with 2-3 LLMs by providing schema/index/stat context first. Considering AI automation: identify poor-performing queries, add env context + tuning rules, use MCP tooling to test in read-only dev env; 3 models propose optimizations, best performer wins and goes to DBA for review. Think this is worth pursuing now, since AI will get better over time too? Not worried it will replace my job; performance tuning is only about 5% of my work, and changes generally only make it to prod if prod reports performance issues. Even when I took the worst offending query and reduced runtime by 12X (no exaggeration) using indexed temp tables and a CTE to do aggregates, then joined on the temp tables instead of repeated joins on the same tables over and over, the change was, *I think*, “too good to be true” and “might have a logic flaw that we can’t properly test for in dev,” and thus hasn’t made it to prod… sorry for the rant.
  • Let’s say there is a table with a key column of numbers, but it has trailing zeros (maybe one or two zeros). I had that column as varchar, but business would never put zeros when searching for that column. All our queries had a LIKE clause with ‘%%’, and that killed the performance. How would you approach the problem?
  • I’ve never seen anyone use a plan guide, not even in a blog post or quick video. Any idea why? When do you use them?
  • We have a UDF as a default expression for a column on a table. Does SQL Server still evaluate the UDF in default expression even when I explicitly insert values into that column?
  • Hey Erik. It took me many months, but I’ve finished watching all of your videos and turned your lessons into a drinking game. What do you think? I’m playing it right now https://dbfiddle.uk/CQbdBNH5 

To ask your questions, head over here.

Video Summary

In this video, I dive into a lively session of Office Hours where we tackle five community-submitted questions on various SQL Server topics. From exploring the use of AI in non-Microsoft environments for query optimization and index maintenance to discussing the pitfalls of storing numbers as strings with trailing zeros, we cover a wide range of scenarios. Additionally, I delve into the lesser-known topic of plan guides and address some humorous yet insightful queries about UDFs used as default expressions and even the personal experience of someone who has binge-watched all my videos—turning them into a drinking game! It’s been an engaging session, and I hope you found it both entertaining and informative.

Full Transcript

Erik Darling here with Darling Data. And today’s video is, of course, everyone’s favorite. My favorite, your favorite. Of course, that is everyone, isn’t it? Office Hours, where I answer five community-submitted questions to my fancy little Google spreadsheet. And hopefully everyone is happy with the answers. If you look down in the video description, there are all sorts of helpful, useful links for you and me. You can hire me for consulting, buy my training, become a paid member of the channel. Of course, you can ask me Office Hours questions for free. They are not insert-coin-here. But we’ll see how next year’s economy goes. And of course, if you like this material, I would be ever so pleased, ever so chuffed, if you would like, subscribe, and tell a friend or two or ten. I think I’ve suggested in the past just taking people’s laptops and hitting subscribe in their YouTube accounts. PASS Data Community Summit is coming up in Seattle, November 17th to 21st, Kendra Little and I doing two bang-up days of T-SQL pre-cons, arguably the best two days of T-SQL pre-cons that have ever taken place in the world. I think I’ve mentioned before, I do have a couple other announcements coming up soon, hopefully sooner rather than later.

So that the fine folks of the world can start purchasing some tickets. That’d be nice. Anyway, let’s go answer these gosh darn questions, shall we? Oh, we have a third. So if you watched last week’s episode of Office Hours, you might remember our friend with the high volume fintech OLTP something-or-other on SQL Server 2017 Enterprise Edition. A third question has arisen from our friend. Hopefully we can give our fintech friend enough free advice here. What are your thoughts on using AI, non-Microsoft? Well, geez, I’m convinced. For query optimization and index maintenance.

We’re having AI rebuild our indexes? I don’t know. What is that? What are you on about? I’ve had 50% to 70% success tuning queries. Okay, well, I have 100% success. How’s that? With two to three LLMs by providing schema, index, and stat…

I’m just going to say something here. You work in fintech and you’re sending schema, index, and stat context to different LLMs. That’s interesting. Okay.

I hope they’re local. Considering AI automation. Identify poor-performing queries. Add environment context and tuning rules.

Use MCP. Holy Moses. Okay. Performance tuning is only 5% of your work. Boy, oh boy.

Look, if it’s working for you, go ahead. I don’t know. Performance tuning is 5% of your work. If you need an LLM to tell you to use indexed temp tables and a CTE, you’re probably better off just doing that.

All right. I need to move on before my head explodes. Okay. Let’s say there is a table. Yes, let’s say there’s a table. I would love to say there’s a table.

With a key column that’s a number, but it has trailing zeros. I don’t know why a number with trailing zeros is strange. Many numbers have trailing zeros.

Numbers with leading zeros are a different matter. I had that column as varchar. Oh, great way to store numbers.

Varchar. Good job. But business would never put zero when searching for that column. All our queries had LIKE clauses with a double wildcard, and that killed performance. How would you approach that problem?

Well, I wouldn’t. That is just a fundamentally unsound design. You have allowed business users to trample over the database entirely, and they’ve won now. You can’t take that away from them.

I’m saying you because you asked the question. But whoever designed this screwed it up from the beginning. Even if you put a columnstore index on that, the double wildcard, assuming this is going to be a varchar of at least 10 or 11, right?

Because most people, when they store numbers as strings, at least account for the maximum width of whatever data type the number is supposed to start with. So if it’s an integer, then it would be, what, 10 or 11 characters, right? If it’s a bigint, the 9 quintillion number, it’s more like 19 or 20 digits.

So, you know, it would be like a varchar(10), (11), or (20) or whatever. So, varchar(10), sorry, because it’s the same number of digits as a phone number: a 214, three other digits, four other digits.

So, you’ve just ruined the whole thing from the get-go. You might be able to do some very complicated stuff with trigram searches, but I don’t know if you want to implement all that just to help these people out. But the smarter thing to do would have been to give them more restrained search protocols so that you did not end up here in the beginning or from the beginning.

There you go. Anyway. I’ve never seen anyone use a plan guide, not even in a blog post or quick video.

Any idea why? When do you use them? I’ve seen many people use plan guides in blog posts and videos. So I have no idea why you’re not able to find them.

Perhaps you do not search well, whoever you are. When do you use them? Well, you use them when you need to guide a query to a different query plan that it is not finding naturally. The name is somewhat self-explanatory in that regard.

I’m not going to pretend that they are very easy or ergonomic to use. There are many times when even yours truly has struggled to get the query to pick up on the plan guide. But with enough trial and error, you can usually get it working.
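
For anyone who hasn’t seen one, here’s a minimal sketch of what a plan guide looks like; the guide name, statement, and hint are placeholders, and the statement text has to match what the application sends exactly:

```sql
/* Attach an OPTION clause to a specific ad hoc statement
   without touching the application code */
EXEC sys.sp_create_plan_guide
    @name            = N'hypothetical_maxdop_guide',
    @stmt            = N'SELECT COUNT_BIG(*) FROM dbo.Posts;',
    @type            = N'SQL',
    @module_or_batch = NULL,
    @params          = NULL,
    @hints           = N'OPTION (MAXDOP 1)';
```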

All right. Let’s see here. We have a UDF as a default expression for a column on a table. Does SQL Server still evaluate the UDF in the default expression even when I explicitly insert values into that column?

Oh, I see. It’s a default. Okay.

It shouldn’t, no, but there’s no telling what strange things might happen when you start doing awful things like using UDFs as default expressions. I’d actually never even considered that level of terrible.

So good job you. You might make it into a future demo with that level of bad. Holy Moses. All right.

My Lord. Hey, Erik. Hey, Erik. It took me many months, but I finished watching all of your videos. You should go outside.

You should go enjoy life. There is a world out there. I don’t know why you’d want to watch all of them. Oh, well, that’s why. You’ve turned my lessons into a drinking game. Well, I changed my mind.

I wholeheartedly approve of you. What do you think? I’m playing it right now. Well, what do I think? I fully endorse drinking games. I think that they are one of the last vestiges of proper modern society and that you should continue to watch my videos and drink.

I’m not going to that link. I don’t want to know, because I have a feeling if I go there, I’m going to find out how much you drank, and I’m going to know how much drinking it took you to get through my videos, and that might make me a little sad.

So I’m not going to go there, but I like that you have the kind of go-getter attitude that you create drinking games for me. All right.

Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I will see you in next week’s office hours, but also in tomorrow’s video, whatever that may be.

I’m sure it’ll be a humdinger. Actually, we’re going to go through my row goals presentation that I recently finished giving at the PASS on tour events. So we’ll have that to look forward to, won’t we? Row goals.

Yay. Fun. It would be awesome if I talked about row goals and no one compared me to other people who have talked about row goals.

That would be fantastic. It’d be nice if I could talk about something without someone just pointing out that, hey, someone else has talked about that. Wow. Cool. All right.

Row goals away, all this week. Thanks for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Row Goals: Part 4

Row Goals: Part 4


Video Summary

In this video, I delve into the world of query tuning with row goals, which can sometimes cause performance issues. We explore how to identify when, and if, using a TOP hint or row goal is appropriate to optimize queries. By analyzing a specific example where SQL Server scans 11 million rows unnecessarily, I demonstrate how adding a FORCESEEK hint, or shaping the plan with a TOP, can significantly improve query performance. Additionally, I discuss the nuances of using row goals in conjunction with CROSS APPLY and OPTIMIZE FOR hints, showing how they can help SQL Server make better decisions about execution plans without resorting to overly aggressive hints that might not always be allowed by strict DBAs.

Full Transcript

All right, welcome back to the fourth and final installment of the row goals talk that I’ve been recording and hopefully presenting to the adoring masses. We’ve looked at a number of different things about row goals: what they are, some good stuff, some bad stuff. We’re going to look at some cool query tuning stuff that you can do with them now, and ways that you can figure out if row goals are appropriate for you.

All right, so we’ve got this query right here, and this query is kind of funny because it only returns one row. Again, going back to the rows per second ratio thing, and it takes about 900 milliseconds to run. It’s not disastrous, but like, I don’t know, maybe we’re just unhappy with it for whatever reason.

The reason why I would be unhappy with this query is because I’ve got a perfectly good index on the post table that we could seek into and life could be grand with. Okay, but no, we start off by scanning the post table, and it takes about 900 milliseconds.

If we look at the work that SQL Server did on that scan, it was 11 million rows. All right. Why would SQL Server scan 11 million rows when it could seek to them?

Well, if we add in the FORCESEEK hint here, we will get a different, arguably better and faster, execution plan back, right? So we say, oh, well, there’s 15 milliseconds now.

We started seeking into the users table. We found some rows. We did a seek into the post table. We did a seek into the comments table. Note here, there’s a top above a seek, and this is not a disaster the way a top above a scan was. This one is okay for us.
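
The hinted query’s shape, sketched against the Stack Overflow tables with stand-in predicates, since the exact text isn’t in the transcript:

```sql
/* FORCESEEK pushes SQL Server off the 11-million-row scan
   and into the perfectly good index on Posts */
SELECT c.Id
FROM dbo.Users AS u
JOIN dbo.Posts AS p WITH (FORCESEEK)
    ON p.OwnerUserId = u.Id
JOIN dbo.Comments AS c
    ON c.PostId = p.Id
WHERE u.Reputation > 100000;
```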

So maybe for whatever reason, we were like, ah, we’re better than FORCESEEK. We can get away without that. Maybe we have terrible DBAs who would yell at us for using query hints.

Because some DBAs are like that. Some DBAs who couldn’t actually tune the query themselves will be like, you never use a hint.

So a way that we could get around our DBAs’ prying, hint-hating eyes would be to use a TOP, right? So just like we’ve done in other queries, we’re going to use a TOP with a big number here to tell SQL Server that we want it to do a particular part of this query together.

And then we want to make sure that SQL Server puts a little hug around this part and does this stuff in its own sort of world. So if we run this with the TOP in there, we get back another query plan. Just like using the FORCESEEK hint in the last one, this one also takes about 15 milliseconds.

Now, we don’t have an outer TOP out here, right? We just have a TOP down here. There’s no root-level row goal, but there is one down here in the TOP.

And the TOP puts this part of the plan together. What this does that’s kind of cool is, again, it starts off with the users table, right? So we seek into the users table, we find the stuff we care about there, and then we go into our nested loops join.

And then for each row that comes through here, we go into this part of the plan and we seek into the post table without having that FORCESEEK hint. So it’s the combination of the CROSS APPLY and the TOP, with the correlation inside the CROSS APPLY here. If we take out the TOP and we rerun this, we’re basically back to the original execution plan, right?

This one’s back to like 900 milliseconds. SQL Server makes the same kind of boneheaded choice to scan the post table out here first, and then do a hash join to the users table, which is not what we want, right? Having the TOP in here is what gets SQL Server on board with doing what we want within the CROSS APPLY.
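
Reconstructed with stand-in predicates, the hint-free pattern looks roughly like this:

```sql
/* A big TOP inside CROSS APPLY cordons off the correlated part
   of the plan, so SQL Server seeks into Posts per qualifying user */
SELECT u.Id, ca.Id AS PostId
FROM dbo.Users AS u
CROSS APPLY
(
    SELECT TOP (9223372036854775807)
           p.Id
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = u.Id
) AS ca
WHERE u.Reputation > 100000;
```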

One thing that’s actually kind of worth noting here, because someone asked me about this the first time I gave this talk, which was in New York, I think. God, it all feels like 10 years ago now. I’ve slept like five minutes in the last three months.

But it was about the star in here, right? So we have a SELECT p.* in this, but obviously the only thing that we’re getting in the outer part of the query is the ID column, right? SQL Server is smart enough, during some of the early optimization stages, like parsing, binding, algebrizing, simplifying, all that stuff that we usually skip over.

Well, I mean, simplification is cool, but the rest of it, SQL Server figures out pretty early on that we only need the ID column in the outermost projection of this. And so it throws away all of the star columns that we don’t need in here. So kind of a nice thing.

Anyway, I’ve spent a lot of time in my videos making fun of the OPTIMIZE FOR UNKNOWN hint, because people use OPTIMIZE FOR UNKNOWN, or some equivalent of it, to fix parameter sniffing, right? They’ll either use OPTIMIZE FOR UNKNOWN, or they’ll set some local variable equal to a formal parameter passed into the stored procedure.

And they will put a note in there saying, hey, I fixed this whole thing. OPTIMIZE FOR is cool for a lot of reasons, but not that.

So OPTIMIZE FOR is neat, because you don’t have to optimize for unknown. You can optimize for specific values. And one place that you can optimize for specific values is with TOP. Now, this of course deserves the biggest shout-out in the world to Adam Machanic.

Because if you look through sp_WhoIsActive or some of his other code, you’ll see this pattern all over the place. And it’s a very cool one. So thank you, Mr. Machanic, for introducing the world to this one.

So all the credit goes there. Let’s say we have this query. We run it, and we have sort of a similar situation to the last one.

SQL Server, once again, chooses violence. It starts off by scanning the post table. And again, this is reading 11 million rows and it takes 900 milliseconds, right?

And we are at, well, just about 900 milliseconds: 863 milliseconds, which we’re rounding up just to keep things in order, right? 900 milliseconds. This sucks.

We hate it. We’re mad at it. We don’t want this thing to keep doing what it’s doing. Now, we could, of course, like in the previous example, throw a FORCESEEK hint on the post table. But again, we have mean, crabby DBAs who do not allow us to use FORCESEEK hints. They only allow us to use code from crazy people.

So let’s say we want, in a different way, for SQL Server to start with the users table and the filtering that we’re doing on the users table, and then move on from there. All right. Let’s say that that’s just our goal for this.

All right. It’s our goal. And we’re going to use a row goal to do it. So what we’re going to do is use some TOP magic again, right? We’re going to declare the top variable and set it to the nine quintillion number, right?

The big, big number, the hugest number in SQL Server. I guess some decimals and stuff could be bigger, but who has time for that? One could die thinking about these things.

Some probably have, if you think carefully about it. But here we’re going to use TOP. We’re going to use this TOP local variable in this part of the query. So we’re going to cordon off this SELECT TOP (@top) FROM users WHERE creation date, blah, blah.

And reputation is greater than blah, blah, right? And we want SQL Server to give us the top nine quintillion rows for this. But maybe we don’t want SQL Server to come up with a query plan for nine quintillion rows.

Maybe if we gave SQL Server that big number, SQL Server would do something terrible. Maybe it wouldn’t use our index. Maybe it would freak out and just be like, I can’t wrap my head around it.

I’m going to give you the worst query plan I can. Ha ha ha. This is the most expensive one, sucker. Right? You don’t know. It just freaks you out when you think about it; keeps you up at night.

So let’s say we wanted SQL Server to, oh, sorry about that, do this, right?

We want to get the top big number, but we want SQL Server to act like this is a small number. And maybe we don’t want to add a FAST n hint to the outer query, because again, the crabby DBAs, the disciplinarian crabby DBAs, will come and pinch us at night. So what we can do is something that’s very cool.

We can tell SQL Server to optimize for that top being equal to one. So even though we have the TOP with the big number, we’re going to get all the rows in the world back.

We’re going to ask SQL Server to optimize for a query plan that only asks for one row. So it’s sort of like saying FAST 1, just with the TOP and the row goal. And when we run this, we get back a much snappier execution plan.

Right. And we indeed have a top operator here, and we have a seek into our index on the users table, and then a key lookup to get some other stuff. And then we get the rest of the query plan done, going into a nested loops join, seeking into the post table, and then doing the NOT EXISTS portion against the post table.
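
The pattern, sketched with stand-in predicates (this is the Adam Machanic trick the video credits):

```sql
/* Ask for every row, but have the plan costed as if one row
   were needed: a hint-free FAST 1, more or less */
DECLARE @top bigint = 9223372036854775807;

SELECT u.Id
FROM
(
    SELECT TOP (@top)
           u.Id
    FROM dbo.Users AS u
    WHERE u.CreationDate >= '20130101'
    AND   u.Reputation > 10000
) AS u
WHERE NOT EXISTS
(
    SELECT 1/0
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = u.Id
)
OPTION (OPTIMIZE FOR (@top = 1));
```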

So, row goals. Fun, fun, fun. Row goals are sort of like promises from the optimizer about how many rows will be produced. You can set them explicitly using TOP, OFFSET/FETCH, and FAST.

And those are the most common. There’s also the SET ROWCOUNT thing, but we don’t need to talk about SET ROWCOUNT. If you use that, I know no mercy and no pity.

I know, no mercy and no pity. Uh, and the optimizer may set them implicitly. Uh, if you use, uh, things like exists, not exists in, not in, or equals any equals all or equals any or equals some equals all is much different. Uh, and they can act at times like optimization fences to help you reshape query plans.

Why they matter is because they change how the optimizer costs and builds plans. They can make some pretty impossible queries very, very fast by forcing better join orders, or just better things happening together, especially if you can get them to reduce rows very early on.

That’s a very powerful thing that you can do to start getting better query performance without having to throw hints everywhere, because hints often break down over time. EXISTS will generally already have a row goal of one.

You don’t need to say SELECT TOP (1) with DISTINCT, ORDER BY, GROUP BY, and all the other stuff inside it, because SQL Server just throws that right in the garbage. TOP 100 PERCENT?

Like we talked about, if you put that in a view, the optimizer completely throws it in the garbage, slam dunks on your head, and gives you a wedgie. It’s a mean time. If you are going to use TOP and you are expecting data back in a specific order, make sure you use deterministic ordering; that is, among the ORDER BY elements you have, there is at least one unique tiebreaker.

Otherwise you have non-deterministic ordering, and you may have silent bugs and stuff in your query plan. There are some very cool things that you can do, especially with CROSS APPLY and TOP. You can get nice, easy nested loops joins.

You can get SQL Server to seek into stuff when you thought you would never convince the optimizer to do the right thing. If you were hitting problems with row goals in stuff like IF EXISTS, or you see that top above a scan: for the IF EXISTS, we used COUNT_BIG to fix things; for the top above a scan, the fix is usually creating an index so you have a top above a seek.

That will make life a lot easier. And if you want to ask SQL Server for a large number of rows, but have it come up with an execution plan as if it’s going to hit a small number of rows, then you can use a parameterized TOP and tell SQL Server to optimize for the top equaling some small number, right? It doesn’t have to be one; it just has to be a smaller number than nine quintillion.

So, that wraps things up. Thank you for watching. Again, use row goals wisely, and as always, I’m Erik Darling and my rates are reasonable. All right.

Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.