"SARGable" Sorting and Grouping In SQL Server

“SARGable” Sorting and Grouping In SQL Server

Thanks for watching!

Video Summary

In this video, I delve into the concept of sargability in SQL Server, expanding on its traditional usage in join and where clauses to include sorting and grouping. I explain how wrapping columns or functions around expressions can disrupt index usage, leading to suboptimal query performance even when the differences might seem negligible at first glance. By sharing practical examples and execution plans, I illustrate the importance of maintaining sargability not just for search operations but also for overall data integrity within indexes. Throughout the video, I emphasize the broader implications of non-sargable predicates on code quality and database performance, encouraging viewers to adopt cleaner coding practices that can prevent future performance issues.

Full Transcript

Erik Darling here with Darling Data and continuing with my new channel setup where I’m going to do my begging and pleading at the beginning of videos where people are still paying attention. Of course, you fully have the option to skip over this part. That’s up to you. It’s a cruel world. What can you do? This video we’re going to talk…we’re going to use the term Sargable a little more loosely. …than some people might be comfortable with. Sargability typically deals with you writing expressions in a way that allow SQL Server to effectively use indexes for them. Whether it’s a join or a where clause. But we’re going to talk about the Sargability even though there’s not technically a search argument involved in the context of sorting and grouping because it can do stuff there. So, this is a This is my YouTube channel. All of this content is free. I don’t have any problems except Intel timing out looking for drivers constantly. God, I hate that thing. And I’d like to keep it free. So, in order to help me do that, you can subscribe to the channel. There are very low-cost memberships available where you can donate a buck or two if you’re feeling generous.

But if you can’t be bothered with a buck or two, then please. I like likes. I like comments. I like subscribers. I like growing this channel so that someday when I do have to spring commercials on everyone, it can be to the largest possible audience. I’m also a SQL Server performance tuning consultant. I can do all of these things in order to make your life with SQL Server better and easier. And you can hate SQL Server less because I’ll take that on for you. I will be your SQL Server sin eater. If you need SQL Server training at a low cost, you can get 75% off mine with the coupon code SPRINGCLEANING. Just go to training.erikdarling.com, plug that code in, and off you go. Stream away, my friends. Stream away.

As far as where you can catch me live and in the flesh, Friday, September 6th, I have a full-day pre-con at Data Saturday Dallas. And November 4th and 5th, I have full-day pre-cons both days at Pass Data Summit in Seattle. And I also have regular sessions attached to those events. So, if you’re going there, stop by, give me a hug, come to my pre-cons.

It’s a nice thing to do. All right. Cool. Everything went black. Most people are used to dealing with the term sargability in the context of join and where clauses. So, if you do something like this, or if you do something like this, most right-minded people will call you a damn fool.

When you wrap columns and functions like this, you ruin SQL Server’s ability to interpret indexes in an orderly manner and find and join your data in a quick and orderly manner. It is generally not great for performance when you do this.

It’s not going to hurt you every single time. But it is something that you’re going to want to avoid because someday it will hurt. Or someday someone will copy what you did in your code and it will hurt.

And they’ll think you’re an idiot for doing these things. So, generally, this is sort of like a sanitary code thing for me. You know, avoiding this sort of stuff generally tends to help your code bases in general be cleaner.

I’ve said it on the channel before in other videos. But when you write code, code is culture. So, if your company culture has good, clean code in it that doesn’t have a bunch of things like non-sargable predicates, then you will tend to have a better, more quickly functioning code base than if you allow this sort of crap to go through.

Because this is ideally not what you want to see in your queries. This can also apply to other parts of the query, though. It’s not just joins and where clauses.

So, just to give a quick example, none of these queries are going to be bafflingly slow or awful or ugly. But you will see differences. I created an index on the comments table just on the score column.

Because it’s a good column to create an index on. Especially for the demos I’m about to run. Because, um…

Would you believe that I wrote this specifically to show you these problems? I don’t know if you would believe that. People think I’m a lot more slapdash than I am.

Maybe I am. So, let’s run these two queries. And we’re just going to get the top one score from the comments table. And we actually sort of got lucky.

Because, you know, the score column is not unique in any regard. But we did not have a tie for the highest scoring comment. How a comment got a score of 1270 is a real remarkable thing to me.

Because this is the comment that has a high score of 1270. Did you really stumble upon that exact situation in a real life scenario? Or was this question only meant to be a puzzler just for the fun of it?

Space? Question mark. Apparently that was profound to almost 1300 people. So, I wonder…

I don’t know how many… I don’t know what that comment’s current upvote situation is. I might go to the Stack Overflow Data Explorer after I record this and see. Because I’m a little fascinated now.

So, both of these queries return the same row. Which is, in our case, lucky. Because when you have non-unique columns that you’re ordering by, you often need a tiebreaker column to give you a reliable result that is a unique column.

You would need to order by, like, the score column and then, like, ID descending or something. Because ID is an identity column and you generally won’t see too many dupes in identity columns. But coming to the execution plans, look what happens here.

And the one where we have a score unencumbered by is null, this whole thing runs in zero seconds with a serial execution plan. Right?

Even though we have to scan this index, that’s okay. Because all we have to do is grab one row out of the index. We do a lookup to get whatever else we want. But this lookup is for one row. And would you believe a one row lookup is not the end of the world?

It’s not a tremendously big penalty on a one row lookup unless you’re using columnstore. So, this is pretty quick. In the second query, where we wrapped the score column in an is null, right?

So, if there were any nulls in this, we replaced it with zero. We have a parallel execution plan. This thing runs at DOP 8.

It takes almost a second to run. And SQL Server finds it necessary to sort the data. Notice we don’t have any sorts in this query plan up here, where we didn’t wrap score in is null, because the index presents score to us in an orderly manner, right?

Indexes put data in order. This is one of the things they do that makes queries go faster. We have ordered data.

It’s good for searching. It’s good for already having data prepared in the order that we need to present it in. So, all good there. But as soon as we wrap score in is null, SQL Server says, oh, no, no.

This is no longer in order. We’ve got to sort this, right? And you can see that this sort is on an expression. Expression 1, 0, 0, 1 descending. That expression maps directly to is null score 0 over there.

So, this goes parallel, requires almost a full second to run, takes about a, oh, I guess that’s one meg memory grant. Not the end of the world, but, you know, just needing one at all for this, it’s pretty ridiculous because we have an index that has this in order.

But we ruin that index’s ability, or rather we ruin SQL Server’s ability to use that index effectively by doing this. This can also happen with grouping, right?

And, you know, this is probably less of a performance suck than I think the ordering problem is. But I want you to pay attention to a couple of things in these query plans because they’re interesting, right?

There’s not a vast difference in timing between the two, right? And these both generate parallel execution plans. This one takes about 500 milliseconds. This one takes about 700 milliseconds.

So, there’s only about a 200 millisecond difference. But in the plan where we wrap the score column in is null, and we group by that, SQL Server has to add extra operators in, and we have to do extra work in order to do that.

For this small query right here that we’re doing, it’s not that big a deal. For bigger queries that process way more rows, or even bigger queries where, you know, you might join stuff off together, and there might be all sorts of other crazy things going on in the query, this stuff has sort of like a snowball effect on things.

And, you know, I think it’s important to understand, again, coming back to like the code is culture thing, don’t make, if you write cleaner code, you’ll have fewer problems in general.

Like paying attention to the basics and the fundamentals of things is really important because then you have, like your code is more, your code runs more reliably fast. You don’t have as many weird bugs and other things to worry about. And you don’t have to worry about these pile-on effects where, I don’t know, let’s say that you stuck this group by query in a CTE, and then you were joining it off to something else.

SQL Server might make all sorts of bad choices because it had to make extra steps, do things less efficiently, maybe make worse guesses as far as cardinality goes. All sorts of pile-on effects can happen as soon as you start allowing these little unwanted in your code base.

Now, for this one, notice something kind of interesting here. SQL Server starts with a stream aggregate, right?

And this basically reads the data in order from here. So rather than having to, you know, sort data or do anything else, we have the data that we care about in order here.

We do need to sort after this one because the repartition streams, I don’t believe was helping us maintain data in order. That’s okay. But really the important thing here is that we just don’t have any unnecessary additional steps in this plan in order to get the results that we need.

In this one, we have an extra compute scalar to generate the isNull on the score column for grouping. And then we have this partial aggregate where we partially aggregate data.

And we are off a little bit in the cardinality estimation department. Again, this is, again, pile-on effect, right? In SQL Server, this could make a more profound difference where the cardinality estimation was off by more and we had to do something else with this result, join it off somewhere else, you know, whatever it is.

And then, you know, coming over here, we do have to, you know, again, repartition, sort, stream aggregate. So we do just a little bit more work in this query. The amount of work that these two queries do, and even that the two sorts do, pretty inconsequential in the, and like, you know, if you just looked at that case, like, you know, just on its own, you know, for the sorts, you would probably be like, oh, yeah, a serial plan that finishes in zero seconds is better than a parallel plan that takes a full second.

For the grouping queries, you know, again, nothing, nothing end of the world here. I just want you to understand that when you start adding in these extra unnecessary things, you start adding in extra unnecessary work for SQL Server to do in order to produce the result you want.

So the concept of sargability for me, you know, well, it is, you know, if you, if you want to polish some ivory on it, it is purely for situations like this, where, you know, you’re wrapping columns and functions and those, those, those functions prevent SQL Server from being able to use your indexes effectively to find data, right?

That’s search argumentable. But sargability for me kind of goes a little bit further because I feel like the concept of sargability is important to grasp because it’s not just for searching things.

Sargability is literally just about searching things, but I think that the concept of not disturbing the contents of your indexes with function calls wrapped around columns goes, needs like a bigger word for it.

There has to be like a concept beyond just like, sargability is this one part of it, and then there’s this other part of it where like, you’re just disturbing index order, and that disturbing that index order messes up SQL Server in all sorts of weird ways.

So it’s not just when like the sargability problem, when you’re searching on stuff. There’s also this problem where, you know, as soon as you like wrap a column in a function, SQL Server has to process things in a different place, right?

Because things like reading data and things like, you know, preserving index ordering, that’s like storage engine stuff. That’s stuff that happens like way, way down over here in a query plan, right? That happens like way, way down deep in the query plan.

Running functions, even built-in functions, for SQL Server, which have absolutely no relational meaning to the engine. Those happen like a few layers up in this like expression service of the query execution engine.

So they don’t happen down here. They don’t happen down here. They happen way up here. So like a bunch of stuff has to happen down here before you get to the point where you’re calling that function.

And that’s where you can really start screwing things up, is because you take away SQL Server’s ability to do things way deep down in the execution plan or in the storage engine.

And you all of a sudden force it to start doing stuff way up here where it’s far less efficient, right? And again, for this query, there’s only a couple hundred milliseconds difference. But you wouldn’t, I don’t think most people would expect to see a couple hundred milliseconds difference when the only thing that we’re doing really differently is either grouping by score on its own or grouping by is null score here, right?

So this is where things get, this is where we put the extra work in for this query. Now, again, 200 millisecond difference, not gonna end your world, but as you start dealing with bigger and bigger data or you start using data that you do this sort of thing to in the context of bigger queries, you’re gonna start noticing more and more profound differences and you’re gonna start seeing more and more profound problems with query performance as time goes on and your data grows and your queries get slower and you start wondering why.

And then you say, gosh, I ought to hire that Erik Darling to fix these problems. And then my job is to come along and delete is null. So, you know, there is that.

Anyway, thank you for watching. I hope you enjoyed yourselves. Hope you learned something. I hope that you didn’t skip over the part of the video where I asked you to like and subscribe and comment and hire me and buy training from me because that would be mean.

And I’m not mean to you. I don’t know why you’d be mean to me. So I’m gonna go upload this and then, I don’t know, maybe I’m gonna record something else.

We’ll see how it goes. I do need to get to the gym today. I do need to keep up this godlike physique that you see before you. So we’re gonna go to the gym.

Maybe we’ll talk more about that in a different video. I don’t know. Maybe that’ll be a TikTok exclusive. We’ll see what happens.

All right. Cool. Thank you for watching.

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.