A Little About Optimizer Nudging In SQL Server
Video Summary
In this video, I delve into optimizer nudging—a technique where we rewrite queries to guide SQL Server’s query optimizer towards the execution plan that best suits our needs. We explore various methods, such as using `TOP` in subqueries and leveraging primary keys for faster performance, while also touching on how index alignment can significantly impact query efficiency. By separating relational elements from informational ones, you can craft more optimized queries. Additionally, I provide links to valuable resources in the video description—ranging from consulting services to training opportunities—that can help you master this technique further. Whether you’re looking for professional assistance or want to support my content, there are plenty of ways to get involved and improve your SQL skills.
Full Transcript
Hello again. Welcome. I’m glad you’re here because we’ve got a couple interesting things to talk about. We are going to discuss optimizer nudging. And of course, by optimizer nudging, I mean the process of rewriting queries in order to get the optimizer to do what we want. So this is just a brief video about that. We cannot go into depth about that because then you’d have to pay me and then you wouldn’t be watching this for free. So we’re just going to, we’re going to just do a little bit about that. Uh, if you look down in the video description, right somewhere in here, uh, you’ll see all sorts of very helpful links. If you need help nudging the optimizer, well, you can hire me for consulting. Uh, if you want to learn how to nudge the optimizer further, you can buy my training. If you just want to support, uh, my, my continuing to provide free, free help with, uh, nudging the optimizer, you can become a channel member and contribute money to the channel. Uh, you can also ask me office hours questions. And of course, if you like this content, uh, the, one of the more, one of the most helpful things you can do is of course, like subscribe and tell a friend because if you keep it all to yourself, it’s very selfish. Uh, pass on tour. This will be one of the last opportunities for you to purchase a ticket for pass on tour Utrecht, October 1st and 2nd. So, uh, I forget.
I forget which day I forget which day this week. I have to stop saying that, but I’ll figure it out before, before I, before I mess anything up. And of course, past data community summit, Seattle, November 17th to 21st, two days of T SQL pre-cons with Kendra little. That’s me, me and Kendra little Kendra little. And I will be there teaching you amazing things about T SQL. And then I’ll be home for a bit until I don’t know, someone else decides to pluck me from my domicile and throw me out into the world and expose me to all of you. people again. So anyway, let us begin by nudging the optimizer and fun, various and fun and sundry ways or something like that. So, uh, I’ve got an index created here on the post table on creation date, uh, and then score. And, uh, you know, we’ll, we’ll, we’ll, we’ll talk more about this later, but for now, just stick with me. So, uh, we’ve also got this query and we’re saying, give me the top thousand rows, uh, where creation date is between these two dates or betwixt, not exactly between, because between would be less than and equal to two.
Right. Between is inclusive in that way. Whereas this is exclusive of the stuff. Uh, and if we run this query, uh, we’ll get what I would call a sort of ugly query plan. Right. Uh, we, we start by, uh, seeking into our index and we get kind of a lot of rows out of there. And then we, uh, we do a key lookup. And of course this, this is a row sort of a row by row operation because it is a nested loops join and the whole, and then we end up sorting the data over here.
So we still have, we, we, we created an index, but we still, we still have to sort data, which is, which is a bit upsetting. And then at the very end, we, we top out with our thousand rows and we, uh, we, this all takes about 750 milliseconds. Now the situation can of course get worse if we hit upon one of SQL servers, many tipping points.
For example, if we say select the top 1000 and we increment this date by one to 2013.0318. Remember this was 2013.0317 up here. Uh, SQL Server will neglect to use our nonclustered index entirely. And this will all take about 900 milliseconds after scanning the post table and all that other stuff.
So let’s say that we’re unhappy with this and we want to nudge the optimizer in again, various and sundry ways. Uh, we might decide to rewrite our query like this, uh, where rather than allowing SQL Server to choose what we’re, what the, between doing a key lookup and a clustered index scan. Let’s say we do this, right? And we say, select the top thousand rows, uh, from posts where P dot ID is in yada, yada.
And you know, we, we do all that same stuff. Well, we can get a much faster performance doing this because we’re kind of taking advantage of some stuff that we know about SQL Server. One, it digs primary keys. Primary keys make things very easy. So if we just, uh, say select star from the post table where the primary key, which is the clustered primary key, which is the ID column is in these top 1000 rows.
Well, SQL Server doesn’t really have much to worry about here. Cause this is the, this is a very narrow select list for the top 1000, right? And, uh, even if we were to bump this up to that 18 number, there was a tipping point before, well, SQL Server is still, uh, keen on doing what we want.
Right? So this is, this is all a much better arrangement than just writing the query like this. Now, of course, if when you are first writing queries, uh, you, you should, uh, arrange those queries logically in a way that makes sense, right? A simple and logical way that attempts to get data the way that you want it.
And then if you are unhappy with the performance of those queries at the outset, then you might experiment with different syntax. And this is kind of what this is all about. So, uh, adding in the, the, the select top 1000 to a sub query, we’re still almost doing a key lookup.
We’re just doing it a little differently. Right? Because we still have to look those columns up. We’re just writing the lookup in a manual way, right?
Cause we’re saying select star from posts up here. And then we’re saying, but you know, just where the ID is in this. And that’s what SQL Server does. It gets the thousand rows that we care about in here.
We still have to sort data, which we’ll talk about in a minute. Uh, and then we can see that this is where the top isolates those thousand rows we care about. And then this is where we go and do the lookup, right?
Cause we still, we still have, we have the nested loops joining. We do a lookup based on the thousand rows that we got based on the stuff we cared about up there. And we seeked into the index down there to get the columns we care about just for those thousand rows. The very, very handy thing.
And this comes, this comes down to something that I’ve talked about a bunch of times in other videos, which is sort of separating relational elements from informational elements. Informational elements are things that you’d want to show to people. And relational elements are things that you need to filter on and order by and other stuff like that.
So separating those two in your mind, like manually separating those in your mind, and then manually writing the query with those separations in mind can do very good things. Of course, one thing that is a little, a little annoying, and I only say a little annoying because, you know, it’s just how SQL works.
But you can’t, you couldn’t use exists for this. All right. So like a lot, a lot of like videos that I, where I’m like, Hey, exists and not exists. They’re pretty great. Problem is you can’t put top in an exists like this, right?
So like in, in, in this, in the, the query plan for this query up here, we very clearly have a top operator limiting the rows to 1000. But down here in this exists query, if we were, oh, scroll down a little too far, blew the whole gag. Uh, uh, if we do this, when we just hit control L notice, we don’t have a top here.
SQL Server completely throws the top away. Right? SQL Server is like, no, you don’t get a top. You think they’re not limiting rows in the exists.
Top is gone. All right. So we couldn’t do that here. We could use an outer top with exists. So top 1000 out here.
And then this in here, the thing is it was like, I mean, I’m gonna compat level one 60 for this. So I get batch mode on rowstore. So with batch mode on rowstore, of course, the query plan looks like this and I get, you know, uh, I don’t know.
It takes about 600 milliseconds and I get an adaptive join and all these other batchy things happen in here, which, you know, is fine, but it’s still not the plan that we’re really after. Right? Cause we want to limit those thousand rows before we do anything else.
Right? So like the adaptive join here is just like, and batch mode are just kind of like, oh, look, we saved the day. If we were using an older compat level, we might see a plan that looks like this with a loop join. Right?
And, you know, we kind of, we’re kind of just back to the original plan. We had that took about 700 something milliseconds, 750 milliseconds or whatever, but the 20 millisecond difference here isn’t, isn’t interesting. Uh, but it’s, it’s almost like we have, we’re still doing like the manual key lookup, right?
It’s just not called the key lookup here. It’s we’re just, cause we’re just joining the two tables together. Right? That’s what exists is doing.
It’s just joining essentially, or semi joining, depending on how you look at it. So we’re getting the top thousand rows from posts and saying P dot star where exists, but like this doesn’t like, I don’t know. I, the, the other, the other way of writing this that I showed you before with the in was, you know, with the, we could put the top 1000 in the in clause and just get those thousand rows.
We ended up way better off. It was like a hundred something milliseconds versus, you know, 700 something milliseconds. Now, one thing that is worth talking about is you could also consider this to be an index problem, right?
So like a very important concept when you’re performance tuning queries is aligning queries to indexes, right? So if we look, remember the index that I created up here was on creation date and score and score the, by default, this is going to be ascending order for both of these. All of the queries that we’re writing have creation date and ascending order, right?
So that’s why we still had to sort data across all of these. If you, if we were to think about this as like, you know, a query index alignment problem, and we took the descending, like sort out of score, right? We just said score ascending, then we would end up with a query that’s pretty fast naturally on its own, right?
Like this takes two milliseconds. Well, that’s even better, right? That’s, that’s pretty great. We don’t have to worry about sort of getting more in depth with things and rewriting queries and funny ways within and pondering why exists doesn’t allow top and all that other stuff.
So we might consider this just an index alignment problem. And if we create an index on creation date with score descending, right? So now score is in explicitly in descending order in this index, and we go and run that original query.
Well, guess what? Our original query doesn’t need rewriting either. So if you want to think about, you know, if you’re in a situation where you can’t really change indexes because maybe the tables are too big or you’re on standard edition and, you know, the blocking, you know, sometimes you have to think about rewriting queries in a way that takes better advantage of the indexes that you do have.
If you are allowed to change indexes, then you might, and you’re not allowed to change queries, which is a frequent thing for a lot of the sort of third party apps that I see people using SQL Server with, then, you know, doing something like this can make queries way better. Of course, if you’re allowed to change both, then it’s really, you know, chef’s choice as to how, what temperature that steak gets cooked at. You can, you can choose to either rewrite the query in a neat way, hopefully with some comments, or you could change, or you could add or change an index that would more better conform to the way that you have logically written your query.
But these are both forms of optimizer nudging, where we either have our queries conform to the indexes or have our indexes conform to the query. In either case, we end up with better performance. All right. I think that’s probably about good here.
Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And do remember to visit the video description down below for all of the helpful links that I can possibly provide for you. All right. Cool. Thank you for watching.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.