Stubborn NOT EXISTS Ordering

Video Summary

In this video, I delve into the fascinating world of stubborn not exists ordering in SQL Server queries. You might be wondering why I’m wearing a headset that’s making me look like an air traffic controller—well, it’s because my fancy wireless microphone setup broke, and I’m waiting for replacements to arrive. This quirky situation has led to some humorous moments, but it hasn’t stopped us from diving deep into SQL Server optimization techniques. I use dynamic SQL and XQuery to generate random not exists predicates and analyze their execution order in the query plan. By running this process multiple times, we can observe how the optimizer handles these clauses and whether reordering them might improve performance. So, grab your headphones (if you have any) and join me as we explore this intriguing aspect of SQL Server’s query optimization!

Full Transcript

Erik Darling here with Darling Data, and you might be wondering why I’m wearing a headset. Well, it’s a funny story. It turns out when you spend like 800 bucks on a fancy wireless microphone setup, this thing is fine, but the microphone that plugged in here, they don’t send you a backup of one of those. And it turns out that the little wire thing that you usually saw on my shirt about right here is very fragile. And if you try to adjust it, it’ll just snap. And so I’m waiting for my replacement ones to show up. Unfortunately, my scheduled YouTube videos are going to run out before they arrive tomorrow. So we’re going to do a few of these with the old headset, and I’m going to look like a goofy air traffic controller. While I talk to you about SQL Server. So I hope you I hope you can survive these trying times, my friends, where Erik Darling is wearing a headset, because I’m having a tough time with it. Honestly, I feel like I look stupid in this thing. Anyway, let’s talk in this video about stubborn not exists ordering. And by stubborn not exists ordering, well, of course, what I mean is the order that like SQL Server’s optimizer is famous.

For being able to take a query and do all sorts of stuff with it, play all sorts of tricks, like do fun things to like reorder joins. But one thing that it doesn’t do. At least that I can I can’t get it to ever do it is reorder the order of not exists predicates, or not exist sub queries, whatever you want to call them in a query. And I think it’s it’s kind of amusing the way that it doesn’t do this. If there’s a message here, it’s that if you’re going to write a query where there are multiple not exist checks, you may want to spend a little time figuring out if the order of them improves query performance at all.

I’m gonna I’m gonna I’m gonna give you a little spoiler here for our query, it doesn’t make much of a difference. It’s gonna run in two to three seconds anyway, because I did I did many things right along the way. But for you out there and real real user space land, it might it might actually make a difference. So before we do that, haha, we need to shh feel ourselves, sh feel all over ourselves. If you appreciate this SQL Server content, even when I’m wearing a headset, which is I know is hard.

You can sign up to become a member of the channel. And you can for as few as $4 a month, support a starving SQL Server consultant. If you if you are also starving for some reason, maybe AI took your job on already, I don’t know. You can do all sorts of fun stuff to help this channel thrive and survive. And we won’t put food in my stomach, but it’ll put a smile on my face. You can like you can comment, you can subscribe. And if you want to ask questions privately that I will answer publicly during my office hours episodes, you can do that at this link. All of the things that I talked about are very conveniently linked for you down in the video description. You don’t even have to think much about it. You can just click randomly on things until something works. Just just like with SQL Server. It’s a good time. If you need consulting from a guy in a headset.

I am available as a SQL Server consultant, not just not just a pretty face on YouTube, I could be a pretty face on a zoom call for you to health checks performance analysis hands on tuning of whatever you need tuned. Fixing your SQL Server is a SQL Server is a SQL Server. Fixing your SQL Server performance emergencies and of course training your developers so that you don’t have those emergencies anymore.

It’ll be a nice good time for you. I promise everything will go your way. You will you will you will be endowed with the the luck of the luckiest civilization out there. If you would like to get some training from me, I have 24 hours of performance tuning training.

You can get it all for about 150 US dollar rules. No tariffs on that. I promise we’re staying tariff free here in the data darling world. We go to that link you put in that discount code and you’ll get the everything bundle for $150.

It’s nice. I also have a new T-SQL course that is currently publishing as we speak. I have finished the read query portion and the modification query portion. Next up will be isolation levels and then programmability.

So it is half done. At least the beginner portion of it is the advanced portion will be after come out after the summer. The pre-sale price until all the advanced material is out is $250. If you will go up to $500 once the course content is fully fully released.

Just so you know, if you are attending past data community summit in Seattle this November, and you are attending the pre-cons that Kendra Little and I are doing on T-SQL. This is companion material to that content.

So if you attend the pre-cons, you will get access to this content as part of your admission to those. Speaking of speaking. Ho-wee.

Boy, am I my arms tired. Pass is going on tour. It’s the Red Gate Roadshow and they have cordially invited me to go to all of these things. New York City, August 18th and 19th.

Dallas, September 15th and 16th. And Utrecht, which is a hamlet near Amsterdam, October 1st and 2nd. And of course, this is all leading up to past data community summit taking place in lovely Seattle, Washington, November 17th to 21st, where Microsoft has indefinitely canceled their build conferences and has relocated them to sunny Las Vegas.

So, I don’t know, maybe, maybe pass a move to Vegas too. Who knows? Anyway, with that out of the way, what I want to show you is like, like, A, I’m going to prove my point.

But B, I’m going to show you how I prove points like this to myself. It’s often quite a process. It’s quite an endeavor, quite a chore to do these things.

But they help me. They help me understand things and they help me see things how they really are. So, what I did in order to sort of prove this out was I used my best friend, Dynamic SQL, and my other best friend, XQuery.

And what I did was I built up Dynamic SQL in a way that I could grab the query plan for the query that ran, have that query generate not exist clauses in random orders, and then give, like, show me the query that ran in the order that the clustered index things happened in on which tables, so that I could match up the order of the not exist subquery clause predicates, and the order that things happened in the query plan.

It was all very tedious. But when I got it right, it was very exciting. So, I’m going to walk you through the Dynamic SQL portion, and then I’m going to run this a few times, and then I’m going to show you the results.

All right? Cool. I hope you enjoy it. So, we have some variables up here, some local variables. These are not formal variables that we’re going to use to hold various things. This will hold the Dynamic SQL we’re going to execute.

This is going to hold the parameters for the Dynamic SQL. This is going to hold an output parameter for the Dynamic SQL. Why Zoomit just dissed me like that live on YouTube? I don’t know.

These are the parameters that we are going to pass into the Dynamic SQL. This is going to, again, this is going to be the output thing for this, and this is going to be the output thing for this down here.

Maybe I could have organized those slightly better. And we’re going to use this replacement thing here. This replacement thing is going to come in handy in a moment, which we’ll get to it. Just remember, there is a local variable called replacement.

This is going to be our general Dynamic SQL setup. We are going to set this every time, and we have this sort of, we have this thing over here that says replacement.

This is going to act as our token. This is the thing in this batch that we are going to replace with our varying not exists predicates. We also have this lovely piece of SQL in here, and this lovely piece of SQL is going to get the execution plan for the query that’s for the session that’s running here, which works.

I assure you, this is correct. As crazy as it looks and sounds, this is correct. So what we do is we set that, remember that replacement variable I told you that was very, very important.

We use this replacement variable and we set this, and we’re going to use this brand new, brand spanking new, fresh off the factory lot string ag function that came out in SQL Server 2017. And we’re going to use this instead of more XML.

There’s this XML down below. So don’t worry, all of your fetishes will be satisfied. And we are going to aggregate this column with an empty space. And we are going to say within group order by V.O.

You’re probably asking yourselves, what are V.C and V.O? Those are fantastic thoughts to have. And they are wonderful questions to have answered.

So this values clause generates two columns. One of them is this nchar10 with a not exist. And you’ll notice that each one of these not exists has a different table in it.

We have badges, comments, posts, and votes. And then we have this other thing that it generates for new ID. New ID is how we get the random ordering.

So this values clause, which is aliased as V, has the C column, which is where we’re holding the nchar10 and not exists. And then the O is where we’re holding the new ID.

So when we run this all together, we’re going to concatenate all of these not exist clauses ordered by the new IDs that get generated in here. And we are going to get randomly ordered not exists subquery clause predicates.

Okay, cool. After that, we are going to replace in the in the dynamic SQL portion, the replacement token with that string that we just generated and assigned to the replacement variable.

We’re going to execute our query and we are going to output the C and the query plan. All right. The C up here is the C is just, of course, the result of the count big.

We don’t actually do anything with this. We just throw it away. But the query plan, we do something very exciting with. You’re ready for some XML.

You look like you’re ready for some XML. So I’m going to give you some XML here, some X query for you. We’re going to print the query that runs. All right.

And then we’re going to delete some XML. Have you ever seen this before? Have you ever seen a delete from XML? We’re going to use this thing. And we’re going to.

So back story on why this is necessary. So we’re going to do a query. We’re going to do a query. When the query up here runs, there are two query plans that get generated. There’s a query plan for this query, which has the not exist stuff in it. And then there’s a query plan for getting the query plan.

Couldn’t make this stuff up. Right? Could not make these things up. So what we need to do is we need to preserve the first query plan and delete the last query plan. And apparently this works to do that.

Isn’t that insane? The second thing we need to do with XML is shred our query plan. Remember that little, that, that output thing called query plan that we, that we assigned the query plan to? Well, we’ve got something interesting to do here.

We have to select from a variable, the XML nodes for all of the clustered index scans. What we’re going to pull out of that is the node ID, which is going to tell us the order that things happened in the query. The table name that, that the thing that, so we have the node ID, which is the order that things happened in.

We have the table name, which is the thing that got clustered index scanned. And then I’m also going to select the query plan here so that we can validate our, our results. Okay.

All good. This is great. This is wonderful. I am very happy with all of this. We’re not going to do anything with this. This was me just making sure that I was, I was right about things. Uh, I ran this quite a bit to see if I could find any like real big outliers that would be like, Hey, look, this is a big performance thing, but there, there really weren’t any with, with, with, uh, with these.

Anyway, uh, what we need to do now is come back up to the top and we’re going to run set no count on once so that we connect to, and look at, look how nice this is. Look how nice this new SQL Server management studio 21 connect dialogue. We have this futuristic thing.

We can tell, we can even tell it what database we want to connect into, and we can do that. No one look at my password. Of course, this is highly, highly confidential information. Don’t look at the password, but the, we can tell it exactly which database to go to. So we, I don’t actually accidentally connect to master and hit an error the first time.

Fantastic. Thanks, Aaron. Anyway, we are all connected there. And now let’s run this once and let’s just see what happens. Uh, my, my, my VM may have restarted last night.

So this might, this first run might take a couple of seconds because we might have to read some stuff, uh, from, from disk into memory. Remember kids reading stuff from disk into memory is terribly slow. Keep your data in memory and your queries will always be faster.

So, uh, what we have now that we are finished, now that we’re down here, right? We want, we don’t know what, we don’t want to hang around up there. We want to hang out down here because our query, this is what the, this is the query that ran and gave us some stuff.

So we have the node ID, which, which is going to, and I’m going to prove all this out to you. We have the node ID. This is the order that tables were accessed in.

We have the table name and here’s the query plan. Over in the messages tab, we have the query that ran. So all of these queries start with the users table, right? So in all of these users ends up being first.

Then we have our not exist predicate clause query plan predicate subqueries. And we see the order that these happened in here. So we have badges, comments, votes, posts.

All right. B, C, V, P badges, comments, votes, posts. If we come back over to our results, we have users first, because that, that was, that was the from clause. And then we have badges, comments, votes, posts.

Okay. Well, do we know if these, how do we know these node IDs aren’t lying to us? Well, let’s look at the query plan. We have users.

Ta-da. We have badges. Ta-da. We have comments. Ta-da. We have votes. Ta-da. And we have posts.

Again, ta-da. If we hover over any of these, we’ll see the node ID down here in this little tooltip. So this is node ID 14. This is node ID number 12.

This is node ID number 10. And this is node ID number eight. Ah, we got it.

And this is node ID number six. All right. So I think, like, now that we’ve kind of proven that the setup, this is a valid test. Well, here’s our six, eight, 10, 12, 14, right?

Just like we saw up there. Let’s run this a few times and see stuff in some different orders. So now we got a completely different order on this one. Now we have posts, badges, votes, and comments.

And now these are no longer even numbers. Now these are odd numbers. We have seven, nine, 11, and 13. So remember, users, posts, badges, votes, comments. If you look at over here, here’s, oh, go away thing.

Here we have users, posts, badges, votes, comments. So the order matches again. If we keep running this, and I’ve done this hundreds and hundreds of times, like the number of loops I’ve written around this to do this over and over again is absurd.

And if we, every time we validate this output, here’s users, votes, posts, badges, comments. So VPBC over here. What do we have in the messages tab?

V, P, B, C. Every single time. So coming back to the main point, when you are writing queries with not exists, SQL Server, SQL Server’s optimizer will do, let’s just, I won’t say nothing.

Cause who knows, maybe like it almost does something, but then changes its mind and is just like, nah, I don’t think so. Hmm.

It’s not for me. SQL Server’s optimizer does not really make any attempt to reorder not exists predicates. So when you are writing queries that have not exists in them, some tables might be cheaper to access and do a not exists from, and you might be able to narrow rows down than other tables.

So always be very, very careful and cautious. The order that you write your not exists is in because you might be able to get your queries to run much faster just by reordering not exists is because SQL Server will be able to do it for you.

Anyway, that’s all I have for this one. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I will see you in the next video where I will still be wearing this atrocious headset until my little clippy thing shows up.

It’s not an actual clip, like a tie clip or like a clippy, like the paper clip guy. It’s just a little like, like clippy microphone thing, which I don’t know. I might start wearing for formal occasions as well with my Adidas tuxedo, because that’s how I roll.

All right. Anyway, I guess that’s good for this one. Again, thank you for watching. I hope you enjoyed yourselves. I hope you learned something and I will see you over in the next video. All right.

It’s magic. Ta-da! Ta-da!

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.