How To Write SQL Server Queries Correctly: Where Clauses With OR In Them
Thanks for watching!
Video Summary
In this video, I delve into optimizing queries that use OR clauses across columns, particularly in scenarios involving outer joins and complex WHERE conditions. I explore how these types of queries can sometimes struggle to leverage indexes effectively, leading to suboptimal performance. By rewriting the query to break it down into more manageable parts, we can better guide SQL Server to utilize the appropriate indexes for seeking rather than scanning, thus improving overall query efficiency. Through practical examples and detailed explanations, I demonstrate how strategic indexing and query restructuring can significantly enhance performance in these challenging cases.
Full Transcript
Erik Darling here with Darling Data. And we’re going to talk about something quite interesting today. Now, we’re going to, of course, carrying on, soldiering on with our series on how to write queries correctly. And I did promise some tidbits, morsels, tiny, sprinkled little nuggets of Intel timing out looking for drivers. God, that’s depressing. It just does that randomly. I don’t understand. I wish Intel would just get out of computers. It was a good run. Intel and Boeing just get together and go do something else. Enough lives have been lost. So I did promise some stuff about performance and we’re going to cover that today because this is not precisely about how to write queries correctly. This is about indexing for a specific type of query and how you can rewrite that query in a slightly different way to maybe better understand why your indexes are not being used perhaps the way you think they should be. So we’re, of course, going to be talking about where clauses with or operators in them. Shocking, I know. Who would ever be indecisive about what data they’re returning in their query? It’s hard to know. The level of indecision, I have.
I see in many SQL Server queries. Why don’t you just say select star from table. The number of OR conditions that you have applied to this query makes me wary that you might just be doing a secret select star. Anyway. So let’s go on with that. But before we get into the meat, the beef of this subject, if you would like to become a member and support this channel with $4, as little as $4 a month, you are, of course, able to do that by clicking on the little become a member link in the video description. If $4 a month would mean that you would starve cold and alone in the streets, like these poor UK pensioners who have had their heating allowance pulled from their hands, you can like, you can comment, you can subscribe.
Those are all wonderful ways to keep me warm all winter long. If you need help with your SQL Server, if you’re out there in the world saying, God, I wish I knew why this SQL Server was so slow, or I wish someone would come and make this SQL Server faster for me. Well, I do all of those things. I can tell you why it’s slow, I can make it faster, and all of that at a reasonable rate. Ha! It’s wonderful.
If you would like some very high quality, incredibly high quality, probably the highest quality training you can possibly get at a very low price, probably the lowest price you can possibly get without pirating it, You can get all of my training for about $150 USD for the remainder of your years on this planet Earth. If you go to that link and put in that coupon code, there is also a link that assembles all of these details for you in the video description.
It’s your lucky day. No upcoming events, because end of the year, yada yada yada, 2025, we will reconsider our stance on no new events. For now, I just wish to be warm and snuggly with a bottle of coat roti and not going anywhere in a winter coat to talk about SQL Server.
With that out of the way, let’s fade to black and let’s talk about how we can make OR clause queries better. Now, the OR operator is, of course, perfectly valid to use. There are some times, and I hate to see it more than others, joins particularly, left joins especially.
It’s almost like you should have just done a full join, probably, for a lot of those WHERE clauses. But when you use an IN clause, there’s a perfectly reasonable chance that… Sorry, when you use an OR clause, there is a perfectly reasonable chance that the operator will convert it to a series of SEEK predicates like IN1, 2, 3, or it might take IN1, 2, 3, and make it OR 1, OR 2, OR 3, and you don’t have to do anything about it.
Optimizers are just kind of funny like that. They try to optimize things, and sometimes they do better than others. The problem isn’t sort of generally when asking for IN or OR on a single column, right?
If you are the type of, like, dim person who likes to say where is null column something is null or is not null or whatever… or coalesce or whatever thing you do, even… I even see people do it with null if…
like, null if empty string is null. But if you say, like, where column equals empty string or column is null or anything like that, usually you’re better off just typing more, right?
So if you say, like, where column 1 equals something or column 1 is null, it’s okay. As long as you have an index that supports, you know, that, like, supports what you’re searching for, an OR expression on a single column is not going to ruin your day.
Where OR clauses tend to get tougher from a performance perspective is when they span columns, especially across joins, especially outer joins, because outer joins, you know, they do preserve nulls. So you end up with a weird sort of set of things to think about as far as, like, query performance goes there.
So if you were to do something like this, you might have a tougher time getting it to perform as well as… if that were just where t1.thing equals something or t1.thing is null. You know, at any point, though, sufficiently complex filtering will…
like I’ve seen so many times, and something that we’re going to cover in a couple few videos, is when you… someone decides to take a where clause or a join clause and turn it into a case expression, thinking that it will simplify things.
So it’s like 1 equals case when some conditions equals 1, or some other conditions equals 1, or some other conditions equals 2. And that’s when… that’s really painful overkill.
Just a plain old select… case expression, a select list, doesn’t cause nearly as many headaches, right? I mean, some types of, like, exists in subqueries… are exceptions to that, but…
for the most part, a select list… with a… sorry, a case expression, a select list, it’s not going to be a problem. The main issue, of course, that we’re talking about today is performance, and sometimes figuring out a reasonable index or set of indexes.
Sometimes… you know, it’s crazy to think that sometimes you might have one query that needs more than one index. You might need some number of indexes in order to make your query work reliably fast.
If you work… if you’re the… if you’ve ever done work with, like, interval-type queries that have a start date and a begin date, and you need to either find overlapping or non-overlapping or consecutive or non-consecutive things, you’re probably very, very used to the idea that your start date and your begin date are each essentially going to need an index that leads with them so that your searches can get to things in there as quickly as possible.
Now, sometimes it’s not just the indexes. Sometimes your queries, when you do write these types of queries, will need index hints both to use the correct index and to perform the correct seek or scan on that index.
Some people get really scared of that stuff, but, you know, the optimizer just sometimes will not make the right choice on its own. It refuses to.
Even though you know better and you run a query, and it’s better and faster when you add those hints in, some people will just say, Oh, well, you can’t add hints. It’s too scary for me. And so you end up with a query that is just perpetually slow.
So, you know, if you are… You know, if you have perfect indexes and perfect queries and everything’s working great, I’m very happy for you, of course.
Couldn’t be more thrilled. You’ve reached a level of nirvana, of spiritual perfection and enlightenment that I can only hope to achieve one day. But until then, I’m just probably going to try to keep getting paid to fix other people’s lack of spiritual enlightenment and nirvana.
Hopefully bring them a bliss that they’ve never known before. Get a back rub too.
So I already have this index created on the post table. It’s on owner user ID, last editor ID, and score. And this is the query that I am trying to get to support.
Sorry, this is the query that this index is there to support. So I’m going to show you something. And it’s going to be something that maybe annoys you.
I know it annoys me sometimes, and I see it, is even though this index leads with owner user ID, and even though we do have owner user ID as an equality predicate in our query, we don’t seek into this index, do we?
That is not what happens. We scan the index. All right, and our scan predicate is looking for a score greater than zero, and owner user ID equals 22656, or last editor user ID equals 22656.
Well, you’d think that we could at least be able to seek to one of those, but when you try to tell SQL Server to seek into this index, it says no.
The query processor cannot come up with a query plan that will help this thing. Why is that? Very strange. Well, it’s not because the index is created wrong. If we did create the index with the score column first, we wouldn’t be able to seek to that.
The thing is, with this predicate, most everything in the, like, rather, not most everything, but really a significant chunk of the post table does have a score greater than zero.
So we wouldn’t really get much out of that. It wouldn’t help us as much as you think it would. If we run this, where we’re looking for a score greater than 1,000, of course, things with a score greater than 1,000 are far less common in the user’s table, sorry, in the post table.
So this would result in a lot fewer rows, but we still can’t seek to what we care about. Like, it’s still just not an option to us with this index. So, sort of paraphrasing what’s in here, because, you know, I do need to finish these videos at some point.
It’s really tough to get a seek with an or clause sometimes. If we put the score column first, depending on what the score column is searching for, it might get better because, it might be okay, because we’d at least be able to seek to a small number of scores, but we still wouldn’t get the good results that we want to out of this query.
Because essentially, we have two separate predicates, right? Where owner user ID equals 22656 and score is greater than zero, or last editor user ID equals 22656 and score is greater than zero.
Like, we have this sort of double Dutch, double date weird thing that we have to figure out. If we express the query, and this is where I was talking about, where if you rewrite the query in a slightly different way, it’s a bit easier to see what SQL Server can seek to and what it can’t, given the index that we have.
So, let’s run the query like this. And I’m just doing the score greater than a thousand. It’s a little bit quicker this way. But essentially, we’re taking the query that we had written in one compact bit, and we are writing it in two separate bits, and we’re joining, we’re unioning those results together and selecting from the results of both of these.
So, we have one up here for owner user ID and one down here for last editor user ID. When we write the query like this, what we need to do becomes a lot more obvious. Even, it’s so obvious, even SQL Server can figure it out.
Except what SQL Server tries to tell us we need is an index on three key columns that covers the entire post table, including the body column, which is an envarkar max, including several other string columns in there.
If you’re the type of person who, you know, I don’t know, maybe you’re a consultant even, who’s like, oh, looked at the missing index request, going to create those, please don’t.
It’s stupid. They’re not good. They’re not good ideas. Even if you, like, consolidate them a little bit, you have no idea what query they’re helping.
You have no idea what the actual, like, the helpfulness of the indexes will be for the workload, for specific queries. It’s really just working blind, and you shouldn’t be doing that.
You should know better. Because people are paying you. They’re not paying you not to know better. They’re paying you to know better than they do.
If you add missing index requests just from the DMVs, looking at uses, or impact, or anything like that, you’re screwing up big time. Don’t do it. So what we can do, because we are smarter than SQL Server’s missing index requests, is we can create a couple indexes that will help our query.
Now, we’ve seen the query plan for the one before, right? We were able to seek into one part of it, and we had to scan the other. We had to seek to, we could seek to owner user ID and score, but we couldn’t seek to last editor user ID and score, because last editor user ID is the second key column in the index.
So we don’t have that in a helpful order unless we hit owner user ID first. The thing is, we’re not scanning to where owner user ID equals something and last editor user ID equals something.
That or condition means we could go here or we could go here and we need score, right? So we can only seek to one of those arrangements based on what the index was.
We can only seek to this arrangement because last editor user ID is the second key column. We cannot seek to this arrangement. Okay, so with these two indexes in place though, if we look at this query plan now, SQL Server will use an index union plan.
We can tell us index union because we have a concatenation here where a SQL Server basically adds this index to another. If this were an index intersection plan, SQL Server would join these two indexes together.
So we basically seek to what we care about in this index, right? Which is score and owner user ID. And we seek to what we care about in this index, which is last editor user ID and score.
And then we bring those two results together and we do a lookup for the rows that we, for the remaining columns that aren’t in those indexes for the rows that we care about. All right, so SQL Server is able to bring those two results together and then combine the clustered index to get us a full result for the query, which looks pretty close to what we would see in here, but just arranged slightly differently.
Let me actually run both of these together so we can see the differences. There’s not, there’s really not a big timing difference on these. If I, if I ran these enough times, well, actually, these are all at zero seconds anyway.
Sometimes they end up at like a couple milliseconds, but for the most part, the choice is either to concatenate the results of both of these seeks via index union, sort them down a little bit.
It’s only 16 rows, so no one really cares, and then do one key lookup for the 16 rows that we have. The alternative is to do one index seek and one key lookup, and then another index seek and another key lookup, of course, the first index seek is just like we’ve seen before.
Oops, that didn’t frame up too well, did it? Owner user ID and score, and then the second index seek is last editor user ID and score, and then we just do slightly smaller lookups for the rest of these, and then we return out the 16 rows that we care about for both of them.
So indexing for these types of queries is a little bit more challenging. Like I said, though, whenever you’re fixated on writing some OR condition into a WHERE clause, we’re going to talk about joins with OR clauses next, but whenever you’re fixated on writing an OR clause into a, writing ORs in your WHERE clause, you might want to consider performance, performance, and the way to consider performance, of course, is to write the query out with each of the specific WHERE clauses expressed the way that you would care about that, and if you do that, what’s going to happen is SQL Server is going to point out where you have good indexes to do things and where you don’t have good indexes to do things, and that can really help guide you to make better indexes to support queries that have this kind of extended, expansive logic in them.
So anyway, that’s about enough of this. In the next video, we’re going to be talking about joins with OR clauses in them, which are a separate nightmare, but we’ll have fun with that, and that’ll also be a bit more performance-focused, but, you know, we’re going to…
we’re going to stick to the sort of script of the series and make sure that we all understand that there is a correct way to write these sorts of queries.
So thank you for watching. I hope you enjoyed yourselves. I hope you learned something, and I will see you in the next video about joins with OR clauses in them, and we will all have a good laugh at the end if we make it to the end.
Really, anything could happen, right? You have about 15, 20 minutes in which, I don’t know, the world could end. The lights could all go out.
It could be a solar storm. Anything. It would be awful, but hopefully not. Anyway, thank you for watching.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.