A Little About Filtering On Dates In SQL Server
Thanks for watching!
Video Summary
In this video, I delve into some quirky issues related to working with dates in SQL Server that have popped up during recent client engagements. I start by discussing a peculiar situation where a client was applying the `AT TIME ZONE` function incorrectly on a column instead of using it properly within a parameter or variable. This led to significant performance degradation, as demonstrated through query execution plans and performance tests. I then move on to another interesting scenario involving date-time columns, where over-communicating with SQL Server’s optimizer by filtering for non-null values in conjunction with a date parameter resulted in suboptimal query plans. By stripping away unnecessary predicates and using the correct data types, we were able to achieve much more efficient execution plans that significantly reduced read operations and improved overall performance.
Full Transcript
Erik Darling here with, well, you probably, you may have guessed by now, with Darling Data. And in this video, I’m going to talk about a couple kind of funny, quirky things that I’ve seen happen with clients lately. And, you know, clients, you know, the nice people who pay me money. If you’re interested in hiring me as a consultant, you can do that from my website. You can hire me. You can say, Erik Darling, please do your shamanic dances. Shamanic? Shamanic? Shamanic? Please come make our SQL Server faster, which I will do in exchange for money. Anyway, a couple funny type things that I’ve seen happen at clients recently had to do with working with dates. And, of course, I hear especially for the current generation of kids, dating is weird. Apparently, dating is also weird in databases. All sorts of strange things can happen. And we’re going to look at a couple strange things that I’ve seen recently. So, let’s do that. Alright. So, I’ve got an index, as I usually do, as I am want to have, and want to come packing an index on the creation date column. And what I found the client doing, was something backwards. Now, newer versions of SQL Server, and I say newer, but it’s been around for like 8,000 years at this point, have this fun at time zone thing. And what these crazy kids were doing is exactly what you see here, where they were using the at time zone function on a column, rather than using the at time zone function on a column, rather than using the at time zone function on a column.
So, we’re going to run a parameter or variable. In this case, it was a parameter. But for the purposes of this demo, I’m going to use a local variable because it’s a little bit more convenient for me. But I’m going to throw a recompile hint on these so that the local variable effect is taken out of the picture. Alright. So, if we run these two queries, we will see a rather profound difference in performance between them. Alright.
Alright. And running, and running, and running, and running, and running. And in the query where we call at time zone, on the comments table, on the creation date column in the comments table, we end up taking about almost six seconds to scan the index, apply the at time zone function, and compare it to the variable here. Right. Which is not good from a performance point of view.
We have an index. We came prepared. We came fully indexed to support this query, but we still had to scan that whole column. And whatever at time zone does behind the scenes is filthy foul. Because, like, you know, a dot eight query taking, like, almost six seconds to do this stuff? Bonkers.
Right? So, please, if you’re going to be doing time zone stuff in your queries, do not apply at time zone to a column in your join or where clause. Because it’ll be bad for you.
And the query where we apply at time zone to the, in this case, the local variable that we’re using in conjunction with a recompile hint to make the local variable not awful. This thing finishes in 80 milliseconds. Right?
And we efficiently use our index. So, we did a good here. Right? We did a just fine here. And, of course, you know, the results match. We get the same number of rows back from both of these.
We didn’t screw anything up too hard. And, again, you know, like I keep saying, in this case, anything you do that makes your job easier, like this, makes SQL Server’s job harder, like this. Right?
No good. No fun here. And then, you know, so there was that. And then another funny thing that I saw was, you know, like when you work with date and time information in SQL Server, often it can be challenging to get filtering right. You know, and there are, there are a lot of, there are a lot of circumstances when writing queries, tuning queries, making sure that they go as fast as they can.
We’re, you know, over communicating to SQL Server’s query optimizer can be a good thing. Right? Making sure that it has all of the relevant information available to it in order for it to make the best possible choices and come up with the best possible plan.
It’s a noble effort. Right? Right.
But what I ran into recently, and, you know, again, stepping back a little bit, was, you know, when you’re, when you’re working with date time columns, and the date time, the time portion is populated, but you don’t necessarily care about it too much. Like, you know, you’re usually looking for like a whole day, or a whole like month, or something like that. The time is often irrelevant.
But a lot of people will do, at least, so, there really, there are two groups of people. There are people who will, like, cast a date time to a VARCAR 10, which will get them, you know, the year, month, day, you know, four digits, and two digits, and two digits with a little doodad between them. It’ll get you a 10 character, 10 byte string with, you know, reasonably formatted date.
Those are some of the worst people you’ll ever meet. Another thing that some people will do, and this is the slightly better group of people, is they will just use a date parameter or variable or something, because that chops off the time portion, which you don’t, might not care about for every single predicate. So what I ran into was a query that looked a little bit like this, where there was an index, there was a perfectly good index on a date column, a date time column.
And people were using the data type to cut the time off a search query. And they were also over communicating by saying, well, we also just want to get where the data is not null. We just, we don’t care about nulls.
Which, you know, again, not always the worst idea in the world to do this. There are definitely some times, especially with joins, where I found this very useful. But, in this case, something strange was happening.
I’m going to show you what that something strange is via the query plan. So, I’m going to run just this one. And we’re going to look at the execution plan.
And, you know, this doesn’t run for a terribly long time. This isn’t by, this is by no stretch of the imagination, the worst, most inefficient query you’ll ever see. It was a lot worse with the client that I was working with because it was a much bigger table, right?
The post table in the Stack Overflow database, about 17 or so million rows. So, we can at least work our way through that data with an OK index pretty quickly. The real life example that I was looking at was a couple billion rows.
And things were, things were not good. We were reading, we were reading through a lot more data, which was taking a lot longer because the data didn’t always fit in memory. And things were just kind of icky.
We’re doing a lot of physical reads. So, I’m going to show you what happens in here. Where, what we do is rather than just find all the rows that qualify for where the last edit date is greater than the date parameter. We have this funny empty seek predicate down here.
And that’s looking for the is not null part, right? That’s what this is. We’re seeking to where is not null first. And then we have a residual predicate after we’ve eliminated the nulls to filter to the dates that we care about.
And we end up reading almost, what’s that? 6599963. So, almost 6.6 million rows when we only produce about 170, almost 173,000 rows.
Right? So, that’s not really a good read to row ratio, right? We’re doing a lot more work than we need to.
And with the client example again, because they were physical reads and not just logical reads, that was the real problem that we were having. And so, what I did first was, well, it’s like, you know, we don’t really need this predicate. Well, if you, you know, we just do, if we just filter for this, we’ll find everything we need.
It’s not like SQL Server needs to figure out what’s null or not first. It’s because nulls are never going to eat this, equal this. And, you know, it’s one of those weird things where, you know, developers who come from different backgrounds and, you know, like, you know, purely working with databases, they have a much harder time conceptualizing how to work with nulls in a database.
You’ll see a lot of use of isnull coalesce just unnecessarily, right? Just bad idea genes, throw isnull on everything because they’re used to nulls creating errors, right? There was a recent developer issue with nulls, apparently, over at CrowdStrike.
At least I think that’s what it was. I keep reading differing takes on exactly what happened. Some, everyone sees, a lot of people say it’s a null pointer.
Other people, no, it was this other thing. I don’t know. I’m not good enough at all that stuff to figure out exactly who’s right or wrong. You know, everyone seems to have their own hot take on the stack dumps there. But the first thing that I did was think, well, you know, it kind of doesn’t make sense for our seek predicate to find all the not null stuff and then a residual predicate to have to do all the other stuff.
So I just took the is not null portion out of there, right? And, you know, if we run this now, we go from 600 something milliseconds with a seek plus a residual predicate to about 17 milliseconds at the very end with a seek predicate to just the rows that we care about, right? If we look at this now, our seek predicate is, you know, alone.
And just comparing the last edit date, right? So we just do a regular plain seek to the dates that we care about here. Now, that’s a much better plan generally.
And the date thing in here is, you know, it doesn’t really cause any performance issues. I think this should be the execution plan with the thing that I want to show you in it. But like SQL, like SQL servers optimizer has these cool built in rules for dealing with like temporal stuff that like would usually be like a convert implicit and lead to like the bad kind of scans and with other data types.
So built in to the optimizer, because we have a date time column that has a time portion on it, but our parameter in this case is only a date, we have this get range with mismatched types thing come in. And this is just a thing that the optimizer has built in to say, okay, like I can still compare very easily the date portion of a date time column to a date, right? I don’t need to evaluate the binary representation of the time, right?
I can ignore that. So you have this get range with mismatched type thing, which is very useful in this case, because we can take a date and compare it to a date time and just compare the date portion of it, right? So we have that ability.
It’s pretty nice. There are a couple other optimizer rules that you’ll see around temporal stuff like that. If you go digging in the execution plan XML, you won’t see that looking through like just any of the tooltips or the properties or anything. It’s something that’s only in the XML.
So the other thing that you could do, which, you know, of course, if you wanted to retain your not dealing with the nullness of things, and you still wanted to, you know, it’s like, you pass it, you declare a date time, you know, like down here, I’m going to use date time, you can use this without a time portion affixed to it, and SQL Server will just zero it out. So you get effectively the same thing.
But you could also just, you know, pass in the right data type here. And even if you maintain the, you know, greater than date time thing and the is not null thing, if you use the right data type, these two will flip, right? So in the very first query plan that we looked at, the seek predicate was to is not null, and the residual predicate was to the date range.
But for this one, the seek predicate is now the date range and the residual predicate is is not null. So, you know, you could keep you could keep the is not null if you want and use the right data type and get effectively the same thing. I kind of like this query, you know, even like with or without the is not null, I generally like this query plan a little bit better because there’s a little bit less weirdness to it.
So we just do a plain old index seek here, and we produce our results in about 19 milliseconds. If we go back to this query plan real quick, I don’t know if you remember too much about this, but we had a little bit of extra stuff in the query plan, where SQL Server was producing, had this constant scan compute scale our stuff in it. Again, it’s not the end of the world, right? And these two queries, you know, they finish within a couple milliseconds of each other if you just keep running them over and over again.
Sometimes they’re a little bit different apart, sometimes they’re just exactly the same. So it’s not really a big deal here. It’s just it’s one of those things where in other cases where you see query patterns like this, especially if these constant scans produce a lot of rows, you’ll see this a lot in like query like joins with or clauses. You’ll see query patterns that sort of look like this with like constant scans and merge intervals and distinct sorts and then like a loop join.
And when you’re producing a lot of rows from these, it can be really awful. When you’re just when you’re actually just producing one row from this stuff, it’s not that big a deal. You can just do a seek, but sometimes getting stuck in this nested loops join when these when the operators like this produce a lot of rows that can be hellish for performance. So we want to avoid that. So anyway, that’s about all I had to say here today.
Some quirky stuff when working with dates and SQL Server, right? Of course. And we’re going to go through our usual spiel again, aren’t we? We’re going to we’re going to we’re going to say or I’m going to say not we I have I have nothing else.
I do not contain multitudes. I get I get nothing. There’s just there’s just what you see is what you get. Thank you for watching. Especially you. Hope you enjoyed yourselves. Hope you learned something. If you like SQL Server videos like this, you can you can a like this video by way of a thumbs up or a comment.
The comment does not have to be positive. But I will fight you over excessively negative comments in person. Will physically feels it. I will physically altercation you. If if you like this video again thumbs up comments if you like this sort of content, you can subscribe to the channel and join over 4000 other data darlings out there in the data verse.
This is getting awful. This is getting awful. In subscribing to the channel so you can get notified when I do these things and I make a complete buffoon of myself in front of you.
So yeah, and you can also hire me to do this personally like I’ll give you like a like a private dance of me doing this. It looks wonderful. It’s a whole whole whole thing that people just love right. I’m coming up on 700 clients since since I started this business and I’m pretty pretty psyched on that.
So if you want to be lucky number 700 if you want to join the 700 Club not not the that channel thing, but like, you know, having been someone who worked with me. That’d be cool, too. Anyway, I’m gonna go do actual work now.
Thank you for watching and I will see you in the next video.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.