Date Math Annoyances In SQL Server Queries
Thanks for watching!
Video Summary
In this video, I delve into common pitfalls and best practices when working with dates in SQL Server queries. Erik Darling from Darling Data shares his insights on avoiding implicit data type conversions by using explicit date values instead of relying on zero as shorthand for 1900-01-01. He also highlights the importance of performing date math operations on parameters or expressions rather than columns, which can significantly improve query performance and accuracy. Additionally, Erik discusses the nuances of the `DATE DIFF` function and emphasizes the need for precise assumptions when using it to ensure correct results. By sharing these practical tips, I hope viewers gain a deeper understanding of how to handle dates effectively in SQL Server queries.
Full Transcript
Erik Darling here, Darling Data, doing Darling Data stuff with SQL Server. Apparently I have to get on an airplane today. What day is today? I guess we’ll find out because we’re going to talk about dates. I’m dumb. So, I’ve talked about some similar stuff on this, on the Chan before. I hate myself. But, I’m going to talk about things in a slightly different way here. Mostly because I still see a lot of screw-ups and weirdness and sort of buggy-ness with date math and date function queries. And I just want to kind of talk through a little bit of what goes on with date math and date functions. So, first and foremost is a giant pet peeve of mine, at least in recent years, because for a long time I kind of had no idea what what an issue this sort of thing could cause. But, more recent years and experiences have taught me that relying on implicit conversion of data types can really cause strange incorrect results bugs. And you don’t want to do that. For you or anyone, really. Why would you do that to a person? Be nice. Be kind. So, I see a lot of date math queries where people use zero as shorthand for 1900-01-01. And of course, there’s a lot of data in very simple circumstances. But, in other circumstances, you can run into some big issues with this. It would be a bit too much to gin up a demo to show you what those issues are. Just trust me, do yourself the favor of not being lazy and using this shorthand of zero to replace 1900-01-01. And we should probably make a note here and say, just so there’s no confusion, do this instead. Yeah.
There we go. That looks better now. So, these two queries will do about the same thing. They’re going to give me adding 45,477 days to 1900-01-01, which gives us the exact same value for both of them. We get the same results back here. But, like I said, there are more complicated circumstances where you might hit some weird bugs with relying on this implicit conversion from zero to 1900-01-01-01. So, always, please explicitly do this. Maybe even, like, say, convert date or date time 1900-01-01 to make extra sure. I think I’ve gone through all of my GitHub repo and replaced any place where I may have made that error in the past. But, if you see anything in there, please let me know and yell at me.
So, another thing that I also see quite a bit of is when people need to do things like compare, like, you know, they want to go back a certain number of days or months, and they end up doing some really shoddy date math like this. Now, this won’t cause incorrect results, but, you know, in the presence of a half-decent index, performance is not going to be great with this because you’re performing the date math functionality on the column, and you’re comparing that to just, like, this scalar thing over here.
And your life for the SQL Server query turns out a whole lot better when you do something like this, where you compare, like, the column in the table to the result of date math operating only on whatever parameter or variable you pass in. So, if we just look at the results of these two queries, and I stuck recompile hints on there so we don’t have to deal with the local variable effect with anything. So, you know, the first query where I do the date math on the column, that takes about two seconds.
We have to scan the whole column. No one is happy. On the second one where I do the date math on the, right, there’s a column compared to the date math. This does an index seek and runs for about 100 milliseconds.
So, that’s a much better way of doing things. Where it gets a little tougher is when you need to compare columns that are in different tables, right? So, saying where p.creation date, the date dip between, let’s say this query, well, this query specifically is looking for posts where, like, someone came along, like, a year later and commented on them, right?
So, someone, like, just wandered in drunk and was just like, this is out of date now. Something like that. And, like, if these were just two columns in a single table, you could, of course, what do you call it, create a computed column on them, index it, and be fine.
But, since they’re in different tables, really the only way to do that without, like, you know, creating a new table and storing both sets of data or completely screwing up the design even more of the post table and somehow adding comment dates in there, which blows my mind how little sense that would make. Like, give me a twitch. You know, like, the only realistic thing you could do would be to create an indexed view that would, you know, give you the results of this back all at once.
Now, this, I get it. I’m with you. This is not a very useful indexed view, right?
Like, because we’re just, unless you just need the count, we should probably involve some other things in here, like, probably, like, the ID column from the post table and the comments table, so we could, like, look those things up in the base tables, something like that. But this is just one quick way of showing you that, you know, an indexed view can be a pretty good way of making the non-sargable a little less painful. Another thing where date math gets sort of weird on people is the date diff function.
So what a lot of people are surprised by with the date diff function is that it does not actually look for, like, the duration necessarily. Like, the date diff in years does not look for, like, a 12-month difference. The date diff in month does not look for, like, a 28 or 30 or 29 or 31-day difference.
The date diff in days does indeed look for a one-day difference, but that’s the, but, you know, if, you know, we were to get into a situation with, like, you know, let’s say that we were looking at, like, 12-31 at, like, 11-59. And 2020-01 at, like, 0-0-01. Or even just, like, 0-0-0.
Actually, I don’t even know if that’s going to work. We’re going to find out together. Yeah. So, like, you know, there’s a one-minute difference between these, but SQL Server’s like, no, it’s a full day. And there’s, like, you know, a one-day difference between these, but SQL Server’s like, nope, that’s a full month.
And, you know, again, there’s a one-year difference between these, but SQL Server’s like, no, it’s a full year. So, the date diff function is a little weird in that way. And so, you have to kind of be careful about being precise about what it is you’re looking for.
All right. So, if you look at, like, this query, and you’re trying to figure out, like, all you want to do is get, like, this count that we were talking about. That’s about 20,000 rows.
If you wanted to write that using different logic, right, if you wanted to write that in a different way, you would have to do something like this. Where you would have to add a year minus one and then flatten the year, flatten the creation date column to the year. And it just gets very, very complicated to try and replicate that logic with, like, explicit query syntax.
Like, just doing something as simple as this just doesn’t work, right? If you look at this, the results of this, that’s a whole bunch more rows than we got from either one of those. So, this isn’t, like, you know, depending on what exactly you’re looking to count up and return in these things, you need to be really careful with how you write these queries.
The performance, of course, doesn’t matter, but the results here are different, and they do matter. Now, if we were to look at some of this stuff in here, the results wouldn’t look too weird for the most part. But if you kind of start scrolling down a little bit in the results and you look at sort of really down at the bottom is where the interesting stuff is.
So, where SQL Server is telling you that there’s a one-year difference between things, it’s also telling you that there’s a 23-month difference. So, there’s, like, almost two-year difference there, right? And there’s the 723, and I’m not going to figure out how many 30s go into 723, but it might start to creep up above these things.
So, you know, I realize that these things are documented, and, you know, if you’re keen on reading the documentation, you might think this video is stupid and be like, oh, the video is documented stuff. A lot of people don’t get that involved in this, not until someone tells them that stuff gets weird that they start to worry about stuff getting weird.
So, at the end of the day, when it comes to doing date math, one, please use explicit dates. Do not use numbers like zero to sub in for dates and rely on an implicit conversion for that. Please don’t do date math on columns as much as you can avoid it.
Do date math on, like, whatever parameters and expressions you need to and compare the column to that. If you need to do date math across tables, index views can be a good way of making that a little bit less painful. And if you need to use the date diff function to figure out what span of time exists between things, just be very careful in what your assumption is about exactly how much time has to be between two things in order for date diff to tell you that, like, you know, there’s a month or a year and a day between them.
Because, again, going back to this, you know, there’s a whole bunch of stuff in here where, you know, you know, like, yes, I agree. Go away, GitHub Desktop.
There is, you know, 2020 is a year ahead of 2019. But, like, there’s really a one-day difference between these dates. You know, and, you know, there is a, the first of the year, the first month of the year is a month ahead, is a month different from the last month of the year.
But, again, there’s only a one-day difference. And, you know, I agree that, you know, the first of the year is indeed a day ahead of the last day of the year. But there is only a one-minute difference between these.
So just be careful, like, be very specific about exactly how much time has to exist between two things before it is considered a one, like, a month or a year or a day difference. You might need to use, like, seconds to figure out it.
Like, you might have to do the date diff in seconds and then figure out how many seconds are in the span of time you care about, whether it’s a day or three days or a month or, like, how many days in a month you care about or, you know, like, the number of seconds in a year.
You might need to do a lot of that math to figure out exactly what it is you need to be precise about it because date diff on its own is not very precise. Anyway, thank you for watching.
Hope you enjoyed yourselves. I hope you learned something. I hope you read the documentation in the future so you can call me stupid for explaining parts of things that, how things work or something like that.
Of course, I thank you very, very deeply for watching. This is my last day on Earth. This is the last thing I record. Gosh, that’d be sad.
So I think I better not. Maybe I should record something else real quick after. Anyway, if you like this sort of SQL Server content or you like this video, you can like and subscribe to me and my channel, and you can get notifications when I drop the SQL Server content, and you can leave me likes and comments when I do, and then everyone’s happier, right?
It’s cool stuff. Anyway, thank you for watching.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.