Time Flies
Video Summary
In this video, I delve into an important aspect of SQL Server performance tuning: how date math in WHERE clauses can significantly impact query execution time. Often, developers use parameters or variables to filter data based on dates, but they frequently make a critical mistake by wrapping the column in functions like `DATEADD` or `DATEDIFF`. This approach can lead to poor performance as the dataset grows, even with usable indexes. I demonstrate this issue using a stored procedure that filters data based on a number of days, showing how it performs poorly when wrapped in a function and explaining why rewriting the query to avoid column functions results in much faster execution times. By creating an appropriate index and running queries both before and after making these changes, you can see the dramatic improvement in performance firsthand. This video serves as a reminder that simple adjustments in your SQL logic can have significant impacts on overall database efficiency.
Full Transcript
Aint no good news. So what I want to show you is how where you choose and how you choose to do date math in a where clause matters greatly to performance. Now a lot of times I’ll see people with a stored procedure that accepts a number. And that number will typically be a number of days or months or some other element of time. And they’ll use that to filter data. So if someone wants 30 days of data, they’ll use that to calculate 30 days in some manner. Now the big mistake is that you’re going to make. I always see people make. I always see people make. Is that they write their queries like this. Where they’ll say the date diff between date diff and days or whatever, whatever, whatever, whatever, uh, uh, uh, element of time. Between a column in the table and say today’s day, usually this will be get date. But since the, uh, the stack overflow 2013 database that I’m using only has data up until the end of 2013, I’m substituting to the beginning of the beginning of the beginning of 2014 so that I actually get data back. Otherwise, if I did get date, it would take hundreds of days. It would look crazy. So I have 30 days. So that will get me 30 days worth of data.
But this is the absolute worst possible way to calculate this because you are wrapping a column in a function. And this will, this is not a good way to do this. This is, this is, this is, this will, this will end poorly as data gets bigger and bigger, even with usable indexes, which I’ll show you in a minute. Right now, the only index I have is on the post table on the ID column and includes owner user ID. So let’s run this query and let’s see how long it takes. Okay, we’re done. Now, uh, this took about 2.6 seconds. If we, uh, head on over to the right and we look at the, look at the votes table, we can see that predicate.
DateDiffDay, convert implicit, dateTimeOffset3, stackOverflow, 2000, uh, creationDate, uh, less than, uh, oh, I mean, there’s a lot going on in there, right? A lot going on, less than 30. And of the 2.6 seconds that we spend in the plan, 2.1 of those seconds is spent in the votes table. Now, let’s change the way that we do that date math a little bit. Let’s do it so that the column is not wrapped in that function.
What we’re going to do is we’re going to say where that column is greater than or equal to, and then we’re going to subtract 30 days from, uh, 2014-0101. Now, and again, normal, like, you know, data that’s flowing into a database now, this would just be getDate or something. But, uh, again, old data, gotta, gotta, gotta look, it looks sort of out of date over here.
So, um, let’s run this query and see if we do any better. My goodness, that was much faster, wasn’t it? That took one second flat. And this isn’t because the data was magically in cache either.
If I run this query again, it’ll take just about two and a half seconds again. There’s not going to be a big time difference there. 2.6 seconds. And if I rerun this one, it’s going to be just about a second again.
Hmm. Yes. Yes. Look, we have tuned a query today. Now, we still scan the votes table, but notice how much less time we spend scanning the votes table. We go from about 2.1 seconds to about half a second there.
All right. We still have that predicate. We still have an unindexed predicate here, but we’re not converting that column over into anything. There’s no convert implicit.
There’s no column wrapped in a function. There’s no, you know, row by row evaluation of stuff. We do a much better job of this. This is a much less painful way of expressing that same logic. Now, where this gets even more pronounced is if we have a good index in place.
So I’m going to create this index. And if memory serves, this will take about 20 seconds. So I’m going to sing my favorite song.
No, I’m not going to sing. I’m not going to sing at all. But this is something that I see repeatedly with performance tuning clients where, you know, every single time they’re like, ah, slow query. Ah, we’ve got to do this thing.
I always find stuff like this. I mean, it doesn’t have to be, you know, a date function that’s wrapped up in a column. It can be really any kind of function can screw you up in this magnificent way. So now let’s run, now that we have an index on the votes table on creation date and post ID, let’s look at how this changes our filtering here.
And let’s look at how that changes our filtering here. So let’s go and run the query that took 2.6 seconds before and run it again. And ooh la la.
Well, okay, so we did a little bit better than 2.6 seconds. We’re at 2.541 seconds now. So we may be saved somewhere between 50 and 100 milliseconds. Good job.
Good, good, good query tuning there, pal. Pat on the back. You knocked that one out of the park. Now let’s run our query written in a smart way with that index. Remember, before this took just about one second.
All right. Ah, that got better too, didn’t it? Now we’re at about half a second here, 570 milliseconds. I’m willing to say that’s about half a second.
You can argue with me in the comments if you want. I’ll ignore you. And now let’s look at what happened over here. We have an index seek that takes 0.33 milliseconds. That’s not bad at all.
If we go back and look at what that other query did, written in a bad way, right? This is bad. This is the not good way of doing things. If we look at what that did, even with a usable index in place, we still have to scan the entire votes table and apply that predicate across.
That is not a seek predicate. That is a predicate that gets evaluated, row by row by row by row. So we have to read all of the rows in the votes table to evaluate that predicate.
We cannot seek to anything with that predicate. With this predicate, we are free to seek and we are free to read far fewer rows. We go look what we did there.
That is an index seek. And now this is one of those times where an index seek is better than an index scan. I’m willing to concede that there are times when that is true. But if we look at what the index seek did, we no longer have a predicate.
We have a seek predicate where we were able to figure out everything that met that predicate. But only that. We didn’t have to scan the whole table to figure it out. So when you’re writing store procedures that need to do some sort of date filtering like this, please, please, please, please, please leave the column out of it.
Do date math on parameters, variables, literals. Do date math on literally anything other than the column. There are times when you’ll need to figure out if two columns, right?
Like let’s say instead of this being a column and a literal or a static value like this, let’s say that this was a creation date and let’s say like a last activity date or something. There are times when I understand you’ll have to do that.
And that’s when you would want to use a computed column to figure that out. I’m not going to get into that in this video because that’s a little bit beyond what I’d want to do. And we’re already getting close to 10 minutes.
And apparently they’re coming for me. So I’m going to say goodbye.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 25% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.
Related Posts
- The SQL Server Performance Tasting Menu: Performance Issues With Parallel Query Plans
- The SQL Server Performance Tasting Menu: How DISTINCT Queries Can Hurt Performance
- The SQL Server Performance Tasting Menu: How Multi-Statement Table Valued Functions Hurt Performance
- The SQL Server Performance Tasting Menu: Tuning Modification Queries With #Temp Tables
Loved the ending.