Does The FORMAT Function Still Suck For Performance in SQL Server?
Video Summary
In this video, I delve into the performance implications of using SQL Server’s FORMAT function, particularly when dealing with large datasets. I compare its execution against a simpler alternative—using CONVERT—and highlight significant differences in query plans and runtime. By running two queries side-by-side—one utilizing FORMAT and the other employing CONVERT—I demonstrate that while both yield similar results for smaller row counts, the FORMAT function can significantly slow down operations on larger datasets, making it less ideal for high-throughput environments. I also discuss scenarios where FORMAT offers advantages, such as custom date formatting requirements, but emphasize the importance of considering performance when working with extensive data volumes.
Full Transcript
Erik Darling here with Darling Data. And today’s short video, so short in fact that I’m not going to do the full intro slides because they would actually be longer than what I have to say about what’s in the video. We’re going to ask and answer a rather simple question. Does the format function still suck? And by still suck, I don’t mean like, does it do something bad? No, it does. Does some actually kind of neat things that you can’t do very easily otherwise? But it can still be pretty bad for performance, especially if you need to put a lot of rows through it, which is unfortunate because, you know, I use it in a lot of my procedures to very conveniently format numbers and percentages and other stuff, but I do it over a relatively small number of rows, tens or hundreds or, I mean, maybe at the high end, like a few thousand. And it just doesn’t really cause much of an issue there. So there is, there is a real need to like, like specify that format really does like the penalty for it really is when you start to get up to bigger and bigger row counts. That’s not to say that it’s free for smaller row counts. It would certainly be slower than other alternatives, but it is, you’re just not going to see like big difference.
I think if you’re like doing like a select top five or something, right? Like you need to format something, use the format function five rows is not going to cause a meaningful difference. You know, if you’re using this in an OLTP environment where absolute throughput is crucial, I might, I might give it a, you know, a second consideration to not to get it out of there because you could be, you could be, you know, adding just enough overhead to every single thing to like mash on the system a little harder, but in general, it wouldn’t be the first thing that I go for. So without going through everything else, let’s go over to Management Studio. And I’m going to show you as soon as that highlighting comes the hell down. So, two queries, two queries. They both use, come on, wake up, zoom it. Almost. Are you there? There you are. There’s a pretty lady. I’m going to use my patented method of forcing SQL Server to fully process a result set.
By fully process a result set. I mean, do all the work in here without actually producing any results. By not producing any results, I mean by filtering to where row number equals zero. SQL Server isn’t smart enough to know that no row number can start with zero. So it does all this work and then throws it all away, filters it all out. So I’ve got this query up here that does this. And you can see that I’m using the format function not to do anything so crazy. I am just formatting this date as year, month, day, right? So in other words, basically like the equivalent of saying convert date, get date, which coincidentally is what I’m doing down here. Convert date, get date.
So these would produce equivalent results, right? Converting creation date to a date. I guess if you want to get really picky, I would have to say like convert and barcar 10 or something around this, but it’s just not going to make a big difference for what I’m going to show you next. The second part of this would be running these, but you know, like I said, aiming to keep this video rather short. So I ran these ahead of time. So you didn’t have to sit there and wait on all sorts of spinny things. And you’ll see that the top query plan, which runs through about 24 and a half million rows in total, that takes about nine seconds.
The bottom query plan, which does the exact same thing, takes about three seconds. So let’s stop here and let’s look at where the differences in this query plan really are. I have to remember which way to turn so that this happens correctly.
You’ll notice in the top query, my hand is conveniently cut off. That scan of the clustered index takes about 800 milliseconds. This one here, a little bit longer, right?
But that’s not because of the function. The function isn’t like in the where clause. It’s just in the select list. If you look up there, we have a sort that takes 2.1 seconds. And that sort’s pretty close, right? 2.104 to 2.181.
Those are little CPU timing differences, maybe. Then we have a window aggregate, a lovely batch mode window aggregate. Oh, dear. Well, you sure do love me a window aggregate.
25 milliseconds, 19 milliseconds. You know, it’s tiny little timing differences. But then we get to our friend the compute scalar. This is where things heat up a little bit.
The compute scalar for the format function. Takes 6.1 seconds. 6.1 seconds. The compute scalar for the convert function.
Takes 23 milliseconds. Then we have a little filter over here. This is where we filter out to row number equals zero. That’s why, this is why the row count goes suddenly from 24.5 million rows to zero rows in both of these.
All right. You can see that pretty steep drop off there. It’s a pretty big cliff. Boop.
Boosh. But all the time is spent in the compute scalar. And if you go to the properties of the compute scalar. So if you’re looking at an execution plan and you’re using the format function. And you notice that there’s a compute scalar that takes us a strangely long amount of time.
You might want to just verify that. So one easy way to do that is to go to the defined values. You’ll see the list of expressions that the compute scalar is responsible for computing.
And if we zoom in up here, we will see the function name for this one is format. And we’ll see all the stuff that format is doing in there. And that’s where the 6 seconds goes.
The, sorry, the compute scalar down here. Same deal, except this one is going to show us convert. And we’re going to see the work that the convert does in here. Where did the convert go?
There, there, there she’s hiding. So if you’re using the convert function over a large number of rows, think about like a big ETL processor, data import thing.
I would strongly urge you to use something different. Use convert with a style that gets you the formatting you want rather than using convert. Convert is very convenient and very easy to use.
And it’s actually flexible to, it’s a little bit easier to do some stuff with format than it is to do with convert. But convert has a lot less penalty and overhead to it from a performance perspective. Just as an example, I have one client who needs to present dates with like a four digit year, two digit month, two digit day, but then only with hours and minutes, like no seconds or milliseconds.
And format does that really easily. That’s a very easy specification in format. I haven’t really seen a way to do that with convert.
You can get convert pretty close and then use like, like, use like a substring or like left or right or whatever to cut off the minutes and seconds and milliseconds if you need to from using convert. But there’s no like, just automatic like style to do that. So just be careful out there.
If you are using format with like big data processing, loading, like modifications, stuff like that. You might want to take a second look at your query, your actual execution plans and see if you have any big time spent in compute scale hours because you might be able to make significant improvements by, by using convert instead. Nor did I said convert and not cast.
We do not use cast in the Darling data household. We use convert in the Darling data household. Cast is icky, except try cast because Microsofty reasons. But we’ll talk about that in another video.
Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I will see you in the next video, which will be about something equally as useful. I promise.
All right. Thank you for watching.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.