Keep’em Comin!
While helping someone tame a bunch of rather unfortunate scalar valued functions, we eventually hit a point where they were able to do some of the rewrites themselves. During testing, they ran into a situation where performance got worse when they made the switch over, and it wasn’t because an Eager Index Spool popped up.
I was able to come up with a demo that shows a reasonable enough performance difference in a couple queries using the same technique as I gave them to fix things.
So uh, here goes that.
Muppet
The query they were tuning had a couple OUTER APPLYs already written into it, and so they added the function on as another.
SELECT u.DisplayName, b.Name, nbi.AvgPostsPerDay FROM dbo.Users AS u OUTER APPLY ( SELECT CreationDate = MIN(p.CreationDate) FROM dbo.Posts AS p JOIN dbo.Votes AS v ON v.PostId = p.Id AND v.VoteTypeId = 1 WHERE p.OwnerUserId = u.Id ) AS p OUTER APPLY ( SELECT TOP (1) b.Name FROM dbo.Badges AS b WHERE b.UserId = u.Id ORDER BY b.Date DESC ) AS b OUTER APPLY dbo.no_bueno_inline(u.Id, p.CreationDate) AS nbi WHERE u.Reputation >= 100000 ORDER BY u.Reputation DESC;
Since they didn’t want to lose rows to the function, they couldn’t use CROSS APPLY. Good enough.
Moutarde
But what they really wanted was to move the function up into the select list, like this:
SELECT u.DisplayName, b.Name, AvgPostsPerDay = ( SELECT nbi.AvgPostsPerDay FROM dbo.no_bueno_inline ( u.Id, p.CreationDate ) AS nbi ) FROM dbo.Users AS u OUTER APPLY ( SELECT CreationDate = MIN(p.CreationDate) FROM dbo.Posts AS p JOIN dbo.Votes AS v ON v.PostId = p.Id AND v.VoteTypeId = 1 WHERE p.OwnerUserId = u.Id ) AS p OUTER APPLY ( SELECT TOP (1) b.Name FROM dbo.Badges AS b WHERE b.UserId = u.Id ORDER BY b.Date DESC ) AS b WHERE u.Reputation >= 100000 ORDER BY u.Reputation DESC;
That way you don’t lose any rows like you could with CROSS APPLY, and the optimizer is more likely to holler at the function later on in the query plan, since the values from it are only being projected — that’s fancy for selected.
Mapperoo
The full query plan is a bit much to untangle quickly in this post, but the timing difference is noticeable enough for my purposes:
So if you ever end up rewriting a scalar valued function as an inline table valued function, make sure you test calling it in the same way. Moving query syntax around may produce logically equivalent results, but won’t always produce equivalent performance.
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.
Just to clarify, The performance improvement was due to moving where the function was called within the query so that it was relocated to operate on fewer input rows, as more filtering had already been applied ?
Yep, you got it.