Quatro Enemigos
This post is really four different posts. Maybe five. The main points are:
- STRING_AGG can’t be inlined if a UDF
- Getting DISTINCT values into STRING_AGG is more obtuse than it should be
- The documentation for UDF inlining is misleading
- XML PATH can be inlined in a UDF
Let’s start at the top, because tops are top.
Strung Out
If you’re like me and you got excited by the induction of STRING_AGG into the T-SQL Lexicon because of all the code odd-balling it would replace, you were likely also promptly disappointed for a few reasons.
- Needing to convert the element to be AGGed to a MAX to avoid errors
- STRING_AGG aggregation result exceeded the limit of 8000 bytes. Use LOB types to avoid result truncation
- The WITHIN GROUP ordering is clunky compared to an outer ORDER BY (but hey, logical query processing…)
- No support for DISTINCT in the function, and an outer DISTINCT tries to make the wrong thing DISTINCT (see above)
And of course, it’s a breaking limitation for UDF inlining.
The UDF does not reference the
STRING_AGG
function
Let’s look at all of that stuff at once, in one function.
CREATE OR ALTER FUNCTION dbo.IsStringAgg (@UserId int) RETURNS nvarchar(max) WITH RETURNS NULL ON NULL INPUT, SCHEMABINDING AS BEGIN RETURN ( SELECT STRING_AGG ( CONVERT ( nvarchar(MAX), b2.Name ), N', ' ) WITHIN GROUP ( ORDER BY b2.Name ) FROM ( SELECT DISTINCT b.Name FROM dbo.Badges AS b WHERE b.UserId = @UserId ) AS b2 ); END; GO
Not exactly a thing of beauty, is it? Let’s hold onto that for one second, though.
XML > JSON
Okay, so let’s get back to that UDF documentation.
The UDF does not reference XML methods
That sort of reads like NOXML4U, which is my friend Jeremiah’s license plate. In reality though, it means all the .method thingies, like value, node, query, etc.
So if you were to try to follow my prior advice on string concatenation, the function couldn’t be inlined.
SELECT x = ( SELECT [text()] = b.Name FROM dbo.Badges AS b WHERE b.Id = 100564 FOR XML PATH(''), TYPE ).value ( './text()[1]', 'nvarchar(max)' );
Having the .value breaks inlining. So there’s that. I believe this is where Reddit users post something like le sigh with some poorly drawn stick figure.
Del The Funktion
We can write the function semi-correctly like so:
CREATE OR ALTER FUNCTION dbo.NotStringAgg (@UserId int) RETURNS nvarchar(max) WITH RETURNS NULL ON NULL INPUT, SCHEMABINDING AS BEGIN RETURN STUFF ( ( SELECT N', ' + b.Name FROM dbo.Badges AS b WHERE b.UserId = @UserId GROUP BY b.Name ORDER BY b.Name FOR XML PATH (N'') ), 1, 2, N'' ); END; GO
Alright, now let’s BRING THAT CODE BACK (air horn)
Sea Vessel Soliloquy
If we compare the execution plans for these two functions, the XML one gets inlined, and the STRING_AGG one does not.
SELECT TOP (5000) u.DisplayName, ThanksForAllYourHardWork = dbo.NotStringAgg(u.Id) FROM dbo.Users AS u ORDER BY u.Reputation DESC; GO SELECT TOP (5000) u.DisplayName, ThanksForAllYourHardWork = dbo.IsStringAgg(u.Id) FROM dbo.Users AS u ORDER BY u.Reputation DESC; GO
Here’s the plan for the inlined function:
Here’s the plan for the non-inlined function:
Stay Thirsty
The inlined function finishes about twice as fast, though one may pause for a moment to consider whether the 400ms difference is an earth shattering kaboom in this case.
Of course, the real kicker is when scalar UDFs are invoked as part of larger queries where parallelism, etc. is important.
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.