Walk In Line
Far and away, some of the trickiest situations I run into when helping clients is rewriting scalar functions that have WHILE loops in them.
This sort of procedural code is often difficult, but not impossible, to replace with set-based logic.
Sure, lots of IF/THEN/ELSE stuff can be tough too, though that’s often easier to manage with CASE expressions in stacked CTEs or derived tables.
I ran across a really interesting function recently that I had to rewrite that had a couple WHILE loops in it, and I’ve simplified the example here to show my approach to fixing it.
Table Face
The original intent of the function was to do some string manipulation and return a cleaned version of it.
There were several loops that looked for “illegal” characters, add in formatting characters (like dashes), etc.
The problem the function caused wasn’t it running for a long time (we’ll talk more about that tomorrow), it was that the function was called in really critical code paths that Function Repercussions© were messing with:
- Row by row execution
- Inhibited parallelism
These are not the kinds of functions that are Froid Friendly© either. If they were, I could largely leave them alone. Maybe.
Depends on bugs.
Bad Way
The bad way of doing this is like so. If you write functions like this, feel bad. Let it burn a little.
Ten years ago, I’d understand. These days, there’s a billion blog posts about why this is bad.
CREATE OR ALTER FUNCTION dbo.CountLetters_Bad ( @String varchar(20) ) RETURNS bigint AS BEGIN DECLARE @CountLetters bigint = 0, @Counter int = 0; WHILE LEN(@String) >= @Counter BEGIN IF PATINDEX ( '%[^0-9]%', SUBSTRING ( @String, LEN(@String) - @Counter, 1 ) ) > 0 BEGIN SET @CountLetters += 1; SET @Counter += 1; END; ELSE BEGIN SET @Counter += 1; END; END; RETURN @CountLetters; END; GO SELECT CountLetters = dbo.CountLetters_Bad('1A1A1A1A1A');
Better Way
This is a better way to write this specific function. It doesn’t come with all the baggage that the other function has.
But the thing is, if you just test them with the example calls at the end, you wouldn’t nearly be able to tell the difference.
CREATE OR ALTER FUNCTION dbo.CountLetters ( @String AS varchar(20) ) RETURNS table AS RETURN WITH t AS ( SELECT TOP(LEN(@String)) *, s = SUBSTRING ( @String, n.Number +1, 1 ) FROM dbo.Numbers AS n ) SELECT NumLetters = COUNT_BIG(*) FROM t WHERE PATINDEX('%[^0-9]%', t.s) > 0; GO SELECT cl.* FROM dbo.CountLetters('1A1A1A1A1A') AS cl;
Pop Quiz Tomorrow
This is a problem I run into a lot: developers don’t really test SQL code in ways that are realistic to how it’ll be used.
- Look, this scalar UDF runs fine for a single value
- Look, this view runs fine on its own
- Look, this table variable is great when I pass a test value to it
But this is hardly the methodology you should be using, because:
- You’re gonna stick UDFs all over huge queries
- You’re gonna join that view to 75,000 other views
- You’re gonna let users pass real values to table variables that match lots of data
In tomorrow’s post, I’m gonna show you an example of how to better test code that calls functions, and what to look for.
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.
I’d use a combination from TRANSLATE(), REPLACE() and LEN(). TRANSLATE() is new in SQL 2019 and can be replaced by multiple Calls of REPLACE() in older SQL versions. This way you don’t have to blow up the number of rows, so it would work on longer strings too without running into spills to tempdb because SQL has no way to know how much memory it needs. And it would work with almost any bad_char-definition too (you could even use a second parameter for the bad chars):
USE tempdb
GO
CREATE OR ALTER FUNCTION dbo.f_Count_Bad_Chars
(@input_string VARCHAR(1000))
RETURNS TABLE AS RETURN
WITH conf AS (SELECT ‘1234567890’ AS bad_chars
, CHAR(255) AS dummy_char)
SELECT @input_string AS input_string
, conf.bad_chars
, LEN(@input_string) – LEN(c3.input_with_removed_bad_chars) AS number_bad_chars
FROM conf
CROSS APPLY (SELECT REPLICATE(conf.dummy_char, LEN(conf.bad_chars)) AS dummy_string
) AS c1
CROSS APPLY (SELECT TRANSLATE(@input_string, conf.bad_chars, c1.dummy_string) AS input_with_translated_bad_chars
) AS c2
CROSS APPLY (SELECT REPLACE(c2.input_with_translated_bad_chars, conf.dummy_char, ”) as input_with_removed_bad_chars
) AS c3
;
GO
— test with ~1.5 mio row
DROP TABLE IF EXISTS #bigger_tbl
DROP TABLE IF EXISTS #tbl
SELECT ac.name + CAST(ac.object_id AS VARCHAR(10)) AS name
INTO #bigger_tbl
FROM sys.all_columns AS ac
CROSS JOIN (SELECT TOP 100 * FROM sys.all_columns) AS ac2
— takes about 3 seconds on my server
SELECT f.*
INTO #tbl
FROM #bigger_tbl AS bt
CROSS APPLY dbo.f_Count_Bad_Chars(bt.name) AS f
SELECT *
FROM #tbl AS t