Mix And Match
There are only a few data types that make me nervous when I see them:
- MAX strings, or approaching the upper byte limit (except for dynamic SQL)
- XML
- sql_variant
It doesn’t matter if they’re stored procedure parameters, or if they’re declared as local variables. When they show up, I expect something bad to happen.
One thing that makes me really nervous about string data specifically, is that many developers don’t pay close attention to varchar vs. nvarchar.
This doesn’t just apply to stored procedures. Many ORMs suffer the same issue with data types not being strongly typed, so you sometimes end up with all varchar(8000) or nvarchar(4000) input parameters, and other times end up with n/varchar strings with lengths inferred at compile-time based on the length of the string passed in. That means that if you have an ORM query that takes, let’s say a name as input, it might the infer the string as unicode when it shouldn’t, and if you were to pass in different names for different executions, you’d get all different plans, too.
- Erik: nvarchar(4)
- Kendra: nvarchar(5)
- Al: nvarchar(2)
- Tom: nvarchar(3)
You get the picture. It’s a nutty nightmare, and it’s made worse if the name column you’re searching is a varchar data type, regardless of length. But those are ORM problems, and we wield mighty stored procedures like sane and rational people.
Let’s play a game called pattern and anti-pattern.
Anti-Pattern: One parameter for searching many columns
The sheer number of times I’ve seen something like this justifies most of the self-medicating I apply to myself.
CREATE OR ALTER PROCEDURE dbo.BadIdea ( @SearchString nvarchar(whatever) ) AS BEGIN SET @SearchString = N'%' + @SearchString + N'%'; SELECT p.* FROM dbo.Posts AS p WHERE p.Id LIKE @SearchString OR p.OwnerUserId LIKE @SearchString OR p.AcceptedAnswerId LIKE @SearchString OR p.CreationDate LIKE @SearchString OR p.LastActivityDate LIKE @SearchString OR p.Tags LIKE @SearchString OR p.Title LIKE @SearchString OR p.Body LIKE @SearchString ORDER BY p.ViewCount DESC; END;
All sorts of bad things happen when you do this. You can’t index for this in any meaningful way, and comparing non-string data types (numbers, dates, etc.) with a double wildcard string means implicit conversion hell.
You don’t want to do this. Ever.
Unless you want to hire me.
Pattern: Properly typed parameters for each column
Rather than get yourself into that mess, create your procedure with a parameter for each column, with the correct data type.
Next, don’t fall into the trap where you do something like (p.OwnerUserId = @OwnerUserId OR @OwnerUserId IS NULL)
, or anything resembling that.
It’s a rotten idea. Instead, watch this video to learn how to write good dynamic SQL to deal with code like this:
Anti-Pattern: Passing Lists In Parameters And…
Splitting the string directly in your join or where clause.
When you do this, SQL Server can’t make good cardinality estimates, because it has no idea what’s in the list. Worse, I usually see generic string split functions that can deal with any data type as input and output.
You might be outputting wonky data types that compare to column(s) of different data type(s), and suffering implicit conversion woes.
A cleaner option all around is to use Table Valued Parameters, backed by User Defined Table Types with the correct data types.
You may still need to follow my advice below, but it’s a bit easier to manage.
Pattern: Passing Lists In Parameters And…
Dumping the parsed output into a #temp table with the right column data types.
When you do that, two nice things happen:
- SQL Server builds a histogram on the values
- You can index it for large lists
I find myself changing to this pattern quite a bit when tuning code for clients. They’re often surprised by what a difference it makes.
Even when using Table Valued Parameters, I find myself doing this. Since Table Valued parameters are read only, you don’t need to worry about the contents changing even if you pass them to other procedures.
Anti-Pattern: Using Unnecessary MAX Types
I’ve blogged before about why you shouldn’t declare parameters or variables as MAX types in the past, but the issue is mainly that they can’t be used to seek into an index.
Because of the internals of MAX parameters and variables, you’ll see a filter after data is acquired in the query plan, which is usually much less efficient than filtering out data when a table or index is initially accessed.
It’s also a bad idea for columns in tables for similar reasons. I understand that there is some necessity for them, but you should avoid them for searches as much as possible, and make them retrieve-only in your queries.
A good example is an Entity Attribute Value table, where developers allow searches on the Value column, which is either sql_variant, or nvarchar(max) so that it can accommodate any contents that need to be stored there.
Pattern: Using Reasonable Data Types
The number of times that I’ve seen MAX types used for anything involved in searches that actually needed to be MAX is relatively small compared to the number of times I’ve checked the max length of data and found it to be around 100 bytes or so.
Making significant changes like that to large tables is sometimes painful. Often, it’s easier to add computed columns in various ways to allow searching and indexes to be easier:
- TRY_CAST or TRY_CONVERT to integers, dates, etc.
- SUBSTRING to an appropriate string type with a reasonable length
- Hashing the contents of the column to make binary searches possible
As long as you don’t persist the computed columns, the table isn’t locked when they’re added. However, you do need to index them to make them useful for searching. That will only be painful if you don’t pay Microsoft enough money.
Plans Within Plans
As you review stored procedure code, keep an eye out for these anti-patterns. Fixing small things can have big downstream effects.
While not every change will yield many seconds or minutes of performance improvements, it helps to follow the rules as well as possible to clarify what the real issues are.
Getting a stored procedure to the point where you’re sure exactly what needs to change to improve performance can be a tough task.
Having a mental (or written) checklist of things that you know to fix makes it faster and easier.
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.
Related Posts
- Indexing SQL Server Queries For Performance: Equality and Inequality Predicates
- How To Write SQL Server Queries Correctly: Literally Everything You Need To Know About OUTPUT
- How To Write SQL Server Queries Correctly: Literally Everything You Need To Know About CTEs
- How To Write SQL Server Queries Correctly: Case Expressions
Next, don’t fall into the trap where you do something like (p.OwnerUserId = @OwnerUserId OR @OwnerUserId IS NULL), or anything resembling that.
It’s a rotten idea.
Why is it a rotten idea? I don’t propose to watch an hour’s YouTube video to find out.
Thank you for enriching my blog with your thoughtful commentary.