All About SQL Server Stored Procedures: Data Types
Video Summary
In this video, I delve into the intricacies of data types in SQL Server stored procedures, a topic that often causes headaches for many database administrators and developers. I share my experiences from real-world scenarios where improper data type handling has led to performance issues and other complications. By discussing common pitfalls such as mismatched date and datetime types, and exploring more nuanced problems like max string types and ORM data typing, I aim to help you avoid these traps in your own projects. Additionally, I cover how certain data types can lead to unexpected results and the importance of using temp tables for better query performance when dealing with dynamic inputs. Whether you’re a seasoned DBA or just starting out, this video offers valuable insights into optimizing stored procedures and maintaining clean, efficient code.
Full Transcript
Erik Darling here with Darling Data. In this video we’re going to continue our rhapsody on stored procedures, and in this one we’re going to talk a bit about data types. Data types. That’s what we’re going to talk about in this one. Because I see a lot of people screw a lot of stuff up with these, and as the person who comes in well after these screw-ups have been in place, and occurring, for a very long time, who has to clean them up: it’s a terribly unpleasant task. Sometimes it is a little bit easier, because sometimes there will just be a problem with whatever parameter was being passed to the stored procedure. Other times there’s a problem with the underlying table data types. And at this point the tables have grown quite large, and sure, you have options like creating computed columns that do data type conversions for you, but you still really do need to index those computed columns. So even if you get the sort of lock-free addition of a non-persisted computed column to your table — one that does your TRY_CAST, TRY_CONVERT, TRY_PARSE, whatever, to convert your column to a more correct data type — you still incur a size-of-data index creation operation, and on Standard Edition a definitely-locking one, to create an index that helps your query use that computed column in an efficient way. So we’re going to talk about some of this stuff today.
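As a sketch of that workaround — with entirely hypothetical table and column names — an indexed, non-persisted computed column looks something like this:

```sql
/* Hypothetical example: OrderDateText is a badly typed string column.
   The computed column converts it; the index makes it searchable.
   Adding the column itself is quick because it is not persisted,
   but the index build is still a size-of-data operation. */
ALTER TABLE dbo.SomeTable
    ADD OrderDateTyped AS TRY_CAST(OrderDateText AS datetime2);

CREATE INDEX ix_SomeTable_OrderDateTyped
    ON dbo.SomeTable (OrderDateTyped);
```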
And some of the ways that you can get around the more basic, common problems with incorrectly typed stuff in your stored procedures. Before we do that, we should talk about how cool I am, how you should give me four bucks a month, and how you can click a link in the video description to do that. If you think I’m cooler than four bucks a month, you can give me more than four bucks a month, but I think four bucks a month is like my entry-level cool purchase. And that all goes towards supporting the channel in general, making sure that Bats here stays full of delicious Pez, and that I can pop a Pez and have the energy to carry on creating this content.
If you disagree with Pez, or if you just hate Bats, you can do other stuff. You can like, you can comment, you can subscribe. And if you want to ask reasonable questions that I will answer on an Office Hours episode, the link to do that is down in the video description as well. And if you are out there struggling with an ill-performing SQL Server and you need help at a reasonable rate, well, I’ve got nothing but reasonable rates. You can hire me to do any of this stuff — which is a pretty reasonable set of stuff — again, at a reasonable rate. So we’ve got all this going for us.
Now, if you would like some training content about SQL Server performance tuning, I’ve got beginner, intermediate and expert level stuff. Really some noodle baking activities here. And you can get all of mine for about 150 USD for the rest of your life. Long live you. Fully assembled link for that is, of course, in the video description as well.
If you would like to see, well, I mean, I will be there in person. I’m not presenting anything unless someone gets sick or dies or doesn’t show up. Then I’ll fill in a session. But if you would like to come hang out and have me give you a bag of lunch, you can come to SQL Saturday, New York City 2025, taking place on May the 10th. With a performance tuning pre-con by Andreas Vorta on November. I’m sorry, not November. May the 9th.
May the 9th. I don’t know where November came from. November is not involved with this. So we have that going for us. But with that out of the way, let’s talk a little bit about data types and store procedures. Now, I see a lot of people screw up a lot of stuff.
Mismatching date, datetime, and datetime2 stuff is a big one. That’s going to be the first thing that we talk about, because there are actually some weird bugs, and some of them depend on — well, the one that I’m going to show does depend on compatibility level, but whatever.
Of course, mismatching string data types is another: varchar and nvarchar. I just have a hard time with the “var” part, even though I’m pretty harsh on the “char” there. Character, variable character, whatever. I don’t want to sound Southern. No offense to the Southern folks.
I just don’t want to appropriate your accent in an unfortunate manner. But then, like, you know, you just see absolute boneheaded stuff like this, where, you know, people have two tables. And they’re like, oh, eventually I’m going to join these two tables together on this column and this column.
But for some reason, even though both columns will have commensurate data in them, one of them is like a varchar 255. The other is an integer. And you’re like, how’d that happen?
And lots of shrugs. And they’re like, is that a big deal? And you’re like, huh? Yeah, suppose it is.
Since we’re talking about it, suppose it is a big deal. Anyway, let’s move on. So this demo comes from a question that got asked a little while back on dba.stackexchange.com, something about why a datetime value doesn’t return even though it matches the predicate.
There’s a lot of words in there. Kind of a noodle-baker, that one. And it ended up with the fellow who answered it, Martin Smith — brilliant fellow — creating a feedback item.
If I remember, the links will be in the video description. If I don’t, sorry. But if you run through this demo, it’s a very strange one.
And you might want to read the question and the bug report, because they do add some more detail in there. But the general gist of this is that we declare a variable called @dt, which is a datetime, and we set that equal to something. Then we declare a variable called @dt2, which is a datetime2, and we set that equal to the datetime variable. Then we insert the datetime local variable into a table variable with a primary key on it. And when we ask if dt equals @dt2, right, coming from the table, or if we say WHERE CAST(dt AS datetime) — which it already is a datetime — equals @dt2, then we get some strange results back, right?
And if you go and you read through, right, like that doesn’t make any sense. But what’s really weird is if you go and you read through the feedback item, there is a comment in there by Sir Pablo Blanco where he says if you take the index out entirely, right? So let’s just delete where it says primary key.
Look what happens now. Now neither one returns any rows, right? With the primary key thing in there, we did return a row from one of them. But without the primary key, we lose that.
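From memory, the shape of the repro looks roughly like this — the literal value is my own stand-in, and the exact results depend on compatibility level and on whether the primary key is present, so treat this as a sketch rather than the verbatim demo from the question:

```sql
/* Sketch of the datetime vs. datetime2 comparison oddity.
   The date literal is a made-up example, not the original repro value. */
DECLARE @dt  datetime  = '2023-06-10 10:53:48.460';
DECLARE @dt2 datetime2 = @dt;

DECLARE @t TABLE (dt datetime PRIMARY KEY);
INSERT @t (dt) VALUES (@dt);

/* These two queries look like they should agree, but whether either
   (or neither) returns the row can change with the primary key
   and the database compatibility level. */
SELECT * FROM @t WHERE dt = @dt2;
SELECT * FROM @t WHERE CAST(dt AS datetime) = @dt2;
```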
So I think Microsoft does potentially have some work to do here, because that is a strange result. Be really careful with the date and datetime stuff. But there are generally a few data types that will always make me nervous when I see them, whether as a parameter, a variable, or a column data type.
That will be max strings. That goes for any variety of string that is a max data type. Except for when you’re using it for dynamic SQL, then I am a-okay with it.
XML: always nerve-wracking. And JSON is apparently a data type in Azure SQL DB now, which is really just a synonym for varchar(max) anyway. But let’s not say too much.
But XML has been making me nervous for years. Also annoying me, because now I’m like, great, I’m going to have to write more XQuery. Now I’m going to have to go look at all the XQuery I’ve ever written to make sure I’m doing it right.
And of course, sql_variant is another scary one to see, because you just don’t know what you’re going to get. I just expect something bad to happen whenever I see any of this stuff show up in SQL Server Management Studio. Another thing that very commonly gets screwed up is ORM data typing.
I wish I knew what this pattern looked like in Entity Framework. But when people don’t strongly type whatever arguments they’re feeding to their Entity Framework queries, those can end up as varchar(8000), or nvarchar(4000), or maxes.
Or sometimes what will happen is Entity Framework will, at runtime, infer the nature of the string. Usually it will default to Unicode, because Unicode is safer, in the sense that you’re not going to lose fidelity if you use it. If someone actually has a Unicode character in there and you don’t use a Unicode string or parameter type, you will end up with question marks or blank spaces or empty boxes or something. So it will usually infer that any string is going to be a Unicode thing.
But then it will infer the length of the parameter based on the length of whatever you pass in. So if you had the same query running without strongly typed parameters and you passed in a variety of different arguments, you would end up with a variety of different entries in the plan cache and Query Store, with all different parameter lengths in them.
So if you said WHERE some_name = 'Erik', you’d end up with an nvarchar(4); Kendra would be nvarchar(6); Tom would be nvarchar(3); Al would be nvarchar(2). And if you were trying to find Q Lazzarus, you would end up with an nvarchar(1). And even if the plan shape was the same, there would be different plan cache and Query Store entries for that same query across all of them. So that’s not very fun at all.
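To make that concrete, here’s a sketch using sp_executesql directly — the dbo.Users/DisplayName names are borrowed from the Stack Overflow sample database the video uses elsewhere, but the point is just that each declared parameter length caches separately:

```sql
/* Same query text, different declared parameter lengths:
   each one gets its own plan cache / Query Store entry. */
EXEC sys.sp_executesql
    N'SELECT u.Id FROM dbo.Users AS u WHERE u.DisplayName = @name;',
    N'@name nvarchar(4)', @name = N'Erik';

EXEC sys.sp_executesql
    N'SELECT u.Id FROM dbo.Users AS u WHERE u.DisplayName = @name;',
    N'@name nvarchar(6)', @name = N'Kendra';

/* Count the cached entries for that query text: */
SELECT  COUNT_BIG(*) AS cached_copies
FROM    sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE   st.text LIKE N'%DisplayName = @name%';
```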
Now, another thing that I will see people do quite a bit, when they’re trying to have a “you can search anything with this stored procedure” setup, is something like this, where they never know what the input is going to be. And in some cases — like within the context of Stack Overflow — the Body column is nvarchar(max). Even though I think Stack Overflow internally sets a character limit on it, like 30,000 or 50,000 or something like that, this is what you get: it’s still declared in the table as nvarchar(max).
So what I see a lot of people do is just have their universal search string, stick some parentheses around it, and then use that to search across a variety of columns with a variety of data types.
Some of them, like these three, might be strings. Then some of them might be dates, like the ones that say date in the name. And some of them might be numbers, like the ones that say ID in the name.
And this is just a whole host of problems that you don’t want to deal with. You are, of course, in these cases much, much better off having actual search parameters for each one — not doing this, just having a parameter for each one of these.
What you want to avoid, unless you are okay with a recompile hint, is doing something like this with it, right? This has lots of problems in SQL Server.
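A minimal sketch of that per-column version, with the recompile-hint caveat baked in — the procedure and parameter names are hypothetical, loosely modeled on the Stack Overflow Posts table:

```sql
CREATE OR ALTER PROCEDURE dbo.SearchPosts
    @Title        nvarchar(250) = NULL,
    @CreationDate datetime      = NULL,
    @OwnerUserId  integer       = NULL
AS
BEGIN
    SELECT p.Id, p.Title, p.CreationDate
    FROM   dbo.Posts AS p
    WHERE (p.Title        = @Title        OR @Title        IS NULL)
    AND   (p.CreationDate = @CreationDate OR @CreationDate IS NULL)
    AND   (p.OwnerUserId  = @OwnerUserId  OR @OwnerUserId  IS NULL)
    /* Without this, the catch-all shape gets one cached plan for
       whatever combination of parameters runs first, which will be
       wrong for most other combinations. */
    OPTION (RECOMPILE);
END;
```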
SQL Server 2025 is sort of maybe supposed to give you some help with this. We’ll see how it goes. I haven’t seen how that all works yet, so I can’t say anything. But if you want to write like, you know, this kind of search procedure in a way that doesn’t suck, I have a whole video presentation.
It’s a talk that I do on defeating parameter sniffing with dynamic SQL, which gives you all sorts of ways to write these queries so that they perform well across a variety of executions.
So that’s nice. Another big anti-pattern that I see is people passing in lists of things, right? Usually comma-separated lists.
Now, table-valued parameters can be a good option to get around this, but table-valued parameters, unfortunately, are backed by table variables and have their own sort of weird set of issues that you might see there.
So you might need a temp table even if you’re already using a stored procedure, even if you’re already passing a table-valued parameter to it so that you can get statistics generated on your temp table and get like better performance when you join that thing off to other things.
If you’re not using your table-valued parameter in a way where performance is critical or where, you know, you get okay performance anyway, don’t worry about it. But if you’re ever using table-valued parameters and you’re finding the query performance behavior a little weird and hard to predict, you might want to start dumping the contents of your table-valued parameter into a temp table.
So a big anti-pattern is to do something like this, though, right? Let’s say you have your search string, and let’s say it’s a CSV list of IDs. And you do something like this, where you say: ah, well, I’m just going to say WHERE p.Id is IN the result when I split this out. And this goes for any kind of string dissection method.
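As a sketch, the anti-pattern looks something like this (hypothetical procedure name, Stack Overflow-style Posts table):

```sql
CREATE OR ALTER PROCEDURE dbo.PostsByIdList
    @ids varchar(8000)  /* e.g. a made-up list like '38549,38783,41349' */
AS
BEGIN
    /* SQL Server has no statistics on the output of the split,
       so cardinality estimates for this IN clause are guesses. */
    SELECT p.Id, p.Score
    FROM   dbo.Posts AS p
    WHERE  p.Id IN
           (SELECT CONVERT(integer, ss.value)
            FROM   STRING_SPLIT(@ids, ',') AS ss);
END;
```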
The most evil person in the world who contributes to the Entity Framework stuff at Microsoft just tried to fix a problem with IN clauses like this, using JSON and JSON string splitting. And I’ve had a bunch of my clients be like, oh, EF Core, it’s so great! But then they get to this JSON thing, and it’s: why do these queries all take three minutes now?
Why do these queries suddenly take 15 minutes to run? Well, it’s because of the JSON parsing. SQL Server doesn’t have statistics on the contents of that string. Even if there’s just one thing in the string, it doesn’t matter. SQL Server has no idea what’s in it. It does not take every value out and do cardinality estimation for each one. It’s a complete black box.
So don’t do this. This is a bad idea — that’s why it’s up here as the anti-pattern, right? A much better idea, like I was saying, if you’re having problems with table-valued parameters, would be to do something like this, where you create a temp table with the correct data type for whatever data you’re putting into it. You can even index that temp table in a useful way, and you can do things to protect your query from the contents of your temp table having problems.
So what I end up doing a lot is something like this. Now, obviously, SELECT DISTINCT on big results with lots of columns in them would be a nasty performance hit in some cases. But on something like this, where I’m just pulling a list of values out of a string, I’m not worried about DISTINCT. This is fine.
But what I want to do is make sure that I’m respecting the uniqueness of the primary key with DISTINCT, and that I am not doing anything that might result in an error. So let’s say there’s something malformed in the string, something gets messed up, and you end up with the letter A as one of the CSV values. That would obviously blow up if you tried to insert the letter A into an integer column. So I always do TRY_CAST. And I use TRY_CAST instead of TRY_CONVERT here, because TRY_CONVERT is compatibility level sensitive, whereas TRY_CAST you can use anywhere.
And then I also stick a WHERE clause on there. Now, granted, this is another one of those things: if I saw TRY_CAST wrapped around a column of a table in a real query, we’d obviously have to talk about that a little bit. But in this case, just like the DISTINCT, because I’m only pulling a list of values out of a CSV string with the built-in STRING_SPLIT function, this isn’t going to be a big performance hit.
Even if you had a lot of stuff in there, this wouldn’t be that big a deal. But once all that data is in the temp table, with the correct data type and a good, useful index on it, stuff like this starts performing way, way better.
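Putting the pieces just described together — temp table, correct data type, DISTINCT, TRY_CAST, and the extra WHERE clause — a sketch of the pattern (hypothetical names again) looks like:

```sql
CREATE OR ALTER PROCEDURE dbo.PostsByIdListBetter
    @ids varchar(8000)
AS
BEGIN
    /* Correct data type plus a primary key gives the optimizer
       statistics and a useful index for the join. */
    CREATE TABLE #ids (id integer PRIMARY KEY);

    INSERT #ids (id)
    SELECT DISTINCT                      /* respect the PK's uniqueness */
           TRY_CAST(ss.value AS integer)
    FROM   STRING_SPLIT(@ids, ',') AS ss
    WHERE  TRY_CAST(ss.value AS integer) IS NOT NULL;  /* drop junk like 'A' */

    SELECT p.Id, p.Score
    FROM   dbo.Posts AS p
    JOIN   #ids AS i
      ON   i.id = p.Id;
END;
```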
So this is a much better pattern for the CSV thing. Another really unfortunate anti-pattern: what’s important here is that if the base table column data type is nvarchar(max), you’re going to see this behavior no matter what, because SQL Server runs into the same problem whether a parameter or variable data type is max, or whether the base column data type is max.
And that problem is that it can’t always push those predicates down far enough into the query plan. Even if you have a good index, you can only partially seek to some of the values; you’re still going to see a late Filter operator in the query plan. But if I run both of these, even with a recompile hint on them — where SQL Server should be able to say, hey, you know what? I think something’s weird here. I think this nvarchar(max) isn’t necessary, because we don’t actually have a max value being passed into this — the query plans for both of these are just about identical, right, across the board.
They both have these Filter operators in them, right? So we pull all the rows out of the clustered index, all 17 million of them, and then we filter them down over here. And even with the recompile hint, where we see the literal value, SQL Server has to apply that filter after the fact. And that’s no good at all. Without the recompile hint, you would see the same thing.
The recompile hint doesn’t help here. What you want to do instead is make sure that you’re not using a max. If you could use nvarchar(4000) here, it would be fine, because it’s not a max. But what you want to make sure of is that you are, as much and as often as you can, using the exact right data type for both of these.
Now, this isn’t going to have a profound performance difference, because we don’t have a good supporting index on the Posts table to search by Title. But notice that for this one, we at least don’t have that Filter happening afterwards. If we look at the parameterized one, we can at least apply the predicate while we’re reading through data pages; we don’t have to pull out all the data pages and then filter them later.
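A sketch of the comparison, assuming the Stack Overflow Posts table, where Title is declared nvarchar(250):

```sql
DECLARE @bad  nvarchar(max) = N'SQL Server';
DECLARE @good nvarchar(250) = N'SQL Server';

/* max-typed: the predicate shows up as a late Filter operator,
   even with the recompile hint */
SELECT COUNT_BIG(*) FROM dbo.Posts WHERE Title = @bad
OPTION (RECOMPILE);

/* matching type: the predicate can be applied while pages are read */
SELECT COUNT_BIG(*) FROM dbo.Posts WHERE Title = @good
OPTION (RECOMPILE);
```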
If we had an index on the Title column, we could do some other stuff. And, kind of coming back to some of the stupidity of the Stack Overflow database, we could also do some stuff with PostTypeId, since both questions and answers are in the Posts table. There aren’t separate PostQuestions and PostAnswers tables to keep them apart; there’s a PostTypeId column, and only a PostTypeId of 1 will have a non-null Title. So if you’re searching on Title, it would probably be a good idea to have Title available in the index, but you might also want something like a filtered index with PostTypeId = 1 — or have that in your query — so that you’re only looking at questions, because answers can’t have titles.
But that’s a little beyond the scope of what we’re talking about; you might have a similar situation with somewhat denormalized results in there. So when it comes to data types and stored procedures, these are definitely things you want to keep an eye out for, whether they are completely mismatched data types or just accidental max string data types in there.
Fixing small things can have a lot of big effects, right? SQL Server can suddenly do a whole bunch of stuff better. I would be especially careful about accidental max data types if you are hoping for good performance out of columnstore or batch mode query plans, because that sort of deep data problem can really screw up which operators can get or use batch mode effectively. I’ve seen a lot of problems with that in queries I’ve been tuning lately, where some column that didn’t need to be a max data type is a max data type, and with batch mode on rowstore you still get a row mode hash something-or-other for joins to that table, and it messes everything up.
Anyway, thank you for watching. I hope you enjoyed yourselves, I hope you learned a little something about data types, and I will see you in the next video, where we will talk about whatever topic number 12 is, because this is clearly topic number 11. So we still have some things on our minds, don’t we? All right. Cool. Thank you for watching.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.