Bleached
Video Summary
In this video, I explore the often-maligned table variables in SQL Server and delve into their pros and cons. I start by comparing table variables to temporary tables through a series of stored procedures that insert data into both types of objects and measure execution times. The results show that for high-frequency, low-row operations, table variables can outperform temp tables due to reduced overhead. However, when dealing with indexes and needing column-level statistics for better query optimization, temp tables shine, as demonstrated by a scenario where the temp table’s plan was significantly more accurate and efficient than the table variable’s. I conclude by highlighting that while table variables have their uses, especially in scenarios requiring frequent execution on small datasets, they should be used judiciously considering the potential performance implications of incorrect query plans.
Full Transcript
I was going to write a song about table variables, and then I realized that I have no songwriting abilities whatsoever. Rhyming with “variables” is unpleasant. Not something that I would wish on anyone in this difficult time. So, I need to turn off query plans, according to this. I don’t know what drunk wrote this thing, but whatever. So, table variables have, I think, a deservedly bad reputation for the most part. You can get terrible estimates from them, and even with recompile hints or trace flags, you still don’t get column-level statistical information about the distribution of values in your table variables. You’ll get table cardinality. You’ll know how many rows are in the table, but you won’t know the breakdown of the values in those rows.
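To sketch what that distinction means (the table and column names here are placeholders, not from the actual demo): a recompile hint, or trace flag 2453, gets the optimizer the correct total row count for a table variable, but there’s still no histogram describing which values those rows contain.

```sql
DECLARE @t TABLE (OwnerUserId INT);

/* ...imagine @t gets loaded with rows here... */

/* OPTION (RECOMPILE) fixes the table cardinality guess for @t,
   but with no column-level statistics, the optimizer still can't
   tell how OwnerUserId values are distributed. */
SELECT p.Id
FROM @t AS t
JOIN dbo.Posts AS p
    ON p.OwnerUserId = t.OwnerUserId
OPTION (RECOMPILE);
```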
And also, when you modify a table variable, modifications are all forced to run serially, unless you’re sneaky about it. I have a couple of blog posts about those things here. Hopefully, those bit.ly links will be easy enough for you to remember. They are case sensitive, though, so if you’re going to get your little fingers to work, remember to capitalize the correct letters on those. I’ll leave those there and count to a number beyond your imagination as quickly as I can, so that you can memorize them for later. So, first, I want to show you where table variables can be better than temp tables. Like, they have a bad reputation, but sometimes they can be good, too. So, I’m going to create two stored procedures here, and I promise I’m not farting; that’s a motorcycle or something.
But I have these two stored procedures here. This first one is called temp table test, where we’re going to insert some stuff into a temp table, the pound sign, and then I have the exact same thing, except we’re going to use a table variable, the at sign table. So, I’m going to go and I’m going to create these two stored procedures. I’m pretty sure I already have them created, but what the hell. Let’s make double sure. And then what I’m going to do, and this is why I have query plans turned off, if we’re being honest about things, I’m going to run a test where, for 50,000 iterations, I’m going to, as quickly as possible, execute the temp table stored procedure, and then execute the table variable stored procedure.
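Roughly, the pair of procedures looks like this. This is a minimal sketch from the narration, not the actual demo script, so the names and the inserted values are my assumptions; the point is only that one uses a pound-sign temp table and the other an at-sign table variable:

```sql
CREATE OR ALTER PROCEDURE dbo.temp_table_test
AS
BEGIN
    SET NOCOUNT ON;

    /* Temp table: gets statistics, and the overhead that comes with them */
    CREATE TABLE #t (id INT NOT NULL);

    INSERT #t (id)
    VALUES (1), (2), (3);
END;
GO

CREATE OR ALTER PROCEDURE dbo.table_variable_test
AS
BEGIN
    SET NOCOUNT ON;

    /* Table variable: no column-level statistics, but less overhead */
    DECLARE @t TABLE (id INT NOT NULL);

    INSERT @t (id)
    VALUES (1), (2), (3);
END;
GO
```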
I’m going to see which one finishes more quickly. And while I do that, oh, I went up too far. Ha ha! My mouse wheel has been a little wonky lately, but I have both of those. Okay, good. And, well, I run that. I want to bring up that with SQL Server 2019, you do get better table cardinality estimates for table variables, but you still don’t get column-level statistical information from table variables. And I’m pretty sure I have a post about that coming up. But this is SQL Server 2017. Not RTM. It’s just that SSMS doesn’t report CU levels for some reason. Microsoft made this big stink of, oh, we’re on a new servicing model, we’re not going to do more service packs.
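The timing harness is just a pair of loops, something along these lines (again a sketch, assuming the procedure names from the earlier description):

```sql
/* Run each procedure 50,000 times and report elapsed milliseconds. */
DECLARE @i INT = 0,
        @start DATETIME2 = SYSDATETIME();

WHILE @i < 50000
BEGIN
    EXEC dbo.temp_table_test;
    SET @i += 1;
END;

SELECT DATEDIFF(MILLISECOND, @start, SYSDATETIME()) AS temp_table_ms;

SELECT @i = 0, @start = SYSDATETIME();

WHILE @i < 50000
BEGIN
    EXEC dbo.table_variable_test;
    SET @i += 1;
END;

SELECT DATEDIFF(MILLISECOND, @start, SYSDATETIME()) AS table_variable_ms;
```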
But SSMS still says RTM, despite the fact that I am on CU 19, SQL Server 2017. Not weird or confusing at all. But we can see, now with these results returned, that the stored procedure in the loop that inserted into the temp table took 17 and a half seconds. And the stored procedure that did the same thing with the table variable took five seconds. Now, there is overhead with temp tables. And that overhead is unfortunately where the performance difference comes from. You know, generating the statistics, all that other good stuff, that has overhead when you do it. And when you do it a lot, well, you have to do it a lot. That adds up.
So, if you are, if, and this is a thing that I learned from Michael, Mr. Michael J. Swart, that if you have very, very high frequency execution code, table variables can be a lot better to use than temp tables. Now, this also assumes that plan shape does not matter, that there are no performance issues that could come from, you know, say, joining that temporary object off somewhere else, putting a lot of rows in it and going to do something. You know, it’s just, there’s a lot of circumstances where, you know, like the overhead, the overhead might be worth it.
But, but if you just have very high frequency execution code operating on a small number of rows, and there is no, like, a query optimization impact to the bad guesses that come from table variables, then it might not be the worst thing in the world to use them in those scenarios. I mean, they are there for a reason. Microsoft didn’t devote developer time to them for no reason. I mean, maybe, maybe, maybe, maybe, maybe it could have been like a, a code portability thing between other competitive database platforms and SQL Server, but who knows at the time?
I have heard nothing about the Oracle checklist from people at Microsoft, so whatever. Now, let’s look, let’s look at when table variables are less good. All right, when they’re not quite as hot.
So the first thing I’m going to do, actually I should put a little, put a little space in there, happy little space. No mistakes, just happy little spaces. So I’m going to create a couple indexes on the post table and the comments table on the owner user ID and ID.
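The index definitions would look something like this. The narration only says “owner user ID and ID,” so the index names, and the exact key columns on the Comments table, are my guesses based on the standard Stack Overflow schema:

```sql
/* Supporting indexes for the demo queries below */
CREATE INDEX ix_posts_owner
    ON dbo.Posts (OwnerUserId, Id);

CREATE INDEX ix_comments_user
    ON dbo.Comments (UserId, Id);
```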
And the first thing I want to show you, rather, after these indexes create, is what happens when we create a temp table and insert values into it. Now, this is where the column-level statistics can matter a lot for temp tables and table variables.
The first value I’m going to put in there is Jon Skeet’s ID. Jon Skeet is legendary. Jon Skeet has many questions, well, not many questions, but he has many answers, many comments.
He is quite prolific in the Stack Overflow data. And the other set of values in there that I’m going to put in is people who barely use the site, people with a reputation of one, ordered by how recently they created their account.
So people who are not avid site users, if they have any answers, I’d be surprised. They might have like one question. Some people sign up, ask a question and leave.
Some people sign up, then do nothing. They’re just like, oh, I have an account. I was going to use it to do something else. I was wrong. You know, there’s lots of stuff like that. So, now that those indexes are done, the first thing I’m going to do is stick those values into a temp table, pound sign t, and then I’m going to get a count from those.
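Putting that together, the temp table setup and the count query would be along these lines. Jon Skeet’s user ID (22656) is well known, but the TOP row count and the exact join shape are my assumptions from the narration:

```sql
CREATE TABLE #t (id INT NOT NULL);

/* One very prolific user... */
INSERT #t (id)
SELECT u.Id
FROM dbo.Users AS u
WHERE u.Id = 22656;  /* Jon Skeet */

/* ...plus a pile of users who barely use the site,
   ordered by how recently they signed up. */
INSERT #t (id)
SELECT TOP (1000) u.Id
FROM dbo.Users AS u
WHERE u.Reputation = 1
ORDER BY u.CreationDate DESC;

/* The count query that joins off to Posts and Comments */
SELECT COUNT_BIG(*) AS records
FROM #t AS t
JOIN dbo.Posts AS p
    ON p.OwnerUserId = t.id
JOIN dbo.Comments AS c
    ON c.UserId = t.id;
```

Because #t is a real temp table, SQL Server builds column-level statistics on it, which is what makes the join estimates downstream reasonable.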
All right, so I’m going to do that. I’m going to turn execution plans back on. And I’m going to prove to you that this is a reasonably fast executing piece of code. The insert takes 232 milliseconds.
There are perhaps things we could do to make that better. I hadn’t really thought about it too much. And then the join down here is a reasonably fast join at 0.016 milliseconds. Now, if we were to repeat this experiment with a table variable, it would take a long time.
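The table variable version only changes the object being loaded, sketched the same way with the same assumed names and counts:

```sql
DECLARE @t TABLE (id INT NOT NULL);

INSERT @t (id)
SELECT u.Id
FROM dbo.Users AS u
WHERE u.Id = 22656;  /* Jon Skeet */

INSERT @t (id)
SELECT TOP (1000) u.Id
FROM dbo.Users AS u
WHERE u.Reputation = 1
ORDER BY u.CreationDate DESC;

/* Same count query, but @t carries no column-level statistics,
   so the optimizer has no idea one id matches vastly more rows
   than all the others combined. */
SELECT COUNT_BIG(*) AS records
FROM @t AS t
JOIN dbo.Posts AS p
    ON p.OwnerUserId = t.id
JOIN dbo.Comments AS c
    ON c.UserId = t.id;
```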
All right, so if I do the same thing here, I’m going to run that. But I’m not going to wait to show you the results. I’m going to show you a saved copy of the plan.
Not because this is slow. This part is fine. The insert is fine. What’s not fine is the query that we run next, which runs for, your eyes are not lying to you, five minutes and four seconds. And we have our table variable guess here.
We have our index seek guess here, which is not so good. We guessed 2,400 rows and we got 27,901. And then we go into this nested loops join, expecting 2,400 rows, getting 27,901.
And then we spend four and a half minutes in a nested loops join, going and getting values out of the comments table, when we were expecting 638 rows and we got 1,304,009,037. I’m not going to bother trying to figure out what that number is.
It is a big number. And this takes a very long time. This was not a happy time for this query running. And the reason that we got such a better guess from the temp table was because we got column level statistics about what values were in there.
We were able to use those statistics to come up with a better joined execution plan. And we were able to use those to not end up running in nested loops hell for five minutes. Crazy that, isn’t it?
Crazy. Absolutely insane. Crazy. So, if you’re going to use table variables, high frequency execution is one of the primary things that I would look for in a workload that might benefit from table variables. But also you need to look at how you’re going to use table variables afterwards.
If I took those two stored procedures that I showed you earlier and said, hey, we’re going to see what happens when we run this in a loop, well, this one getting to 50,000 executions at five minutes per execution? That’s not happening. I’m not sitting here for that long.
I don’t love you that much. There are limits to my love and patience. The temp table one would finish rather quickly, even for 50,000 executions. So there are things to consider with how we’re using the table variable down the line.
And if column level statistical information might help those activities further on down the line. So, there we have it. For the most part, if we care about performance and statistical information, then we would probably want to err on the side of temp tables.
If we care about speed of execution, the frequency of execution, then we might want to use table variables. This changes a little bit, like I said, in SQL Server 2019. But I have another post about that where I address that specifically.
Anyway, that is all I have to say here. I’m going to stop talking. I’m going to hit the stop button. I’m going to eat some cashews and probably record another video once I get done feeling gross about cashew things.
Quarantine is not fun. Goodbye.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 25% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.