For The Last Time, It Doesn’t Matter How Many Rows You’re Putting In A Table Variable



Video Summary

In this video, I delve into table variables in SQL Server and why Microsoft’s advice about their usage is often oversimplified. I start by challenging the notion that a table variable containing fewer than 150 rows is safe to use, drawing parallels between this advice and other dubious recommendations from various sources. The core of my discussion revolves around the limitations of table variables when it comes to cardinality estimation and plan shaping. By walking through an example with a table variable and comparing its performance against a temporary table, I illustrate how SQL Server’s lack of statistical information on table variables can lead to suboptimal query plans, even for small datasets. This video aims to provide a more nuanced understanding of when it’s appropriate—or not—to use table variables in your T-SQL code, emphasizing the importance of considering how these variables will be used later in queries and their potential impact on performance.

Full Transcript

Erik Darling here with Darling Data. And in today’s video, we are, well, I had my sights set on something completely different. I had my gaze set upon far grander shores, but then there was a YouTube comment earlier saying, Microsoft’s advice about table variables is that as long as there’s fewer than 150 rows in them, they’re okay. Everything’s hunky-dory, just go about your business. And given the quality of Microsoft’s advice on a lot of other things, I would treat it with the same level of suspicion that I would treat, I don’t know, politicians, alligators offering to help you cross rivers, I don’t know, homeless people handing you soiled bags. There’s a lot of things. There’s a lot of things that, like, I would just be like, mmm, no thanks. Not what you’re selling today. Not interested.

So, the thing with table variables is that the number of rows that you put in them does not matter. What matters is what you do with them later. Right? So, here’s your mental checklist for times when you probably shouldn’t use table variables, regardless of how many rows you’re going to put in them. If you’re going to put any amount of data into a table variable, and you’re going to correlate that table variable to one or more larger tables in any way, you probably shouldn’t be using a table variable.

And that goes for anything in this list and more: joins, WHERE clauses, subqueries, IN, NOT IN, EXISTS, NOT EXISTS. Anything that requires relational activity between a table variable and another table of any meaningful size means you shouldn’t be using a table variable. So, when are table variables sometimes okay? If you want to pass data between procedures. Now, I talked in another video about how if you use temp tables for that, you can get a rather polluted plan cache, which may not be the end of the world. Passing in lists, right? Instead of passing in a list of comma-separated values and parsing it out, using a table-valued parameter is often a better choice. But even then, I might suggest dumping the contents of your table-valued parameter into a temp table and then using that instead.
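That dump-the-TVP-into-a-temp-table pattern might look something like this. This is a sketch, not code from the video: the `dbo.IdList` type, the procedure name, and the join to `dbo.Posts` are illustrative.

```sql
/* Assumes a table type like this already exists:
   CREATE TYPE dbo.IdList AS TABLE (Id integer NOT NULL); */
CREATE OR ALTER PROCEDURE dbo.CountPostsForUsers
    @Ids dbo.IdList READONLY
AS
BEGIN
    /* Materialize the TVP into a temp table so later
       joins get real column-level statistics */
    SELECT i.Id
    INTO #Ids
    FROM @Ids AS i;

    /* Join to the temp table instead of the TVP */
    SELECT COUNT_BIG(*) AS records
    FROM #Ids AS i
    JOIN dbo.Posts AS p
        ON p.OwnerUserId = i.Id;
END;
```

The extra insert costs a little up front, but the temp table can have a histogram built on it, which the TVP never will.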

Where table variables usually win out in a typical speed test, unless you have in-memory tempdb metadata enabled, which, if you’re on stupid Managed Instance, you can’t do, is if you have very high-frequency code calls, like hundreds or thousands of calls a minute or second or millisecond or microsecond or nanosecond or whatever unit of time you want to deal with. I don’t know what the metric units of time are. Do they have kilominutes or kiloseconds? I don’t know. Weird thing to think about. And you’re only ever putting a little bit of data into a table variable and then retrieving a little bit of data just from that table variable.

Right? So, when you shouldn’t use them, most of the time. When you should use them, very, very small controlled portion of the time. So, I’m going to talk a little bit about why the number of rows, and hopefully this is the last time I have to talk about this, because I feel like I’ve recorded this video 17 times. So, hopefully this is the last time I have to say it.

The number of rows that you put into a table variable is not what makes using a table variable okay. So, here’s a simple example of why table variables can screw you up regardless of how many rows you put in them. You ready for this? Alright.

I’ve got a table variable called @t. It’s got one column in it. It’s an integer. It’s got a clustered index on the integer. Actually, let’s fix that. Let’s make sure everyone knows that’s an integer.

Let’s not minimize it by calling it an int. I don’t like that. And I’m going to put two groups of data in there.

I’m going to put one group of data that represents a larger number of rows across tables in the Stack Overflow database. So, user ID 22656 belongs to a gentleman named Jon Skeet. His name has come up a number of times across my videos.

He is prolific in the Stack Overflow community. I don’t know if he still is, but at one point he was quite prolific. And then I’m going to put 99 other rows in there. And the 99 other rows that I’m going to put into my table variable are very low reputation users.

And that’s ordered by the most recent creation date. So, these are the people who joined the site and haven’t been around a long time. The reason why this matters is because table variables, even with indexes created on the columns, do not get column level statistical histograms created for them.
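The setup isn’t shown in text in the post itself, but a reader repro in the comments below matches the description. A sketch against the Stack Overflow `Users` table (the `DESC` on the creation date follows the “most recent” wording above):

```sql
DECLARE @t TABLE
(
    id integer,
    INDEX c CLUSTERED (id)
);

INSERT @t (id)
SELECT u.Id      /* Jon Skeet: one very high-activity user */
FROM dbo.Users AS u
WHERE u.Id = 22656
UNION ALL
SELECT d.Id      /* 99 low-reputation, recently created users */
FROM
(
    SELECT TOP (99) u.Id
    FROM dbo.Users AS u
    WHERE u.Reputation = 1
    ORDER BY u.CreationDate DESC
) AS d;
```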

And SQL Server has no idea how to produce cardinality estimates for them. Now, that stuff has changed a little over the years. There are trace flags and recompile hints, and compat level 150 plus will get you something called table variable deferred compilation, which tells SQL Server how many rows in total are in the table variable. But it does not tell SQL Server what the values in those rows are. I’m being very specific about this.

Say it very, very slowly: SQL Server has no idea what’s in your table variable. It just knows how many things are in your table variable. Alright.
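On compat levels below 150, you don’t even get the total row count; the table variable is guessed at one row unless you force a recompile. A sketch of that hint, assuming the @t table variable from the demo:

```sql
/* On compat < 150, without RECOMPILE SQL Server assumes
   @t contains 1 row; the hint fixes the table cardinality
   at compile time, but still creates no histogram */
SELECT COUNT_BIG(*) AS records
FROM @t AS t
OPTION (RECOMPILE);
```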

And then what I’m going to do is I’m going to just get a count from my table variable joined off to two other tables. And I’ve run this ahead of time. Because if I were to sit here and make you watch me run this now, I don’t think we’d be friends anymore.
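That count query, per the reader repro in the comments at the bottom of this post (the table variable is called @tblvar there; @t here to match the demo), is along these lines:

```sql
SELECT COUNT_BIG(*) AS records
FROM @t AS t
JOIN dbo.Posts AS p
    ON p.OwnerUserId = t.id
JOIN dbo.Comments AS c
    ON c.UserId = t.id;
```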

So, here’s our initial insert into the table variable. Alright. Let me get rid of that tool tip and all sorts of goofy things popping up on me. And this is fast enough.

300 milliseconds. Very few people would complain too much about that. Now, we still get the inability to perform a parallel insert here. But for putting 100 rows into anything, that doesn’t matter.

Right. What matters is down here. Now, let’s look at this query plan a little bit. Because we’re using compat level 150 here, SQL Server correctly estimates that 100 rows are going to leave the table variable.

But again, it has no idea what the contents of those 100 rows are. And so, when it starts attempting to do join cardinality to the other tables, everything goes right to hell. So, we mess up here by almost 2,900%.

We mess up here by a really, really big number. I don’t even know what percent that’s going to end up being. But that’s a big number with a very small estimate here.

And this all drags on for nearly two minutes in total. All right. You look at the final operator here. We have a minute and 54 seconds.

And then, for some reason, this hash aggregate, sorry, this hash aggregate runs in batch mode. And so, we have to add another 500 milliseconds to the 1 minute and 54 seconds here.

So, nearly 1 minute and 55 seconds in total. All right. Tally that up.

It’s a long time. All right. It’s a bloody long time. That was a bad accent. I apologize. Never going to do that again. And the thing is that this kind of stuff does not happen with temp tables where SQL Server can generate histograms and do proper cardinality estimation with data in temp tables.

So, a couple things to do in here. All right. Let’s run this real quick.

And what I want to show you is that under sort of normal circumstances, table variables are backed by hidden temporary objects up in tempdb. Here we have AB575B71. All right.

That is our secret temporary object backing the table variable. And if I run this query a couple more times, we will just get different values in here. They’ll be a little bit different every single time I run this.

There’s a B11, blah, blah, blah. Now, the reason why this query is quick is because we’re just, like I said, one of the times when table variables are generally okay is if we’re just selecting a relatively small amount of data out of them. But what’s interesting here is that, you know, the data that we put into this table variable, which is ID 22656, and then a bunch of pretty small numbers.

SQL Server knows how many records in total are going to be in the table variable, but has no idea how to make a guess for this. If you go back to the query plan, or sorry, go back to the results, we get zero records back from that. No single row in that table variable qualifies for that WHERE clause.

But SQL Server still thinks that 30 of the 100 will. So we get a 30% guess there. If we had 1,000 rows in there, it would guess 300.

We just get a stock 30% guess. If we had an equality predicate, it would guess 10%. And with a unique index plus an equality predicate, it would probably guess 1.

But even that would be wrong. And all sorts of just wrong things in there. So temp tables don’t have that issue. Right?
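For reference, the temp table version of the same demo is the same setup with # swapped for @ (a sketch; same illustrative names and index as the table variable sketch above):

```sql
CREATE TABLE #t
(
    id integer,
    INDEX c CLUSTERED (id)
);

INSERT #t (id)
SELECT u.Id      /* Jon Skeet */
FROM dbo.Users AS u
WHERE u.Id = 22656
UNION ALL
SELECT d.Id      /* 99 low-reputation users */
FROM
(
    SELECT TOP (99) u.Id
    FROM dbo.Users AS u
    WHERE u.Reputation = 1
    ORDER BY u.CreationDate DESC
) AS d;

/* Same count query as before, now with histograms available */
SELECT COUNT_BIG(*) AS records
FROM #t AS t
JOIN dbo.Posts AS p
    ON p.OwnerUserId = t.id
JOIN dbo.Comments AS c
    ON c.UserId = t.id;
```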

If we run this code, now this code matches the exact same query that I ran the first time with the table variable that took a full two minutes. If we run this, notice this number might look familiar if you remember the really bad estimate from the initial plan. But now SQL Server, well, a couple of things happen.

One, we actually get parallelism in this plan. So this improves by a little bit, right? This is like 300 milliseconds, now it’s 55 milliseconds. And this query down here, that used to take a full two minutes, finishes in 15 milliseconds. Why?

Because SQL Server can make good guesses. Right? SQL Server is now, oh, you know what? I forgot to change that. Let’s make this equivalent. That’s supposed to be 99.

Let’s do that again. There we go. SQL Server can make just fine guesses now. Right?

They’re not perfectly spot on, but it was a lot closer. Right? That was only off by 1,400%. This was only off by 2,400%. Not off by like 483 million percent, whatever that was. So, again, not perfect join cardinality, but SQL Server was able to do some stuff differently, and we got a much faster execution plan because it at least had some reasonable idea of what was inside that temp table.

Right? The reason for that is that SQL Server, like I said, generates statistics on temp tables that it does not create on table variables.

Now, what I’m going to do is I’m going to use a couple trace flags in here to print out cardinality estimation information for this query. And if you look at the messages tab, what you’re going to see down a little bit is text that looks like this.
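He doesn’t name the flags on screen, but the usual pair for dumping cardinality estimation details to the Messages tab is 3604 (redirect trace output to the client) plus 2363 (print new-CE selectivity computation info). A sketch, and note QUERYTRACEON requires elevated permissions:

```sql
SELECT COUNT_BIG(*) AS records
FROM @t AS t
JOIN dbo.Posts AS p
    ON p.OwnerUserId = t.id
JOIN dbo.Comments AS c
    ON c.UserId = t.id
OPTION (QUERYTRACEON 3604, QUERYTRACEON 2363);
```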

Right? And what do we see in here? Well, we see a cardinality estimate on the base table Posts, where we get this number, 1.7e+07.

And then this: CStCollBlackBox, Card = 1. A cardinality of 1. Right?

And that’s going to be repeated throughout all of the cardinality estimation attempts for this query all throughout here. This black box thing shows up over and over again. And that black box thing is indeed the table variable because it has no statistics.

Right? There is no statistical information about… Oops.

Sorry. I got a little carried away with the highlight there. There is no statistical information about what is in our table variable. Right? That just doesn’t exist there. Now, that will exist with the temp table.

Right? So keep in mind, the only thing I’m sticking inside my temp table is the value 22656. And if we run that same thing for that, well, what do we get back?

Well, we get back a bunch of junk that we probably don’t need, you know, other stuff that came through because I had a very expansive query to show all these things. But the thing that’s important is this first line, where we have…

Well, let me squeeze that over a little bit. There we go. The thing that matters is this top line here, where our pound-sign temp table gets a single-row histogram with 22656 in it. And so SQL Server can do accurate cardinality estimation.

SQL Server can figure out exactly how many rows are going to qualify in there. Right? Pretty good.
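If you want to see that histogram for yourself, the statistics on a temp table live in tempdb and can be inspected directly. A sketch; the index name `c` comes from the illustrative setup above, and auto-created statistics get system-generated `_WA_Sys_` names, so yours will differ:

```sql
/* List the statistics objects that exist on the temp table */
SELECT s.name, s.auto_created
FROM tempdb.sys.stats AS s
WHERE s.object_id = OBJECT_ID(N'tempdb..#t');

/* View the histogram for the clustered index's statistics
   (substitute whichever name the query above returns) */
DBCC SHOW_STATISTICS (N'tempdb..#t', c);
```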

Pretty good stuff. Remember, SQL Server wasn’t able to do that for the table variable. So, let’s go back to this plan. Look what we got.

That was 27,901 rows against an estimate of 12. Right? A one-row estimate here, with no idea how to match that one to that one down there. Temp tables just didn’t have that problem.

So, just to reiterate, does not matter how many rows you’re going to put into your table variable. It matters how you’re going to use your table variable later. Table variables do not get statistical information about the data that populates them.

So, if you’re going to join those table variables off to other tables, where cardinality estimation and plan shape and a whole slew of other things might matter for performance, you probably shouldn’t be using them.

All right? Now, this kind of stuff might not matter at first. What I’m going to relate it to is, since I obviously just came back from the gym, right? If you’re the type of person who does not have a very challenging database size-wise, or does not have a very challenging workload, or does not have very challenging performance criteria for their queries, you can almost use whatever you want and get away with it.

All right? But if you’re the type of person who wants to get better at this stuff, and who wants to someday maybe work on challenging databases and challenging workloads, these are the kind of small technique things that you’re going to have to get used to fixing.

Now, relating it to the gym, if you’re the kind of person who walks in, maybe puts a couple of 45-pound weights on a bar and does some squats, your form is almost never going to matter.

135 pounds is not going to be enough to destroy your life. But if you’re the type of person who wants to get a 400, 500-plus-pound squat, these little technique things are what’s going to make a big difference.

All right? Breathing techniques, where you place the bar, foot position, hand position, right? How you descend, how you rise, all these things.

All these technique things make a far bigger difference when you’re dealing with far bigger weights. In the same way, all these little technique things with T-SQL make a far bigger difference when you’re dealing with far bigger databases, data sets, far more challenging workloads, and all sorts of other stuff that make your job hard.

So please, stop telling people that the number of rows that you’re going to use is what makes a table variable okay or not. It’s all about how you’re going to use that table variable after it’s populated.

Okay? Because every time I hear someone say that, the smackin’ hand comes up. Just, you’ve got to fight it.

You’ve got to keep the smackin’ hand down. It’s like that movie Idle Hands, where it’s just like, all of a sudden, I have a knife. I don’t actually have a knife. It’s just the smackin’ hand.

The big smackin’ hand. Anyway. Thank you for watching. I promise tomorrow’s video will not be about table variables. I hope.

God, I pray it won’t be about table variables. I hope you enjoyed yourselves. I hope you finally learned that the number of rows you put into a table variable is not what makes using a table variable okay.

If you liked this video, please, you can do the thumbs up thing.

You can leave a nice comment. You can tell me how good I look after the gym. If you like this sort of SQL Server content, you can join the, let’s see, let’s get a fully updated number here.

All right. Here we go. The nearly 3,632 other people who have subscribed to this channel, so you can get a little ding every time I post one of these web gems for you.

All right. I’m going to go not think about table variables for a while. Might think about the bottom of a bottle of wine for a little bit. That’s about where I’m at.

Anyway, thank you for watching. Please stop telling people that the number of rows that you’re going to use is what makes using table variables okay. For the last damn time, it’s not okay.

Thank you.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.



8 thoughts on “For The Last Time, It Doesn’t Matter How Many Rows You’re Putting In A Table Variable”

  1. Hi Eric.
    I tested this on my (Microsoft SQL Server 2022 (RTM) – 16.0.1000.6) and the StackOverFlow2010 Db.
    I dont get that same queryplan when using a table variable.
    It uses “Parallelism” and is pretty fast even if estimates are off.
    I changed compatibility_level to 150

    I dislike table variables as much as you but I need to convince my devs to not use them by showing them when it´s bad using our servers and Db´s.
    In order to do that I have to try your stuff and see the same results as you.

    Could you send me the queries for Table variable and Temptable, so I am sure that the queries are correct or hint me to what could be going on?

    1. What you’re describing is impossible. Table variables do not under any circumstances allow for parallel inserts, unless the query is executed in a different context, like from dynamic SQL.

      Are you sure you’re using a @table_variable and not a #temp_table? If so, please show me what you’re doing. I’ve already shown what I’m doing 😃

      1. I was sure as hell I would get the same results as you so I guess i´m doing it wrong 🙂

        The part in queryplan that shows parallelism(Gather Streams):

        Query:
        DECLARE @tblvar TABLE
        (
        Id INT
        , INDEX ix CLUSTERED (Id)
        );

        INSERT INTO @tblvar
        (
        Id
        )
        SELECT dta.Id
        FROM
        (
        SELECT Id
        FROM
        Users
        WHERE
        Id = 22656
        UNION ALL
        SELECT TOP (99) Id
        FROM
        Users
        WHERE
        Reputation = 1
        ORDER BY CreationDate
        ) dta (Id);

        SELECT COUNT_BIG(1)
        FROM
        @tblvar t
        INNER JOIN
        Posts p
        ON p.OwnerUserId = t.Id
        INNER JOIN
        Comments c
        ON c.UserId = t.Id;

        1. Try it with these indexes:

          CREATE INDEX
          dog
          ON dbo.Posts
          (OwnerUserId)
          WITH
          (MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

          CREATE INDEX
          cat
          ON dbo.Comments
          (UserId)
          WITH
          (MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

  2. Gahh, xml wasn´t the best to send in a Reply obviously.
    Lets try it without the xml stuff.
    RelOp AvgRowSize="15" EstimateCPU="0.00181815" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Batch" EstimateRows="171.014" LogicalOp="Inner Join" NodeId="2" Parallel="true" PhysicalOp="Hash Match" EstimatedTotalSubtreeCost="125.696"

      1. Yep, just like I said in the video, table variable modifications can’t use parallel plans. At no point did I say that select queries can’t.

        1. Of course. I was assuming that my plan would be more or less exactly like yours and didn´t listen very well to what you didn´t say 🙂
          I´ll pay better attention going forward.

          Thanks for your time and have a great weekend.

Comments are closed.