Learn T-SQL With Erik: How to Pick a Temporary Object
Video Summary
In this video, I delve into the nuances of temporary object comparison within SQL Server, focusing on table variables and temporary tables. I explore their respective strengths and weaknesses to help you choose the right one for your specific use case. Starting off, I highlight the downsides of table variables, such as preventing parallel query plans when modified and lacking column-level statistical information. In contrast, temporary tables offer more flexibility with full parallelism support and better handling of statistics. The video then dives into practical examples, comparing execution plans between a table variable and a temporary table to illustrate how cardinality estimates differ significantly. By the end, you’ll understand why temporary tables might be preferable for complex queries that rely on accurate cardinality estimates, making this content invaluable for anyone looking to optimize their SQL Server performance.
Full Transcript
All right, Erik Darling here with Darling Data and continuing on the theme of learning T-SQL. I suppose it’s worth saying that all of this content is teaser content, of course, for my Learn T-SQL with Erik course. Today we’re going to be talking about temporary object comparison, not how to like compare data from one to another, because that’s dull. Intersect, accept, whatever. But more how to choose the right one. one for you and for each particular use case that you may have for a temporary object. So, before we move on to that, if you are saying, wow, a course where I can learn T-SQL with Erik. Well, yeah, there’s a description, in the video description, there’s a link to do that. The course is still currently at the presale price of $250. But as soon as the advanced content is all done, which probably has like maybe like two more months left on it, and that like fully recorded and produced, like a quarter of the way written, you will have to pay $500 for it. So I would strongly suggest buying it now. You can also do other things down there, like hire me for consulting. You can contribute money to the channel to support my work to bring you SQL Server content. You can ask me questions for my Office Hours episodes that I really enjoy answering. And of course, if you enjoy this content, but not in a monetarily productive way, you can always like, subscribe, and of course, tell a friend, because maybe your friend is secretly rich, wants to hire me for anything. Cool. I’ll be leaving the house a few times between now and November. I will be in Dallas September 15th and 16th. I will be in Utrecht October 1st and 2nd. And I will be in Seattle November 17th to 21st. What do all of these things have in common?
Well, they are all past events put on by the nice folks at Redgate, who were kind enough to decide to have me show up at all of them and talk at you about SQL Server performance tuning. So with that out of the way, let’s compare temporary objects here. Let’s make sure that I go to the right SSMS spawn because, well, I need multiple of them. So let’s talk first about table variables. They have some upsides and some downsides. Downsides would include preventing parallel query plans when you modify them. At base, you need to put data into a table variable via an insert. And of course, that will force whatever query populates the table variable to run fully single threaded. There is a non-parallel plan reason for that.
There is no column level statistical information. There are no histograms available for table variables. This is true even if you index them. You can get some benefit if you create a unique index on them for things, but you still don’t know what values are in there that are unique. You often get poor cardinality estimates with them when you start joining on them to other tables or filtering on them locally with a where clause.
There is no recompilation threshold with some exceptions. There are some trace flags out there and there are some newer SQL Server features out there where you may find a recompile does happen. Most notably, if you put a recompile hint on the query. There we go.
You cannot select into a table variable. Table variables use the collation of the local database that your procedure is executing in. They are only available in the current scope and batch, so you cannot declare a table variable out here and then reference it as easily as a temp table, either in dynamic SQL or in another store procedure.
Modifications to table variables are not automatically rolled back on error or explicit rollback. They can be used in some functions like a multi statement table valued function or a T-SQL scale R UDF. And of course, you can create indexes on them when they are declared.
Temporary tables have a bit of a different profile. There’s no inherent parallelism restrictions. You can insert, update, and delete. And as long as SQL Server thinks a parallel plan would be a good idea, you can get yourself a parallel plan.
They do allow for statistical information and histograms. They can sometimes suffer poor cardinality estimates, though, because statistics can be cached when they are used in store procedures. They do have a low compilation threshold, so you don’t often run into the cached temporary table statistics problem, but never say never.
You can select into them. Temp tables use the collation of tempDB. So if you might ever look through my store procedures that create temp tables, you might see a lot of things where I need to collate to the local database.
So like collate database default, and that is to prevent string comparison issues if the collation of tempDB does not agree with the collation of the local user database that the store procedure is running in. They are visible to child batches and scope, so they’re very easy to reference in Dynamic SQL or in other store procedures that you might call. Modifications are rolled back on error or explicit rollback.
They cannot be used in some functions, unfortunately. And you can create indexes with the table. When you say create table, you can make the index definitions inline, or you can add them on later, depending on which you find to be most beneficial. There are a lot of insert patterns where you want to insert into a heap temp table to get a fully parallel insert, and then you can create an index later on that.
And you also get the benefit of a full scan statistics histogram when you create the index later, rather than the sampled histogram that you would get if you loaded the data with the index already there, and then maybe use the column in a where clause or a join clause or group by or something. There are some similarities though, and these are non-negotiable. They both live in tempDB and take up space there.
Table variables are not in memory unless you are explicitly using the in-memory SQL Server feature and creating in-memory table variables, which do not look like the declaration of a normal table variable. They both log equally to tempDB’s transaction log. They are both session scoped, meaning that once the session that created or declared the temporary object is gone, they are gone.
Global temp tables have slightly different rules, but we’re not talking about those. Simultaneous sessions can use temporary objects with the same name without existence errors, but that, of course, does not apply to global temporary tables. Now, that’s a lot of talking, isn’t it?
We’re already seven minutes, and still, all we did was talk. But now we can get down to some of the nittier, grittier stuff with them. So what I’m going to do is create a table, or rather declare a table variable called user scores. And I want you to note that I have some check constraints on here, and I have a couple indexes on here.
Now, the point of these indexes is to aid later queries. So what I’m going to do is I’m going to run all of this stuff, and I’m going to talk about these two hints in a second. But let’s run all of these, and let’s wait a moment, and we’ll get the actual execution plans for those when this is all finished.
So because I am using SQL Server 2022, and my compat level is 150 or higher, keep in mind this is a feature that started in SQL Server 2019 with compat level 150 up, assuming you’re on Enterprise Edition, of course.
You have paid the Microsoft friendship tax, and they have decided to try to help the performance of your workload. Because of that, I have an intelligent query processing feature called table variable deferred compilation at play here, which is going to get me table level cardinality for like a full scan of the table or something, a full join of the table.
But you will see in a moment that this does not help us with cardinality when we have a where clause. So if I were to apply either one of these hints to either disable that feature or to use the optimizer compatibility level 140, the table cardinality guesstimate for table variables would disappear.
But this is all finished running, and I’ve got a few queries down here. I’ve got one query saying where owner user ID equals 22656. I’ve got one query here asking for where score is greater than zero.
And I’ve got one query down here asking for both. So this should be interesting because, like we went over, when I created this table, I had a check constraint for score being greater than zero.
And I have an index on owner user ID and an index on score. When I inserted data into the table variable, both of these things were written in a way to get us the same owner user ID for every entry in here, right?
Every row in this is going to be 22656. And of course, we have score greater than zero in order to not run into any check constraint errors. But the cardinality estimates that I get from this, I mean, the totals are all correct, but, you know, not much else.
Starting with the top query that does the insert, you will see that there is a clustered index scan here, a clustered index insert here, and then there’s an assert over here, which the check constraint uses to make sure that all of the scores are greater than zero.
If you hover over the clustered index insert, you’ll see the two additional indexes listed that were inserted into for owner user ID and score. Anyway, moving on.
If this query ran for 4.295 seconds, and you’ll notice that there are no parallelism indicators at all for this plan. This is because if we right click and hit properties, we will have this non-parallel plan reason over here, like I described earlier, where SQL Server will not be able to generate a parallel execution plan when we go and, what do you call it, run the query, do the insert.
There we go. Anyway. All right.
Moving on. Now, because I have the table variable deferred compilation intelligent query processing feature enabled here, and it happens for us, what SQL Server is able to do is when we just get a count of the entire table, where one equals select one will always mean true, but when we do that, SQL Server is able to guess that 27,029 rows will come out of here.
Great. But this is where things do start to fall apart. Why?
Well, because all SQL Server knows is that there are 27,029 rows in the table variable. It still has no description of what values ended up in there. So when I change the query a little bit, and I say where owner user ID equals 22656, SQL Server, well, we still return 27,029 rows, but SQL Server guesses that 164 of them will meet that where clause.
There’s no histogram. If there were a histogram, we would see that. Now, excuse me.
If we look at other places, we’re going to see where I say, hey, show me where score is greater than zero. Well, we know that every row in there is greater than zero because of that check constraint, and we have an index on score, but SQL Server says, well, I think you’ll get about 8,100 rows back.
And look at the check, neither the check constraint nor the index were of any benefit to cardinality estimation here. And going down even further, if we say where owner user ID equals 22656 and score is greater than zero, SQL Server says, I think you’ll get about 90.
But we still get back all 27,029. So it is difficult to prove a negative when looking at things like this. But if we look at the properties over here, there’s a thing missing from over here.
There’s usually, if you look at a query where statistics were in use, you will see an optimizer stats usage entry in the properties of the root operator in the query plan. But we do not have one of those here.
Again, it is difficult to prove a negative, but we can go on and use a temp table and prove a positive, can’t we? We can do something very scientific. So what I’m going to do is I’m going to create a temp table.
And I’m not even going to festoon it with all of the things that we had on the other one. I’m not going to do that. I’m just going to say we have two columns in this temp table.
They are both integers and they are both not null. There’s no check constraints. There’s no indexes. So that’s what we have going for us. So let’s drop this table if it exists, even though it doesn’t.
And let’s do that exact same insert. Now, if you look at the execution plan, you’ll note that we do get a partial parallel plan here. SQL Server does not use a fully parallel insert.
Why? I don’t know. Didn’t meet the cost. I think it was maybe the number of rows because SQL Server estimated 213 rows. And, of course, we got 27,000.
But, you know, I think there’s a cardinality estimation thing at play with if SQL Server decides that a parallel insert is going to happen or not. But anyway, we do not have, clearly do not have a non-parallel plan reason for the insert into the temporary table. And you’ll note that it does run in about 700 milliseconds.
I think the other one was, was it 3.4 or 4.3 seconds? I forget. But it was much longer than this, right? It was a long time. Now, we’ve got this table and we’ve got it populated.
So that’s our good first step. Now, let’s run the same queries, but now against the temp table that we had against the table variable before. We’re going to get back all our 27,029 results, which is a good first step for us here.
But if we look at these, of course, you know, table cardinality, very easy to guess. So this top one, you know, it gets a good, good grade here, right? 27,029 to 27,029.
Not much difference just yet. But now, boy, howdy. Look at, we get accurate cardinality for both of these. When we, when we said where owner user ID equals 22656, SQL Server looked at, generated statistics on the, on the, on the temp table, just on its own. Remember, there’s no indexes on this thing.
SQL Server just generated statistics and said, hey, that’s 27,029. All right. I know. I got this one. I’m your friend.
And when we say where score is greater than zero, SQL Server said, hmm, got your back, homie. 27,029 of 27,029. So now not only do we have accurate table cardinality, we have accurate column level cardinality when we start filtering on our table variable.
This becomes very important if we were to start joining, sorry, our temporary table. This becomes very important if we were to start joining a temporary object off to other larger tables where perhaps, you know, different, having statistics on columns that we’re joining to very large tables would be beneficial in SQL Server generating an optimal execution plan, don’t you think? And if we combine these two predicates where owner user ID equals 22656 and score is greater than zero, SQL Server will once again, remember last time it said like 90 or something?
Now we get the full number of rows that would actually come out of that temp table. So not to say that table variables do not have any uses, but for me as a performance tuner, generally I am very interested in not only materialized results, but getting either better or more accurate cardinality estimates when I start doing other things with those materialized results. And this is where I mostly want to sway people to use temp tables instead of table variables, but I still have to spend a lot of time going over things like this.
So perhaps this channel just does not have enough reach. Perhaps not enough of you have told a friend about this channel and perhaps you should because there are a lot of people out there who still need to know these things. Anyway, thank you for watching.
I hope you enjoyed yourselves. I hope you learned something and I will see you in tomorrow’s video where we are going to talk about some of the fun and joys and quirks of Razor. It’s of course a much smaller portion of an error handling module, but you only get teasers because you didn’t pay for it.
If you buy the full course, you get the full content. All right. Anyway, goodbye.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.