Learn T-SQL With Erik: Identity Annoyances
Video Summary
In this video, I delve into some common annoyances related to the `IDENTITY` property when working with temporary tables in T-SQL. Specifically, I discuss how using an identity column can hinder parallel inserts and provide alternative methods for generating auto-incrementing IDs that maintain query performance. I also highlight the benefits of using a `ROW_NUMBER()` function instead of relying on the `IDENTITY` attribute, especially for larger insert operations into temporary tables. If you’re working with heavy data movement queries or need to optimize your T-SQL scripts, this video offers valuable insights and practical solutions. Whether you’re just starting out with SQL Server or looking to refine your advanced query tuning skills, there’s something here for everyone.
Full Transcript
Erik Darling, your friend from Darling Data. I don’t know, like at this point, I don’t know, I either need to change my name or the company name and spice up these intros a little bit. Something new to say. This is going to be another video of material from my new T-SQL course.
This video is going to be, I don’t know, hopefully a rather short one to just go over some annoyances with the identity property when you are trying to insert data into a table. And not for like single row inserts, but let’s say, you know, you’re creating a temp table that you want to work off of and, you know, you need some incrementing number in there.
Some stuff that I run into, well, like performance tuning queries to do that. All 23 hours of the beginner content is live and published. The course is at the pre-sale price of $250. It will double in value.
After the summer when the advanced material comes out. So I do, I would urge you to buy now and save yourself $250. Unless you are, unless you’re just like, if you’re psyched on giving me an extra money, cool.
But I appreciate that. But, you know, if you want to save yourself some money, get in there, get in there sooner rather than later. So let’s talk about these identity annoyances. Now we don’t have a ton to cover here. It’s just some example queries that I want to show you.
So one thing that I lean on a lot when I am tuning queries is, you know, like, like temp tables are a wonderful mechanism for making queries go faster. You can materialize a result set. You can, you know, create indexes across, you know, a combined set of columns. Like if you have to join two tables together, you can put those two, you can put those rows into a temp table.
And that would allow you to create an index across columns that might come from different tables, which ordinarily you couldn’t do. And would make maybe your performance tuning or just general indexing job a little bit tougher to make things go faster. So there are very, very good reasons to put data into a temp table.
Something that I see a lot of people do is add an identity property to a temp table. It’s never entirely clear to me why, but I see this quite a bit. And, you know, like later in queries, it does end up getting used a little bit for stuff, which is, you know, of course, curious to me.
But there are better ways of giving yourself that same outcome without hampering the performance of your queries. So just for example, let’s say that we create a temp table with the identity property, and then we try to insert into that temp table. Now I’m using the tab lock hint here because one thing that is very useful generally is to like when you’re inserting into a temp table and the goal is to get a fully parallel insert is to use the tab lock hint with the insert.
It doesn’t always work. Sometimes select into does work better. But in our case, if we are if this is our big attempt here, then, you know, one thing that will prevent the insert from being in the parallel zone is going to be the identity property.
If we look at this query plan, what we would hope for is that this gather streams operator would be way over here. Right. And that the table insert would happen within the parallel zone, like over here.
But we don’t get that. Right. The this this query does scan the post table in parallel, does aggregate the data in parallel. And but then we we exit the parallel zone early and we repartition streams.
Like I said, the identity property is something that will prevent you from being able to do a fully parallel insert. Right. And there’s another sort of silly way of doing this.
And that’s when you say select into like this. And I don’t know why this I need to change that keyboard shortcut. But if you do something like this, you can you can also say ID equals identity integer one one.
And you can create an identity on the fly. The thing is, this is this is one of the few things that will like stifle the insert from being parallel. We’re going to see the exact same estimated plan here where the gather streams ends early, the in the table insert is outside of the parallel zone.
So this is not what we want to see again. Again, this compute scale are right here is going to be where that identity starts getting assigned. And this is what causes this is that requires the parallel zone to end where it does.
You can do this yourself with a row number with by using row number. Remember, row number starts at one and just counts up until you run out of rows. So this is often a much faster way of giving yourself what is essentially an auto incremented ID column, but with what with better performance along the way.
Note in this query plan, when we get the estimated execution plan, this is what we wanted to see in the first place. We have everything happening, everything that we care about happening within the parallel zone. So there is the clustered index scan.
There’s the aggregate. In this case, we need to sort some data in order to generate the row number. But, you know, you can index your way around that. I just haven’t gone that far with this simple demo. So then we use our window aggregate in order to produce the row number.
And then notice here, this is what we this is what we cared about last time. This is what we cared about in this video that wasn’t happening in the last couple of queries where the table insert is before the gather streams. So all of the rows that end up in this table would have been spread out across multiple threads.
And this can really, really help larger inserts into temp tables. Small, small row, small row count inserts is probably not going to make a very meaningful difference. And depending on like what your max stop is, sometimes it really helps to bump max stop up to make sure this goes quickly.
But like, you know, there are some like factors involved with figuring out if this is worth pursuing. But when it is, this can make a very, very big difference into how fast rows end up going into your temp table and getting loaded in from the was part of the insert. So just something to consider when you’re tuning queries, if the speed of your insert into a temp table is not what you want.
You might try the tab lock hint. Of course, if you’re going to use a tab lock hint, there are many things that you like. Like you should really just start with a insert into a heap temp table and then add in any indexes later that you want.
But this does have some ramifications around like plan caching and reuse and recompiles and stuff. So, you know, but if it’s the type of thing where, you know, it’s a process and you’re loading a lot of rows in and it’s not like, you know, some old crazy OLTP query that needs to run every, you know, three mil, three milliseconds or something. Then this kind of like larger effort tuning is worth it.
So something to think about when you’re tuning your queries, especially your big, big heavy lifting data movers. If you want to get the fully parallel insert much easily, much more easily. It can be a very, very neat sort of workaround to just use row number instead of an identity attribute.
Get yourself that auto incrementing thing. Create whatever indexes you need later, perhaps, and just use that. Anyway, thank you for watching.
I hope you enjoyed yourselves. I hope you learned something and I will see you in the next video where I think are we talking about? We might be talking about something sequency. I forget.
I don’t know. I don’t want to go look and spoil it for both of us at this point. Anyway, thanks for watching.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 25% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.