I’m still looking into this a bit, but I ran into this issue while helping set up a process to migrate data into a new table, because the original table was running out of integer identity values.
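As a quick aside, here’s a rough sketch of one way to check how much headroom an int identity column has left, using sys.identity_columns. This is just an illustration, not part of the migration process itself:

```sql
/* Find int identity columns and how many values remain before the int max */
SELECT
    OBJECT_SCHEMA_NAME(ic.object_id) AS schema_name,
    OBJECT_NAME(ic.object_id) AS table_name,
    ic.name AS column_name,
    CONVERT(bigint, ic.last_value) AS last_value,
    2147483647 - CONVERT(bigint, ic.last_value) AS values_remaining
FROM sys.identity_columns AS ic
WHERE ic.system_type_id = 56 /* int */
ORDER BY values_remaining ASC;
```

Tables near the top of that list are the ones you’ll be writing migration processes for soon.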
The process looks something like:
- Create a new table with the right definition
- Write a loop to backfill existing table data
- Create a trigger to keep updated data synchronized
- Eventually swap the tables during a short maintenance window
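The backfill step above might look something like this. It’s a minimal sketch, not the exact process from the migration — it assumes the Votes/Votes_bigint tables used in the demo, and the batch size and any error handling are up to you:

```sql
/* Allow explicit values into the identity column on the new table */
SET IDENTITY_INSERT dbo.Votes_bigint ON;

DECLARE @batch_size int = 100000,
        @last_id bigint = 0;

WHILE 1 = 1
BEGIN
    /* Copy the next batch of rows, keyed off the clustered index */
    INSERT dbo.Votes_bigint
        (Id, PostId, UserId, BountyAmount, VoteTypeId, CreationDate)
    SELECT TOP (@batch_size)
        v.Id, v.PostId, v.UserId, v.BountyAmount,
        v.VoteTypeId, v.CreationDate
    FROM dbo.Votes AS v
    WHERE v.Id > @last_id
    ORDER BY v.Id;

    /* Done when a batch comes back empty */
    IF @@ROWCOUNT = 0 BREAK;

    /* Remember where we left off */
    SELECT @last_id = MAX(Id) FROM dbo.Votes_bigint;
END;

SET IDENTITY_INSERT dbo.Votes_bigint OFF;
```

Each batch is its own transaction here, which keeps log usage and lock durations reasonable while the table is still taking writes.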
All well and good! Except… you can end up with a really weird execution plan for the backfill process, and some pretty gnarly memory grants for wide tables.
Demonstrating with the Votes table:
CREATE TABLE dbo.Votes_bigint
(
    Id bigint IDENTITY NOT NULL,
    PostId int NOT NULL,
    UserId int NULL,
    BountyAmount int NULL,
    VoteTypeId int NOT NULL,
    CreationDate datetime NOT NULL,
    CONSTRAINT PK_Votes_bigint_Id
        PRIMARY KEY CLUSTERED (Id ASC)
);
GO

/* Inserting explicit values into an identity column requires this */
SET IDENTITY_INSERT dbo.Votes_bigint ON;

INSERT dbo.Votes_bigint
    (Id, PostId, UserId, BountyAmount, VoteTypeId, CreationDate)
SELECT TOP (100000)
    v.Id, v.PostId, v.UserId, v.BountyAmount,
    v.VoteTypeId, v.CreationDate
FROM dbo.Votes AS v
ORDER BY v.Id;

SET IDENTITY_INSERT dbo.Votes_bigint OFF;
The query plan has a Sort in it, which is weird, because both tables have their primary key/clustered index on the same column. Why re-sort that data?
I’m showing both serial and parallel plans:
The Sort is always on this expression, which in the plan XML looks like this:

[StackOverflow2013].[dbo].[Votes_bigint].[Id] = [Expr1003]
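If you want to go hunting for cached plans with Sorts in them yourself, here’s one rough way to do it by shredding the plan XML. This is a sketch — it requires VIEW SERVER STATE, and querying the plan cache like this can be expensive on a busy server:

```sql
/* Find the most expensive cached queries whose plans contain a Sort */
WITH XMLNAMESPACES
    (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (10)
    qs.total_worker_time,
    st.text AS query_text,
    qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE qp.query_plan.exist('//RelOp[@PhysicalOp="Sort"]') = 1
ORDER BY qs.total_worker_time DESC;
```

Clicking through the query_plan column gets you the graphical plan, where you can check the Sort’s Order By expression the same way.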
Thanks for reading!
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly.