Column Store, Delta Store, and More
Video Summary
In this video, I delve into some fascinating aspects of columnstore indexes and the Delta store in SQL Server, showcasing how data is inserted and managed within these structures. I walk through an interesting demo where I create a table, insert rows multiple times to observe changes in the Delta store and columnstore index, and discuss the tuple mover—a background process that compresses row groups but can sometimes disrupt my carefully planned demonstrations. Despite this little hiccup, the video offers valuable insights into how SQL Server handles data insertion and compression, providing a deeper understanding of these crucial components for database performance tuning.
Full Transcript
Erik Darling here with Darling Data, celebrating another beautiful Friday in the life of a SQL Server consultant. Lord knows why. In this video, we’re going to talk about some stuff with the columnstore, the Delta store, and a little bit more. This video doesn’t have one sort of overarching point to it. It’s sort of a few different spectacles, all in one lovely demo. So, we’re going to initiate ourselves, we’re going to avail ourselves of some interesting stuff. But before we do, I need to harass you about some other things, some financial priorities. If you need help with SQL Server, I am, like I mentioned, a SQL Server consultant. I specialize in SQL Server performance tuning, and you can hire me to do that for you. I will make your SQL Server faster in exchange for money. There’s a link down in the video description that offers you one of many ways to hire me to do that. There are also links down in the video description where you can buy my training, which is always reasonably priced. You can also choose to support this channel with money, as low as $4 a month, you know, pre-tax in my pocket, in order to keep me doing this sort of thing.
You can also do some other stuff, like ask me office hours questions. I do enjoy an office hours question, or five, every Monday. And of course, if you enjoy this content, you should probably like, subscribe, and tell the giant, massive friends who hang eagerly on your every word about this channel. So, I will be out and about in the world, oot and a boot for the Canadians. Data to Nashville, March 6th and 7th. And data Saturday, Chicago, March 13th and 14th. I will have advanced T-SQL pre-cons at both of those.
So, please do, please do show up. Because if you don’t, I will cry and drink and it will just be a real meltdown nightmare disaster on stage. And you don’t, you know, I mean, maybe you do want that. Depending on, depending on how much you’ll, you may enjoy that actually. I don’t know. But anyway, let’s go, let’s go talk about this columnstore stuff. So, what I’m going to do is, I’m going to do, well, of course, I’m going to drop a table if it exists. And I’m going to create a table. And then I’m going to load some initial data into a columnstore table.
And then, just for fun, what I’m going to do is, I’m going to reload those same rows into the columnstore four times. Right? So, we’ll say go three here. And every time we run this, so go is fun because it’ll start an execution loop when you put a number after it. And so, every time we insert rows into this table, we’re inserting a larger number of rows. Right? So, it’s like another million and then two million and then four million go into this table.
Now, something kind of interesting happens on this insert. And if we did this insert differently, and I’ll describe what differently means in a moment. But when we do this insert, SQL Server chooses to take data from a columnstore, spool it into a row mode eager spool, and then insert it into the columnstore. Right? So, this is, again, row mode. So, SQL Server unbatches all those rows to insert them into the table.
If we put that data into a temp table or even a temp table with a clustered columnstore index on it, SQL Server would not unbatch it. But because SQL Server row modes all that stuff, we end up with a bunch of sort of Delta store data. Right? We can see we have this clustered index.
So, the reason why this is funny is because if you go to the documentation for the view over here that we’re looking at, Sys.dmdb index physical stats, it’ll say, where is it? Down here at the bottom, maybe. Oh, no. It was up higher. This thing is lying to me.
Limitations. Does not return data for clustered columnstore indexes. Right? And, well, I mean, that’s true. There’s no data about the columnstore index in here, but there is data about the Delta store, which is a clustered index that sort of brain leaches onto the clustered columnstore and shows us uncompressed stuff.
Now, SQL Server does have a background process called the tuple mover. That’s when we ran that that time. There were fewer rows in there. So, SQL Server does have a background process called the tuple mover, which will, in the background, go find open row groups and attempt to, sorry, close row groups and attempt to compress them down.
So, when we look at the table now, we see we have some of these compressed. We have one of these that’s open. We have three of these that are still closed. Now, that’s all a side effect of, again, the row mode eager spool between the table and the insert.
So, now, if we run an alter index and we say compress all row groups, well, this will run for a little bit. And then, if we look back at this, now we’ll have these two tombstones in here. So, SQL Server is, like, sort of, like, compressed a bunch of other stuff, but left these two tombstones around.
Right? So, if we go and look at, like, what’s in here now, we go look at the physical stats for this. Now, we just have this one thing remaining. Maybe the tuple mover kicked in and did something. I don’t know. Maybe the tuple mover is ruining my demo.
Yeah, it is. See? Now, those two tombstone row groups are gone. Normally, I would have to run reorganize again a second time to force the tombstone sort of cleanup process to run. But this time, the tuple mover beat me to it.
I just timed this demo terribly because every other time I had lots of time to talk about this stuff before the tuple mover did anything. So, now this reorganize won’t do anything. But I do think that there is one interesting element to this in that if we look back at the physical stats for the columnstore thing, we still get one row back from here.
And there are a couple columns in that view that do reference columnstore data. Apparently, this is non-clustered columnstore stuff because, like, this just says not valid for the clustered columnstore one. So, I guess non-clustered columnstore, it does show more.
But what this is showing us now is an empty delta store. All right? Because we have no pages and we have no records in there. All right?
So, this is the empty delta store. SQL Server doesn’t remove the delta store entirely. It just leaves this empty thing hanging about just in case we need to put more data in there. If I ran, like, a bunch of inserts again to this table, then I would end up with probably rows in the delta store as long as SQL Server continued to choose that eager row mode eager table spool in between the insert and the select. So, as long as we still have this thing unbatching rows, then we end up with a delta store.
Even though we are inserting a number of rows that qualify for compression. So, that’s fun. Anyway, I found this all terribly interesting and also terribly annoying.
And despite the tuple mover kind of ruining one of the highlights here on needing to do the double reorg, I’m still okay with it because, you know, at least we got to see the tuple mover work in action, right? We got to see that it actually does something.
It doesn’t just sit there idly waiting for us to do things. It wakes up about every five minutes or so and says, Oh, look at all these row groups. They’re closed.
I got to compress them. And it does that. You can turn that off, but we’re not going to talk about that. Anyway, it’s Friday. I’m tired. I don’t want to do this anymore.
So, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I will see you back Monday with our usual office hours thing that we do on Monday. Because that’s what we do on Monday.
All right. Have a great weekend.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 25% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.