Learn T-SQL With Erik: Partitioning and Column Store

Learn T-SQL With Erik: Partitioning and Column Store


Chapters

  • *00:00:00* – Introduction to Partitioned Columnstore Tables
  • *00:05:02* – Querying Data and Segment Elimination
  • *00:10:37* – Vote Type ID and Common Data
  • *00:13:02* – Conclusion and Future Topic

Full Transcript

Erik Darling here with Darling Data, going through some more of the Learn T-SQL with Erik snippets, things that I feel are important for people to know about, regardless of if they purchase the course or not. But, of course, because there is a course for sale, then I would, of course, appreciate it if you purchase the whole course so that you can learn all the things that I think are important for folks to learn. So, today we’re going to talk about partitioning in columnstore because there are important differences between partitioned columnstore tables and partitioned rowstore tables. One of the sort of superpowers that columnstore has is the ability to use metadata about which row groups have which data in them, and it can skip entire segments that do not contain relevant data.
That’s a wonderful thing. That is something that rowstore indexes do not really have. All right. So, la-dee-da. But today we’re going to kind of look at how those two things, how these two things pair together and how, you know, much like with, you know, normal partitioning, the way that you access data does sort of rely on the partitioning key in order to make this as efficient as possible.
Before we do all that, down in the video description, you will find the most important links in your entire life. You can hire me for consulting. You can purchase my training materials, including the Learn T-SQL with Erik course, which has a little coupon code attached to it at the moment. You can become a supporting member of the channel. And then totally for free, you can ask me office hours questions and you can like, subscribe and tell a friend all about how wonderful all of this content is.
So they can, they can, they can learn as much about SQL Server as you, right? Wouldn’t that be just great? Another incredibly important link down in the video description is where you can get my free SQL Server monitoring tool from. Uh, it’s totally free, totally open source. There is no obligation or requirement on your part to, uh, talk to me or, uh, you know, pay me anything for it.
Uh, it’s just something that I wanted to do for the SQL Server community because I don’t feel that the big monitoring tool companies are doing their jobs well and haven’t been for a long time. And this is my way of writing that situation. Uh, it is a bunch of T-SQL collectors that would grab the same information that I would if I were, uh, doing a performance, uh, analysis or tuning engagement.
Um, it displays all that information in beautiful charts and graphs and gives you all sorts of abilities to click through those charts and graphs in order to find the root causes of your performance problems. If you’re too busy for all that, you can have the robots do it. There are optional MCP servers where you can, uh, let the robot of your choice talk to your performance data that I’ve been collecting.
And it can give you all sorts of nice summaries and talk through things and, you know, just give you general advice on what’s going on or at least information about what’s going on on your server. Uh, so that you are better prepared for the work you have to do. Uh, anyway, if you prefer the, uh, the old human to human thing, uh, you know, I, I still like human contact once in a while.
That’s why I’m going to all these things. Holy crap. That’s a lot of travel. Uh, I am just in order. I will be in Chicago, Illinois, uh, May 7th and 8th for pass on tour East, which is sort of weird cause it’s in the Midwest, but you know, what, what can you do there? Uh, then I will be traveling from Chicago to SQL day in Poland, May 11th through 13th.
Uh, then I’m home for about a month and I will be back out, uh, in the world going to data Saturday, Croatia, June 12th and 13th. And then, um, at least until someone makes me an offer, I can’t refuse, I will be, uh, doing nothing until November where I will be in, uh, Seattle for, uh, pass summit West, November 9th through 11th. I’m not quite sure what to call that one right now, but I’ll work that out.
Anyway, the meantime, let’s, let’s get beery and cheery and let’s talk about partitioned columnstore tables. So, uh, what I’m going to show you first is this query, right? And this query is going to hit a table that I have pre-created, uh, that is the votes table from the Stack Overflow 2013 database.
Uh, it is a clustered columnstore table and it is partitioned by the creation date column by year, right? So we are not partitioned by bounty amount, but the bounty amount column is still interesting. Because if we select some data from this table and we look at the messages tab, we will see that we read 33 segments and skipped 71 segments.
But why? Well, if you’re interested, uh, a part of the course materials is a view that I’ve created that gives me details on what’s going on with my, uh, columnstore stuff. And if we look at that, we will see that we have partitions two through seven and partitions two through five are eligible, right?
Where bounty amount equals zero exists. That is two plus eight plus 11 plus 12, which I’m pretty sure is 33. And then we have, uh, some ineligible parties, segments and partitions, uh, where we have, of course, six with sort of overlap between and partition five there.
But there are, uh, six ineligible segments in, uh, partition five and then 28 and 37, which I am fairly sure adds up to 71, but, uh, I haven’t got 71 figures to sort that out with. So I will let you do that math on your own. So the message really is with all this.
And if you just feel like this being a TLDR for the whole video, that’s cool with me, but just like with, uh, any other index, um, but particularly with partition columnstore, uh, because the creation date column is our partitioning key. If we want to really team up, um, you know, the, like, like segment elimination plus partition elimination, we need to access data via the partition column. So if we run this query, we will see this is, um, uh, looking for bounty amount equals zero.
And the bounding is 2010. Well, just the year 2010, right? Greater than equal to 2010.01 less than, uh, 2011, less than 2011.01.
And we look at the messages tab, we read 11 segments and skip zero, right? So all 11 segments that we, we could have read, we read, right? And that the segments in here are going to be ones that like, like, like I said, in another video, like the batch mode plans don’t show us which, how many partitions we touch.
You know, kind of leaves a little bit of guesswork to the, to the curious mind. But, um, this is how many, uh, like, because this is partition by year, we’re just looking at the year. We’re just looking at the partition that has data for the year 2010 in it.
And there are 11 segments within that year, right? So we didn’t skip any of them, but we were able to skip like all the segments and partitions that aren’t relevant. If I run this query where I’m looking at 2010.12.02 through 2010.12.31, right?
So this is just a portion of, uh, the year. And we look at the messages tab, then we’ll see that we only read two segments and we skipped nine of them, right? So we were able to like, even within one partition, we were able to skip segments that didn’t have relevant data to us.
So that’s, that’s pretty cool there. Um, I forget what this query is supposed to show. I should have left some notes for me for myself, maybe.
But, uh, anyway, uh, we can see that only some of these, um, have the bounty amount equals zero in them, right? So the min data ID in these is zero. The min data in these two is not zero.
So, um, there are two partitions where that have no zero bounties in them, which is a little weird, but not the end of the world. Anyway, uh, if querying the data in other ways kind of gives us different, you know, um, sort of fingerprints, uh, for segment and partition elimination. Uh, none of these are, I don’t think any, there are any examples of partition elimination for the following queries cause we’re not hitting creation date anymore.
But if we look at, come back to this query, we’ll see that there are, you know, all like the max data ID and, uh, partitions three through seven is 550, right? So if we run this and look at it, we’re going to hit like just about everything, but we are able to skip, like, because we’re like, we’re going to read all the partitions, but we’re able to skip a lot of segments based on the metadata that SQL Server has about, um, like what’s where. And, uh, the same thing, if we have a really high post number, right?
Like SQL Server is going to be able to figure out like, Oh, like I, I don’t need to read a lot of this because like, like this post ID has to be like, like, like it’s a very high post ID relative to the, like, like max post ID in the post table. So that, that post ID doesn’t, hasn’t been voted on until like recently. So we’re able to skip over a lot of segments in order to get to just the segments where that post ID might exist.
But if we, um, if we query by a very low post number, right, this is a very high post number. This is a very low post number, like post ID one, three, eight people have been voting on post ID one, three, eight, like basically since, you know, I don’t know, like forever, right? That’s one three is a very low post number.
So we can’t eliminate anything with that one. We have to read all hundred and four segments. We don’t skip any of them. Uh, same kind of deal with vote type ID 16, right? Uh, well, I mean, maybe not totally, but we’re able to skip eight or sorry.
We read 83 segments and we skip 21 vote type ID 16 is of course approved edit suggestion. Uh, so I guess that’s kind of okay. Um, another thing that will not help you in your journey to, um, and you know, it, it, it, it, it, it, it giveth and it taketh away.
So like, you know, people are most interested in data that is, you know, unfortunately sort of common, right? Um, like at least in the stack overflow world, vote type IDs two and three, those are up votes and down votes. Those are the two most important types of votes that, you know, um, that we can get.
So like, let’s, you know, these two, right? Up mod and down mod, right? Oh, that, not that first one. So like, these are going to be common. These are going to be like scattered throughout all of the partitions and all of the segments because, you know, they’re the most common thing.
We could pair this maybe with, you know, like some year boundaries in order to get partition elimination. But if we just need everything, you know, we still have to read through all the segments where this data might exist, but it’s okay. Like generally, I think, because I mean, you know, reading through a 53, actually this, this table is twice as big because I have to put double the amount of data in the clustered columnstore table in order for it to, um, really make, uh, like any sort of performance blip happen.
Um, so this table is actually very, very, uh, about a hundred and something million rows. Um, you know, like we, we do read a lot of them and we do a lot of that, but like the whole thing takes 18 milliseconds reading from that. And, you know, the whole query finishes very quickly anyway.
So columnstore great for reads, right? Very, very, very much speed up read queries. Uh, you can certainly pair it with, uh, partitioning in order to get partition elimination plus segment elimination, which is, uh, can be very useful for, uh, very large tables, but most tables probably won’t need it.
Um, most tables that, you know, I would say, uh, under the like a hundred million or so roll mark, aren’t going to see vast benefits from adding, from, from commingling partitioning and columnstore. That being said, if you anticipate this, these tables growing to very large sizes, you may want to start that journey early. You may not see the benefits immediately, but once that table gets, if that table that’s like 200 million rows, doesn’t see a lot of like, you know, uh, action from that.
You might try, you might start seeing it around like 500 million or a billion rows where all of a sudden you’re able to cleave out not only, uh, segments, but also you’re able to get partition elimination. And whittle down and figure and just read smaller numbers of segments, even within a partition. Anyway, that’s about it for me here.
Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. I hope that you will buy the course, which is helpfully linked for you down in the video description. And I will see you in tomorrow’s video where we are going to talk about partitioned views. So we have that brilliant thing to look forward to.
Anyway, thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.