Why Partitioning Is Not A Performance Feature In SQL Server
Video Summary
In this video, I dive into why partitioning a table is not inherently a performance feature but rather a data management tool designed to facilitate quick swapping of partitions in and out. I demonstrate how setting up a table for partitioning can negatively impact query performance, especially when using aligned nonclustered indexes. Through practical examples with the Stack Overflow database, I show that queries on partitioned tables can significantly slow down compared to their non-partitioned counterparts, even when using clustered indexes. The video also explores the trade-offs between creating aligned and non-aligned indexes, highlighting scenarios where non-aligned indexes might be necessary but come at a cost in terms of performance.
Full Transcript
Erik Darling here with Darling Data, and today’s video is sure to be a video. We are going to talk about why partitioning is not a performance feature. It is a data management feature. When you partition a table, the primary objective is to be able to quickly swap partitions in and out. Anything else that you do with partitioning, it’s not that much fun. Splitting merging and merging partitions, all that other stuff. Managing partitions. Not fun, not performance. But this video is specifically about how when you set up a table in order to facilitate data management with partitioning, the performance of some queries can quickly go south. What we’re not, what I’m not talking about in this one is mixing clustered columnstore indexes with partitioning, which can in some cases help performance by getting you additional segment or two additional segment or row group elimination, however you prefer to talk about it, because most people are not doing anything quite that exotic with their tables. Most people are in the world where they are, they got it through somehow got this idea through their incredibly fixed goals that if they partition the table performance will get better. Wrong. If you look down in the video description, you’ll see all sorts of helpful links. You can hire me for consulting by my training, become a paid up member of the channel. If you want to support this high quality SQL Server content that I put out here. You can also ask questions on office hours. Those are free. And I answer five of them every week. And of course, if you enjoy this content, I do I do ask that you like subscribe and tell a friend maybe even 1020 30 100. If you have another like 1000 friends or something, just send them all the links.
Spam them. Spam them. Screw them. The only conference that I have left on my schedule is going to be past data community summit taking place in Seattle, Washington, November 17th to 21st, where me and Kendra Little are going to do two days of probably the most spectacular T SQL pre cons you’ve ever seen in your life. So I hope I hope you’re prepared. Anyway, it is it is Halloween month. So we are getting we’re having a Halloween database party in here. So let’s let’s let’s get cooking with this partitioning stuff. Why don’t we. So what I’ve done in the stack overflow database is really the only table and stack overflow that lends itself to partitioning well, because it’s long and skinny is the votes table. And I partitioned by a column in the votes table called creation date.
Because creation date gives me sort of the most even spread of part of partition data, even though it’s not like perfectly symmetrical, it is far more symmetrical than if I did like vote type ID or user ID because lots of people who voted once. So like, you know, stuff like, you know, stuff like that. So like even post ID, things get like no votes, right? Like things get like one vote. So it doesn’t make a lot of sense. So creation date makes the most sense. The partitioning setup itself doesn’t matter too much. It’s a partition range, right? For every year that’s in the table. So it’s like 20 2008 to 2013. So if I say, give me the min creation date from the partition votes table.
This is very quick, right? This is very quick, right? This is very quick, right? This is very quick, right? This is very quick. This turns out just fine. I have no complaints with this. This is okay. Now I’ve created some indexes below that we’ll talk about in a minute. But I just want to show you a quick difference.
Now we’re going to be not using the partitioning column anymore, right? So but I want to force SQL Server to use the clustered primary key on both the votes partitioned and the votes table here. Okay, so we’re doing the same query. We’re just hitting the partition table versus the non partition table and using the clustered index for both of these queries. When I run these two, things start to get a little wonky. They both return the same row, of course, but the partition table takes nearly two and a half seconds to do the exact same thing that the non partition table did.
All right. And this is again, these tables are identical aside from the partitioning. They both clustered on the the I mean, this one is only clustered on the ID column. This one is, of course, clustered on creation date and ID, right? Clustered primary key. But we’re selecting vote type ID. So that’s a little immaterial to this. They’re both page compressed. So there’s nothing weird there.
This just the just immediately aggregating vote type ID from the partition table takes like two and a half times as long, right? Go from one second to two and a half seconds. So already we’re at a little bit of a detriment. So the indexes that I’m creating the nonclustered indexes I’m creating this to show you some differences are we’re going to create one on vote type ID on the votes table, right? So this is not partitioned. We’re going to create one on the partitioned votes table that is aligned to the partitioning scheme, right?
So this is aligned on the partitioning scheme on creation date. And then we’re going to create a non aligned index on the votes table. So this is on primary rather than being on the partitioning stuff. And the reason I want to show you this is because once you create a non aligned index on a partition table, you lose all the ability to swap things in and out. Having a line nonclustered indexes means you can quickly swap partitions in and out of the table. Having non aligned indexes mean you’ll get an error if you try to do that.
Now, if your environment can if it’s palatable to your workflows and environment, you could, of course, create non aligned indexes to help queries. And then when you need to switch data in and out, you could drop them, switch the data and recreate them. That still might be faster than whatever you were doing before. I don’t know. It’s all about if you’re if it’s tolerable to your workflows or not.
I can’t tell you that unless you hire me. Don’t worry. My rates are reasonable. Anyway, we’re going to compare a few different queries using the aligned. Well, we’re going to compare to the non partition table and then we’re going to compare using the aligned and non aligned indexes on the part on the partition table. So the first one we’re going to do the same thing as before, but now we’re going to tell SQL Server use like this is really just for demo clarity.
And also because sometimes the optimizer will choose the non aligned index anyway, because it’s like, oh, yeah, I’d rather use this thing. So like I’m forcing SQL Server to use these just to show you like what things might look like if you only have aligned indexes on your partition table. So we’re going to run this one. This is against the non partition votes table, right?
And we’re telling it to use the the vote type ID index. This all finishes very quickly. Zero milliseconds, yada, yada. Fine. If we use the non aligned index on the partition table now, right, this is the non aligned version of the non non aligned nonclustered index. This also goes just fine, right? SQL Server very quickly finds that once we go and use the aligned index on the partition table.
So this is the one that we created that is aligned to the partitioning scheme. It is not so instantaneous anymore. This takes nearly two seconds. This almost exactly reflects the scanning that doing when we use the clustered index for this query took two and a half seconds. So adding a nonclustered index shaves like half a second off this. Let’s just be fair and say like, well, 1.898. That’s almost 1.9.
So let’s just call it like 600 milliseconds. We save 600 milliseconds creating an index. Not great. Right. Not not moving the needle for me. So like we can see that like having aligned indexes on partition tables is immediately slowing some types of queries down. I’m going to show you another type of query that slows things down. So let’s just say that we wanted to get the top five rows. Right.
So we’re going to say get the top five from the regular votes table and we’re going to tell it to use the vote type by the index for this. Right. Just to get demo clarity. We’re forcing the index here. This very quick. Right. Execution plan. Just what you would expect. We get five rows. We loop join. We key look up. We get the top five. Right. Take zero seconds across the board.
So we’re going to say that. Right. If we use the non aligned index on the partition table. Also very quick. Does the same thing. We have a compute scale error in here now, but the whole thing takes zero milliseconds anyway. Okay. Well, what if we do this on the aligned index? I could run this here. I could. But we’d be waiting a little while.
So I’ve run this ahead of time. You’re welcome. I try to save you some time because Microsoft won’t. If we run this and we get the top five from the partition table using the aligned index. This takes a full minute and six seconds. This looks a little bit different, doesn’t it? Right. If we look at what this does, we will be scanned the whole index. So we go from six seconds to almost 13 seconds. That takes about seven seconds. And then we spend the rest of the time. Well, we spend 45 seconds doing a 52 million row key lookup.
And, you know, nested loops joined to facilitate the key lookup. Right. So one row comes out of here. We go look it up down here and we do that 53 million times just about. Right. And then, you know, I guess a minute and three seconds there and then another three or so seconds getting out to the end of the plan here. So that plan looks a whole lot different from the plan that we got using the non aligned index. The non aligned index just goes, finds the top five rows, goes and looks up the extra columns we need and we’re done.
You’re not do all that other stuff. We are not able to get down to just the five rows when we use the aligned index. Again, the query plan for that is over here. And we are going to end the video with this nicely framed up because that’s a good way to end this video. So to summarize, partitioning, it’s a data management feature. It’s there for you to quickly swap partitions in and out of your partition table.
If you combine partitioning with clustered columnstore, you may see some performance improvements around segment or row group elimination. We didn’t dig into that in this video. We’ll do that in another video, probably. But there are lots of queries that SQL Server might have a very hard time with in your workload. If you decide to partition a table and not not index things any differently, you might need to create non aligned indexes to make these queries faster.
And you may need to drop and recreate those indexes around your partition swapping, assuming that you’re going to do that sort of thing. If you’re not going to do that sort of thing, there is absolutely no sense in you partitioning a table and then creating a bunch of non aligned, nonclustered indexes on it anyway. All right. I think that’s good enough here.
Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. I hope that you won’t partition your tables and I will see you in the next video where we will talk about something equally as interest. All right. Thank you.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.