Learn T-SQL With Erik: Partitioning != Performance
Fair warning, the audio got pretty garbled on this one. If you don’t like space alien voices, hit mute and/or skip ahead to the transcript.
Chapters
- *00:00:00* – Introduction
- *00:01:32* – Course Information
- *00:02:19* – Personal Appearances
- *00:03:17* – Upcoming Events
- *00:04:11* – Data Management Feature
- *00:05:01* – Table Partitioning
- *00:05:58* – Batch Mode Limitations
- *00:06:33* – Query Execution
- *00:07:02* – Indexing Issues
- *00:08:04* – Column Wrapping
- *00:09:01* – Query Plan Analysis
- *00:10:22* – Non-Aligned Indexes
- *00:11:03* – Index Optimization
- *00:12:19* – Query Performance
- *00:13:15* – Time Comparison
- *00:14:00* – Query Rewriting
- *00:15:02* – Partitioning Challenges
- *00:16:00* – Data Swapping
- *00:16:45* – Conclusion
- *00:17:06* – Next Video
Full Transcript
Erik Darling here with Darling Data. And today’s video, we’re going to talk about how in SQL Server, at least with rowstore indexes, let me make sure that I lay out all the caveats up front before some actually person shows up and like, before they even watch the video, because Lord knows there’s a lot of that on the internet. Partitioning with rowstore indexes is not a query performance feature. It is, of course, a data management feature. Before we get into that, though, I did want to thank the nice folks at Mother Duck for hooking me up with a nice rubber duck thing here. It says data person over there, backwards hat and glasses, kind of like Run DMC Duck. And on the bottom it says, Mother Duck, you’re the one, you make data so much fun. I thought I was making data fun, but apparently they make data more fun. So thank you, Mother Duck. Down in the video description. You will see all sorts of helpful links. If you feel like spending some money with me, on me, together, you and I. We can go shopping. You can hire me for consulting, buy my training, become a supporting member of the channel, ask me office hours questions. And of course, if you would like to, you know, I don’t know, hang out and make this channel more socially acceptable, well, you can like, subscribe and tell a friend.
Get notified when I publish these works of art. The material that we’re talking through today is, of course, part of my Learn T-SQL with Erik course. This is just a small snippet bit of the full course material. And there’s a link with the coupon code down the bottom where if you feel like purchasing the entire thing, then you can do that at a discount. Another thing that you can get absolutely for free is my new SQL Server performance monitoring tool. All right. It is a free open source, no email signup, no weird telemetry telling me about your SQL servers. It’s just a bunch of T-SQL collectors going in, collecting all the stuff that I would look at during my consulting engagements. And it’s all there to help you get a handle on SQL Server performance.
It is basically all the stuff that I answer questions about when I’m working with people. So I thought it was a pretty good thing to just let loose into the world. You know, there’s only one of me. I can’t scale beyond this one of me. I’ve had a very hard time teaching people to do the correct things with SQL Server. Despite all the years of blogging and videoing and training and everything else, it seems like people are just scared of SQL Server. I don’t get it.
But if you feel comfortable with doing robot stuff, there are optional opt-in MCP tools that you can use. It’s got a server built right in there so you can have your very favorite robot talk to your performance data and answer questions about it. How good those answers are, I can’t tell you. The summaries are at least pretty good, but at least some of the advice is maybe not quite all there yet.
The robots haven’t gotten the message on some things. Anyway, if you like human-generated advice and all that other stuff, you can catch me out and about in the world, traveling all over the place, trying to bring enlightenment from one human to another.
Sort of in order here, I’ll be at Passa with my right hand removed from being my body. Pass on tour in Chicago. That will be May 7th and 8th. That is, I don’t know, I guess about five weeks from now.
At least as of this recording. SQL day in Poland, May 11th through 13th. That is a three-day adventure. That is not a two-day adventure. Then I’ll be home for about a month and heading off to Data Saturday, Croatia, which is going to be a grand old time.
That’s June 12th and 13th. And then, at least as far as I know, that’s all I have going on with my life for a little bit. And then in Seattle, Washington for the Big Pass Summit, November 9th through 11th.
So, if you, again, want to come get a human hug, this is where you can get human hugs from. At least from me. I mean, you might have other sources of human hugs that you prefer.
You may just want to learn about SQL and not human hug me. I understand either way. It’s fine with me. But for now, man, there is so much baseball to go that I am just a very happy person. If you are the type of person who likes buying stocks, playing the market, buy some Coors Light stocks because it’s going to be a busy summer for me.
Anyway, let’s talk about partitioning. So, I’ve got a table called Votes Partitioned. It’s already partitioned because who would want to sit there and watch me partition a table?
It’s not an enjoyable experience, right? But it is partitioned by, let me make that a little bit bigger. Thank you, Aaron Stilato, for project, for product managing this wonderful live result set scrolling, zooming into SSMS.
Of course, you could always scroll results. Now we can zoom on results. But this is my Votes table.
It is partitioned by the creation date column. And it is partitioned by year, right? And you can see that I have followed the partitioning bible. I’ve got an empty row group on either side.
And, well, everything is pretty okay there. Now, it’s sort of annoying. Now, this has nothing to do with partitioning performance in general. So, it’s annoying that batch mode doesn’t show the details of which partitions were accessed in the show plan XML.
So, for a lot of these demos, I’m going to be disallowing batch mode so that I can show you kind of what’s happening with them. So, if we run this and we look at this query, or rather, we look at this query plan, we say, look, we executed our execution plan. We did a great job.
We have obeyed many, many rules of partitioning. And we will see that we only had to access one partition to find our data, which is wonderful, right? Because we just looked for the year, the data from the year 2013, right?
And since our data is partitioned by this, we can find that data easily. The thing is, there is absolutely no difference between seeking into a partition like this and there is seeking into a B-tree index that would happen to lead with creation date. Because our clustered primary key on this table leads with creation date.
So, we can do that sort of thing. It’s almost like just having an index. It’s like, now we have all sorts of other stuff now kicking, going in, and getting a problem, and causing problems for us. So, some things that we normally need to not pose tremendous issues for C++ server, things that cause issues for partitioning.
For example, our partitioning problem is in the daytime. And it seems that Antiglator’s optimizer hasn’t been able to handle this as a study for lately. But, uh, when, uh, when, uh, when, uh, when, uh, when, uh, when we, uh, when we declare a local variable in the day, we think that we need an option to compile that we’ve done here.
That will allow the primary writing optimization to happen. Uh, um, we don’t necessarily get the C++ data results. Unless that battery maybe gets not quite as snappy as the other area. And now we have this whole kind of a strange area where we are.
And we can see the access to all of the partitioning, even though C++ server is able to handle that in a, in a, in a similar way. But, uh, the thing that we know more in that sub-clarity reform is also sub-clarity, uh, partitioning in the nation. So, wrapping a column, like, uh, wrapping a column in the, in the year function, just as bad partitioning as it is with, uh, as it is with the normal index.
Um, and, of course, if you try to VCC and run on any kind of, uh, even, even something that is sort of transparent from the optimizer is convert for eight. Right? Because it’s, like, like, again, C++ server is a smart number of this stuff.
But, um, if you try to convert from eight, eight, uh, here, we, we do not get the results that we would, we would want to get, you know. So this query is also not quite as snappy. And if you look over here, here, you will see, uh, well, this actually doesn’t, it doesn’t show anyway.
I’m not sure what happened here. We don’t even get the actual number of partitions that we, that we, we used on there. So, that’s a fun one.
Apparently, we eliminated it and did not. So, so, the other problem you can run into with, uh, tables that you have partitioned is around line indexes. So, uh, it, what, what, what we’re showing here, you’re getting an end on creation date.
Right? So, creation date dates, the partitioning column. So, those bills don’t justify, like, getting in a creation date, these, these, these query plans, and it’s sort of, or, if you, you know, I, first of all, this used to look at the query plans like this, that, like, basically just says they need to top one, like, from, um, the, from the table, and, like, it’s fine, right?
So, it’s very easy to do. But, if we, uh, run on that query read, we say, hey, I want to get no type by date. And I have an unhonustered index.
I have a line to the partitioning scheme and everything else. Because I want to be able to swap data. And, oh, I do not find my indexes. And I cannot swap data.
And that would lose the data. And it manages the partitioning. Well, this, this doesn’t, doesn’t wait until so soon. Right? See, the results of my list does not have the yielded ability to do things in the same way. Notice, when I’m joking, we have that possibility.
Now, we have a plan with a two-screen magnet. There’s a paracarons. Yadda, yadda, yadda, yadda. And you look over here. Remember, we’re going to see, we, just, just, just to find the new type ID. We looked at all of the politicians.
Right? Right? So, non-aligned indexes can compose real, real politicians. We looked at all of the politicians. Just to sort of contrast that in some of the same thing. If I want to get, like, a min, min, type ID in the votes table.
I already have it. I have the same, basically, like, that index up, you know, I’m going to hover it over right over there. There.
But we need to type ID. So, see, this should be explained to my min, min, type ID very, very reasonably. So, instead, it looks at all of the partitions and scans things. Right? The optimizer is just very, very interesting. The same thing with that.
If we look at this, this unpartitioned table. And we say, getting an min, type ID, I have. I have the same, basically, like, of course, the table of the partitions and the index can’t be aligned.
It’s just, like, I have another index that we can use a type ID. We can get a query branch, which you want. We’ll say, say, again, get the top one, so I can find . Partitioning doesn’t mess that up.
So, we look at some differences here, right? I’m going to run all these queries together. So, the first one is using a non-partitioned table, right?
I’m saying . The second one is using a non-aligned index on the partitioned table, and the third one is using the aligned index on the partitioned table.
Three very different performance quiz proposals. The first thing, we have the plan shapes that we want. We just, in order to find my name from a query from a table, it has that index, right?
The index presents that data in that order. Even if it was in the sense in order, it would just say, okay, go to the other third and work. These are the plan shapes we want.
But using that aligned index, we have to scan all that, but the 3.5 steps in a query, instead of that 0, the second query we do using the non-aligned index. On the partitioned table, right?
So, that’s also something to think about. There are other types of queries that can pose similar performance outputs. So, for example, if I say, you can give, like, you can give, like, you can give, like, you can give, like, you can give, like, you can give, like, you can give, like, you can give this list of the query we’re in shape that you would expect.
So, I say, you can give the top five, I know this index is equal to type ID, just to make sure that, like, it stays consistent. And, and, and, a lot about type ID, this decision is very quickly, we have a short story in here, here, like, this is all zero analysis, right?
Nothing in here is taking time. If I choose the non-aligned index on the, pardon, that is still fast, and that is still, just about the same data we got before, and maybe not exactly the same, but it is, it is close enough here.
Here. This line index, if I tried to run here, it would take a lot. If we come over here and look, you’ll see this query, when it ran, took a full minute, exactly, to the second, one minute, right?
We scan that index, that 35 seconds, that can be sorted, maybe this build, that in about six seconds into the mix, and then between, uh, let’s see, uh, 41, what’s, that’s 52, so, uh, this is about six seconds, the loop joint itself, 41 seconds, going to be, look out, 52 million, right?
Uh, that’s not great. That’s not a good strategy. Right? Uh, all sorts of queries, in the central server, can get very, very strange, once you lose partitioning.
So, so, the next time someone sees the partitioning of the formal feature, laugh at them. Say, it’s a date of the internet feature, and if you’re looking at the closest, it’s, I mean, it’s not what you can find, right?
Because, they’re asking for it. This is, it’s not new information. Uh, you can work with robots a bit, but, like, like, if you grab the, uh, partition numbers from, like, all these tables, and dump them into a table, like a table variable, you find here, here, and if you say, give me these things, and then, and you rewrite my query, right?
So, we just put all the partition into that table, and we have to rewrite our query in kind of a strange way, right? Let me say, type ID, uh, from, the cross-apply button here, right?
In here, we are saying, okay, well, we need to take the partition numbers that we have just put into that to them table, and we need to record them, we need to use those weird, weird dollar sign partition functions, and we need to feed them the creation date column, and we want the table to partition it by, and we say, hey, match the partition number, and that turns out, alright, right?
It’s like, almost the square here, and shape, shape, and line. It’s a little bit more complicated, but it does, does, does best define it. But there’s a lot of times when you just, you’re not going to rewrite my query, as much as possible.
If you’re using ORM, good, good luck with that, right? Dummiesies. If you’re dealing with a vendor application, where you can’t rewrite the code, maybe you’re as free as you want, good luck with that, right?
There’s like, things that you can do, but, they’re not always straightforward, right? They’re not always easy as things in the world, well. And, if you, but if you’re allowed to rewrite the query, you’re going to take advantage of some of this stuff, you can work around it, since the social distancing is partitioning, and get performance links you would get, if you didn’t bother to partition that table in the first place.
So, again, partitioning, and these are sort of, with rows or indexes, it’s not really a strong component feature. If you need to swap data, and all that other stuff, I, I get using it.
But, you also have to, in order to do that, you have to have your indexes aligned, to the partitioning, and if you don’t do that, and you will not really need to do the switch choices. Which is, probably, you might not want to do that in the first place.
But, when it comes to this, like, like, like, normal, running and hitting stuff, partitioning is not a solution, that you should be exploring for that.
Like, just this normal indexing the table, you know, much, much more abundantly, much, much more efficiently, without having to redo the entire table, and worry about all this stuff that comes along with partitioning, and then getting a very, very client, when you at least expect it.
Anyway, thank you for watching, I hope you enjoyed yourselves, I hope you learned something, and again, this is, this is a snippet from my larger environment, he sees, there’s a link down that video description, with a coupon for the cash, if you want to purchase the whole course, and learn the full breadth of the material, you can do that.
Anyway, thank you for watching, I hope you enjoyed yourselves, I hope you learned something, and I will see you in tomorrow’s video, where we will, inspect, partitioning with columnstore indexes, where there can be some performance benefit.
Alright, thank you for watching.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.