Learn T-SQL With Erik: Pivot and Unpivot
Video Summary
In this video, I dive into some essential T-SQL techniques by discussing pivot and unpivot operations, key components of data transformation in SQL Server. Starting off, I explain the basics of pivoting data, showing how to transform vertically-oriented results into a horizontal format using aggregate functions like COUNT. I highlight the importance of using derived tables to maintain accurate results when working with pivots, emphasizing that pivots can be sensitive to implicit grouping and can produce incorrect outcomes if not properly managed. Additionally, I cover unpivot operations, demonstrating their forgiving nature compared to pivots and how they handle null values differently. By walking through practical examples, I aim to provide a solid foundation for anyone looking to manipulate data effectively in T-SQL. This content is part of my new course, “Learn T-SQL with Erik,” where you can find comprehensive beginner-level material available at a pre-sale price of $250, with the option to purchase advanced modules later this summer.
Full Transcript
Erik Darling here with Darling Data, and we’re going to do some more teaser-y content from my new course, Learn T-SQL with Erik, in which I teach you all of the terrible things that I know about T-SQL, which is unfortunately a lot. Don’t worry, there’s a workaround. So all 23 hours of the beginner content is out and available. You can start watching that now. The course is available at the pre-sale price of $250 bucks, and that goes up to $500 buckaroos when the advanced material drops after the summer. I do have some things that I must attend to and then get that all wound up for you. So anyway, let’s talk in this video about pivot and unpivot because we might as well talk about something, right? As we’re doing a video about T-SQL. Now, most of the queries that you write will return results that look something like this, where the results are vertically presented and we have a column called type and a column called total type, which I have, of course, aliased as the count of things in here. And then in this column, we have what, you know, the text value of what each type of post is named and account for them. But there may be times in your life when you need to sort of flip this data on its side. And when you do that, that is when things like pivot are very helpful. So what I’m going to do, and I’m going to explain why I have this, this query in a little derived table in a moment, but we’re going to start with this query, which is sort of the foundation of the query up here, just without any grouping or aggregations. And then we’re going to, so we’re going to alias this as PT. And this is going to be, I guess, because it’s like post types or whatever.
And then we’re going to pivot, which we’re going to alias is PV down here. This is our pivot. And what we’re going to do is say, we are going to count post type IDs for all of the post types for all of the things that we saw in that first query, right? So question, answer, wiki, tag wiki, excerpt, tag wiki, moderator, nomination, wiki. I don’t know what half these things are, to be honest with you. But those are all the things that we saw in here, right? All of these things, all this text, we put that down in our pivot. So for all of these types of things that we have in our type column, we are going to count the IDs so that we can figure out how many of these things have occurred. And when we run this query, our results look a little bit different, right? We are no longer working vertically. We are now working horizontally.
And the way that my, I forget what grade I was in, but I had some teacher explain it, where vertical, the V makes an arrow, right? So the V from vertical makes an arrow going like down, or I guess it could, if you flip it, it would be going up as well. So vertical is up and down and horizontal, right? Like this is going across. So that’s how I remember horizontal and vertical in case you need to know where I’m at.
But now we have these horizontal results where each column is labeled with the count for each of the things. So question, answer, wiki, tag, wiki, all this other stuff I don’t really care about. Now, the reason why I put this part of the query into a derived table is because pivot is very, very sensitive. It has this sort of implicit grouping thing going on. So if we just wrote the query like this, and we said, select PV dot star from post types, join to posts on yada yada, and then we pivoted this stuff, we would immediately start to see very, very different results.
And this query would immediately take a very, very long time. I’m going to cancel the execution there because these results are clearly incorrect. But what I am going to do is I’m going to scroll over this way just to show you that the pivoted columns do still come out of these results. But because we are grouping by like every column in the post table now, we are getting woefully incorrect results for all of these.
Like we see some of these are questions and we see some of these are answers, but we are not getting the grouping that we want. And you do have to be very careful of this. So like let’s let’s write the query just in a slightly different way. And what we’re going to do is have a derived table now that selects this stuff.
But now we’re going to look at the score of each one of these things. So we’re going to put score in here and we’re going to group by ID and type here. Right. And when we when we do this and we we write our pivot down this way and we say to sum the score. Right.
Because that’s going to be our aggregate that we that we pivot on. And then we say where type in all that other stuff. The results are going to look a little weird at first. Right.
Like if we if we run this, what do we like? This looks just bizarre. Like we’re just like doing this like stepladder thing down here. And we have all these nulls in the results. And the reason why we have all these nulls in the results, because we have we have selected too many columns in our derived table. If we quote out this ID column, right, we’d say don’t even bother getting that. Right.
We’re not putting we don’t want that anywhere. And then we rerun this. Now we get the sum of scores just without all the nulls in there. So you do when you’re writing pivot queries, you do have to be very, very careful about, you know, what like what you’re doing.
It’s very rare that the table that you are trying to pivot things from is limited to all the columns that you want to group by. And nestling things in a little derived table like this can make your like can make your pivot queries return correct results, which is fantastic. What we want, correct results, sometimes, most of the time anyway. Right.
I mean, we got no lockins everywhere, but hey, we didn’t return any nulls. So be very, very careful, even when you are even when you are using the derived table to only put the columns that you really need grouped by and everything in there. There is, of course, some alternate syntax that you can use if you want to do something very similar.
And we could what we could say is just select a sum and then put a case expression into. Oh, go away, you silly tip. Go and then just put a case expression.
We’ll say, hey, for questions, if post type ID equals one, then add a one. Right. So just sum up that. If it’s not a one, then just add a zero. So we don’t add more to that because we don’t need to do anything.
But this will give us the same thing in there, too. Right. So this will this is equivalent to like the count query that we ran before. So we could, of course, you know, rework this to do the thing with the score above.
But just to give you an idea that there is a different way of writing these queries. There’s not a very much of a performance difference between the two because they both just result in one sort of scan of the post table and then a grouping to do all this stuff. If we come back up to the original pivot query that we ran, we’ll see just about the same thing where it’s just one sort of scan.
Well, this, I mean, obviously has to hit the post types table, too, but it’s just one scan of the post table to do anything. So pivot versus the crosstabs thing, like, you know, there might be some performance difference depending on what you need to do with the queries. So, you know, of course, test both ways if your performance is a big concern to you.
But if you just need to write the query, sometimes, you know, once you at least remember the pivot syntax, it can be a little bit less mentally taxing than writing out all these summed case expressions. Unless you like formulate it with Excel or Dynamic SQL or something. So the next thing to talk about is unpivot.
Now unpivot is a lot more forgiving when it comes to the implicit grouping thing. We don’t have to worry about that so much. The one thing that we do care about with pivot, though, is that pivot will explicitly get rid of nulls. So, like, if we look at what this values clause is returning, there are one, there’s one null in each row, basically, right?
Null, null, null. And if we run this query with the unpivot syntax and we get, we look at the results, we’ll see that we do not retain those null values. SQL Server explicitly filters nulls out of these results, right?
It says no nulls in here. So if you want nulls, you have to write the query a little bit differently. We’re going to use our old friend cross-apply with the values clause. And we are going to put the columns that we want to unpivot on in here like this, along with an alias for them.
And then we want to, or with a name for them. And then we are going to alias the values as call and callValue the way that we have in the table, or rather in the values clause. And then when we run this, you’ll see that this one retains the nulls from the results.
You can, of course, filter those out if you add an aware clause that says where u.call is not null. That’s this thing in here. So we can, of course, get rid of nulls doing this.
But just the basic way of cross-applying values to unpivot things will keep nulls in. So anyway, that’s just about enough pivoting and unpivoting. Again, this is teaser material.
This is a shortened version of the full module of pivot and unpivot. I hope you enjoyed yourselves. I hope you learned something. And I will see you over in the next video where we’re going to talk about more T-SQL fun learning activities. All right.
Thank you.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.