Learn T-SQL With Erik: Window Functions and Aggregates

Learn T-SQL With Erik: Window Functions and Aggregates


Video Summary

In this video, I delve into a fascinating aspect of window functions that isn’t always highlighted in discussions about these powerful SQL Server features. I demonstrate how to use window functions not just for simple column aggregations but also for more complex scenarios where you need to preserve and order aggregated data within a temporary table. By incorporating row numbers based on aggregate values, you can ensure that when querying the results later, they are ordered by the most impactful findings first—something many overlook in their SQL practices. I walk through an example using Stack Overflow data, showing how to calculate averages of counts over different time spans and order them effectively. This technique is particularly useful for troubleshooting and reporting purposes where you need to prioritize high-impact issues. Additionally, I share insights on the flexibility of window functions, illustrating that they can be used in a variety of creative ways beyond just row numbering, making your SQL queries more powerful and versatile.

Full Transcript

Erik Darling here with Darling Data. And to continue the Tickler material for Learn T-SQL with Erik, for which all of the beginner content is now published. There’s about 23 hours of it across an entire 69 modules. Do with that information what you will, but again, the presale price is still $250 until the advanced stuff drops after the summer. So, we are going to, I’m going to show you something that I think is very neat about window functions that not a lot of people pick up on about window functions. So, you may have noticed that I have a couple of store procedures that try to help people troubleshoot various aspects of their SQL Server.

Some of those store procedures have sort of like a roll-up of findings, an aggregated roll-up of all the findings in there. And what I found was that, of course, you know, it’s very easy to, you know, produce the sort of summarized output. What was not easy to do was to order that summarized output later.

So, what I ended up learning to do to make that easier is when I insert the data into the table, I have not only the aggregations of the things, but also a row number that gets produced based on the aggregations of the things. So, for example, like if we were just going to query the stack overflow database, the aggregation plus the ordering would look something like this. We’re going to get a count of all the posts, right?

So, we’re getting the post type ID up here and we’re producing some text. And so, like, because we’re producing some text, you couldn’t order by the text in the output. So, it’s just going to be like texty ordering.

We’re not ordering by like what had the most of something, which is what I would want to do. I’m going to like prioritize the high impact stuff first. But what I learned to do was like put that count into the order by clause of the row number function, which looks like this, so that I can preserve the numbered output for when I want to order it coming out of the findings temp table.

Like to just do this in one query would be trivial to do this when you’re like putting data into a temp table and then you want to select it out later with a specific order. This, this, this becomes a little bit trickier, but the results look like this. And of course, I’m not connected to SQL Server, which would help.

But now, when I run this query, I’m not ordered by the text output. I’m ordered by which posts had the most rows. Now, I know that for this standalone query, I could have just said order by count big descending.

That’s, that’s no problem. The point is that now I have this row number column. So when I want to, when I want to select data out of like my findings table later, I’m going to go to the next one. So I can say like order by finding ID and then this row number column.

So I have the top stuff at the top. So like, you know, post type ID to having 11 million posts and post type ID one having 6 million posts. Those are up at the top of like whatever thing I want to show you.

But, you know, this is something that not a lot of people understand about window functions is that you can partition by, you can order by and you can pass aggregates into them. Just like you do. Like, you know, like another aggregate or something like you could do like sum divided by count or something like that.

Like you could, you can, you can have these things intermix and sort of live together. So just like an easy example of that would be like doing some average, like averages of counts. Like you don’t like a lot of, like I’ve seen a lot of queries written like this where like someone goes and like gets the count and like one CTE or something.

And then does the averages after they’ve like done the counts, which is kind of silly because you can just do it all in one place. The only thing you have to be a little bit aware of, of course, is how you choose to set up your, your, your window range and row specification to deal with that. So what I’m doing in this query is I am hitting the post table again.

I’m doing a little bit of fanciness in here in order to, in order to like give, give myself like consistent, like accessible aliases for these two expressions across my whole query. And what I’m doing is I’m asking for an average of the count over different spans of time. Now, since I want three, six and 12 month averages for the three month average, I have to say between two proceeding and current row, which feels a little funny.

Cause you’re like, I want a three month average. I should put three proceeding in current row, but that would give you four rows, right? You want two proceeding and the current row.

That’s three total. And it’s the same thing for the six month average and the 12 month average for the six month average. You want five rows. And for the 12 month average, you want 11 rows and proceeding. Now, granted for the, the 12 month average, it wouldn’t matter as much because it may be like the last row anyway.

So you’re like, like you could just say between unbounded proceeding and current row if you wanted. But if there’s more than that, if there’s more to that, then you would need to, you know, obviously make sure that you are very specific about it. But now when I run this query, I’m able to just all in one sort of fill swoop of things, get the average number of posts across three, six and 12 and all 12 months in the year 2013.

If we were doing multiple years, that’s where you have to be really careful with the 12 month one. Um, but what you’ll see here is that when we look specifically at these three columns that produce the averages, uh, you’ll see the, the three month average here across all three of these, uh, that, that is the same. But then for the six month average, this number changes from this number.

So this is just another three month average here, whereas this is really producing the six month average across these, right? So like this 12 month average here too, we’ll have the same thing across the first six months, but then, uh, like where this thing for every three months will reset and give us a new set of averages. This gives us two, two window frames of add of averages, the six months and six months.

And then this one, like where this agrees up to the six month part, this is, this really, this really like departs here. And then we have like the full 12 month, uh, um, you know, sort of like, uh, average across all 12. So the main message here is that, um, window functions, uh, are not just sort of limited to, you know, like a column in your tape, in your table.

You can do all, you can, you can assign all sorts of stuff to them. Uh, you saw there where, uh, the first query where I was ordering by account big, uh, and this one, I’m getting the average of count bigs over different spans of time. So there’s a lot of neat stuff you can do with window functions that, uh, are like, may not be just like immediate, obvious and apparent to you based on the way you see a lot of window functions out in the world written.

But there are some really cool things you can do with them that often get overlooked. Like you’ll hear people talk about like, you know, how cool and powerful they are, but they kind of just give you the same examples of like, you know, the kind of like, I don’t know, like, here’s a row number over and over again. Like here’s this different things you can do with row number and you’re like, okay, row number. Great.

But like, like think of all the other neat stuff that you can do for your sort of like neat analytical queries. Anyway, uh, that’s enough here. Again, this is all tickler material from learn T SQL with Eric, uh, still on sale for 250 bucks. There’s a link down in the video description. If you want to see much, much more material like this, uh, I don’t know.

Otherwise, I don’t know. Go live in the T SQL dark for the rest of your life. See if I can. All right. Anyway, thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.