Learn T-SQL With Erik: TOP and OFFSET/FETCH

Learn T-SQL With Erik: TOP and OFFSET/FETCH


Video Summary

In this video, I delve into the world of T-SQL paging queries, specifically focusing on the use of `TOP` and `OFFSET FETCH` in SQL Server. I explore how these techniques can be used to efficiently retrieve large sets of data while maintaining performance. I demonstrate practical examples using Common Table Expressions (CTEs) to illustrate both methods, providing a clear understanding of when and how to apply them effectively. Whether you’re just starting out with T-SQL or looking to enhance your skills, this video offers valuable insights into optimizing query performance for pagination scenarios.

Full Transcript

Erik Darling here with Darling Data. And today’s video we’re going to continue our T-SQL love fest where I’m previewing some of the content from the beginner portion of the T-SQL course that I’m working on. You can of course purchase this course down in the video description for the pre-sale price. Once the course is fully published with the beginner and the advanced material, the price will go up to double what it is now. So now is a good time to buy. And of course, if you are attending Kendra and I’s pre-cons at Past Data Summit in Seattle this November, you will get access to this content for free with the admission to your, with the admission to our pre-con. Your admission to our pre-con. There we go. Anyway, let’s talk a little bit about offset fetch then, first top, at least in the context of the one place where they really compare well is writing page inquiries. Now, assuming that you’ve written top, like to say select top and number of rows from some query, order by something, better be an order by in there at this point, hammer at home that you must order by when you use top.

But assuming that you’ve written top without, with ties or like percent or something, you get that number of rows back. If you say select top 100 rows, but you only have 95 rows that qualify for your query results, SQL Server does not invent five rows for you to work to like just make you happy. If you say select top 100 rows, you can get some kind of weird stuff back. By weird, I mean like if you’ve done it enough, you know what to expect, but it might look strange to someone who has not done it before. So just to show you what I mean there, if we say select top one with ties where reputation equals 3622, we say top one, but there are 11 ties. So we get 12 rows back.

All right. Look over here. There are 12 people with a reputation of 3622. Guess what? They all tied. So we get for top one with ties, 12 rows in this case. Kind of fun. But top does not have a natural syntactical offset in T-SQL, right? Because T-SQL is the only language that has top in it. Most other languages use limit, but most others also have incorporated offset fetch at this point. What offset fetch gives you is the ability to limit, not only limit the results the way top would, but also skip rows within that result set to page through things.

So like the fetch portion of offset fetch is like the top end rows, right? You say offset end number, sorry, fetch end number, next end number of rows only. That gives you like the top-ish behavior. The offset is what gives you the skip behavior. So if you want to have skip behavior with top, you have to do a little bit more work generally using the row number window function. So the way to think about paging queries in general is first you want to mentally separate the columns in your mind between relational columns.

By relational columns, I mean things that you’re going to perform some relational activity on, whether it’s a where clause, whether it’s a join, whether it’s a group by, whether it’s an order by, stuff like that. Those are your relational columns. And then you have your informational columns. Your informational columns are just the ones that you show the end user. So when you say, you know, select these columns, those select columns are your informational columns.

There might be some overlap, of course, but in general, if it’s only in your select list, it’s informational. If you are saying like, you know, from this table where this column equals something, join to this other table on this expression, then those are relational columns. When you’re writing paging queries, you want to avoid getting the informational columns for as long as possible.

You want to do all of your initial filtering, all of your initial joining, existing grouping, ordering, all that good stuff with as narrow a set of columns as possible. This gives you a much better chance at consistent index usage, not blowing your server out with gigantic memory grants from having to sort this big long list of every string column you’ve ever put into a table that’s in VARCAR max because your developers are lazy. You avoid a lot of performance issues with paging queries that way.

So that’s really the main sort of gist of it. Now, without getting too deep into indexing for paging queries, because that is a big, giant, complicated, messy hornet’s nest of a topic, we’re going to skip ahead to just kind of show you what a good starting point. Like when people say this house has good bones, we’re going to say this paging query has good bones, right?

So what we’re going to do here is show you what a good starting point for a paging query looks like. All right. So like notice that we’re only selecting one column. That is the clustered primary key of the post table.

And we’re going to, again, you know, avoid selecting all the columns you want to show people for as long as possible. In this case, thankfully, it’s not too long, right? Because once we use this CTE to find the rows that we care about, filtering to the owner user ID that we care about, and applying our offset in our fetch, we are going to join that CTE, which is called paging, back to the post table just for the rows that we found in the CTE.

All right. So if we run this, we will get back our necessary required 100 rows, and we will be very, very happy with the results. This was speedy enough in this case. Was it the fastest query I’ve ever written? I don’t know. I don’t know what the fastest query I’ve ever written is.

Probably a lot of zeros involved, but it gives us the results that we want, right? And it’s set up the way that we care about. There are two ways you can do this with top. The first way I’m going to show you just uses a single CTE, and in that single CTE, we generate a row number over results from the post table where the owner user ID equals the owner user ID that we care about out here.

Once we’ve located those rows, then we are going to do some math, some very, very fancy math, on the row number that we created in our CTE. That is f dot n. n is our row number right here.

Okay? And we need to do a little bit of math on our page number and page size variables so that we get the correct page, the correct number of rows positioned in the correct point in the results.

And, of course, we will need to finish up by ordering by our wonderful last activity date and tiebreaker ID columns. So if we run this, we’ll get the same results that we got back from the last query. And, again, lickety-split pretty quick there.

Same basic deal, only selecting the ID column and then joining to the post table outside here. The second way is a little bit more complicated. And this way is, this is a method that I, I’ve told this story many times, but way back in, like, 2009, I had only been working with SQL Server a very short amount of time, but I was tasked with writing a page inquiry, and I found this blog post by Paul White.

And this was the method that I learned from Paul in that blog post, where you use not one, but two stacked CTE. Now, if you’ve heard me talk about CTE before, I do want to say that this format of writing CTE, where one CTE draws from the one before it, doesn’t have the performance issues that I’ve talked about with CTE a lot in other videos.

If you haven’t seen those, hang on, because there’s CTE coming up in a few videos here. But if we run this query, what we’re going to see is, in this first CTE, again, only selecting the narrowest set of rows that we need, but up in the top, we do the page number times the page size.

And what this gives us is a somewhat bigger result for us to page through. So, if we’re on, I mean, currently we’re on page number one, and with a page size of 100.

So, page number times page size gives us 100. If we were in the top 200, then it would be page two, sorry, if we wanted page two, then it would be two times 100, so top 200.

So, this part of the CTE does give us a longer result set the deeper we go in. But what we immediately do is filter that out here, where we only select the top page size from the previous CTE, where the row number is greater than the page number minus one times the page size.

So, for, like, page one, we get the 100, page two, we get 200, all that other stuff. And we’re going to do the same thing out here, where we join back to the post table to get all of the columns that we care about, and again, ordered by the two columns that we care about for, A, the initial presentation ordering of the results, and then the ID for the tiebreaker, just in case there are any dupes and last activity date.

So, if we go and run this query, we will get, again, the same results back, and it will be just as lickety-split-quick as the other two. Right.

We’re not doing anything too big and crazy here, but, you know, as page inquiries get bigger and more complex, then we, that’s when we need to start thinking about other things like indexing, and, you know, perhaps some additional query tricks to stabilize result sets.

But for most page inquiries that you write, you will want to use one of these three methods and just sort of test them and figure out which one performs the best, based on what your sort of normal workload is and what users are requesting.

Again, offset fetch tends to suffer a little bit performance-wise as you get deeper into result sets, but if your users aren’t typically going past, going very deep into result sets, then it’s not really a concern.

Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I hope that you’ll buy the course because it’s a good one and put a lot of good stuff into it.

Anyway, thank you for watching. Goodbye. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.