A Little About Paging Queries And Batch Mode In SQL Server

Thanks for watching!

Video Summary

In this video, I delve into the nuances of paging queries and how they can be optimized for performance on SQL Server. Specifically, I explore the differences between using `OFFSET FETCH` in more modern versions of SQL Server versus leveraging Paul White’s method from 2009, which involves using `ROW_NUMBER()`. I highlight that while `OFFSET FETCH` is syntactic sugar for the `TOP` expression and doesn’t support batch mode execution, `ROW_NUMBER()` can be used to achieve similar results in a way that does benefit from batch mode. By walking through both query setups, I demonstrate how recompiling hints and compatibility levels play crucial roles in optimizing these queries. The video also touches on common pitfalls such as improper indexing, unnecessary locking hints, and the importance of separating informational columns from relational ones to improve performance.

Full Transcript

Erik Darling here with Darling Data, the only company that will hire me. Recently awarded by BeerGut Magazine is the only company that is using AI appropriately to generate BeerGut Magazine covers. It’s not really that great for anything else. Sorry to dash your illusions. I’m going to say something that’s kind of wild. If everyone says they have next-gen AI, no one has next-gen AI. That’s just the sad truth of the matter. In today’s video, we’re going to talk about paging queries in batch mode and how you write your paging queries can make a big difference if you’re on the type of SQL Server that can get batch mode happening. Of course, I have another video. I believe the title of it is still Paging Dr. columnstore, where I talk about actually creating a columnstore index to replace a whole bunch of nonclubs.org. Clustered indexes that were there to support paging queries. As you may know, when you’re supporting paging queries, they’re often based off of dynamic SQL generated in some way, either in the application or in a store procedure that generates dynamic SQL or whatever other bastardized way you have of generating dynamic SQL somewhere between the application code and the SQL Server that runs stuff.

Because those queries are all dynamic SQL. You don’t know what people are going to search on, order by, what columns they’re going to select, what other tables you might need to join to. And it just makes indexing for all that stuff, even if it’s just one table indexing for all the different stuff. Because I know the tables that you people make out there, they’re like 400 columns wide.

And indexing for every possible permutation of columns that could get passed in is a very tedious and difficult process. And having one columnstore index that makes all of those searches kind of generally good enough is a lot easier than having a billion nonclustered indexes to support a billion different potential search criteria. But here I’ve said too much. I’ve gone on too long.

What we need to do right now is talk about financial security, mostly mine. If you feel so inclined and so kind and so bold and you feel like Erik Darling deserves $4 a month for all this YouTube content, you can click the link in the video description and you can get a membership to my channel where you will be rewarded with more videos.

It’s your lucky day. Anyway, if you don’t feel that I’m worthy of $4 a month, perhaps you feel that I’m worthy of some other expression of gratitude, like maybe liking a video or maybe commenting on a video or maybe subscribing to the channel where you can join like 47 other data darlings out there in the YouTube-verse who watch my YouTubes and learn a lot about SQL Server for free.

It’s nice, isn’t it? If you are in need of SQL Server consulting, maybe from someone who looks a lot like me and you think, hey, that Erik Darling fella seems capable.

Let’s get him to help out with our SQL Server problems. I can do lots of stuff. And as always, my rates are reasonable. If you would like training that is not free, but you want to get inexpensive training that lasts for the rest of your life, you can get my everything bundle, which is about 24 hours of performance tuning training at the beginner, intermediate, and expert slash advanced slash pathologically brilliant level for 75% off, which is about $150 with that discount code.

The link to that magically is also in the video description, so you don’t have to worry too much about remembering all those letters and stuff up there. If you want to catch me live and in person, and actually, I’m actually hitting sort of a weird point with this, because I’m recording this video at the end of September.

Passes at the beginning of November, but these videos are already scheduled out past pass on my blog, but on my YouTube channel, they’re only about like two weeks ahead, or two and a half weeks ahead.

So pass is still, if you’re watching this from YouTube, pass is still going to happen. If you’re watching this from a blog post, pass is probably long done. You missed it.

Sorry. As soon as one catches up to the other, I’ll most likely take this slide out. But yeah, Pass Data Summit, November 4th and 5th. Seattle, me and Kendra, if you’re unaware of that arrangement, because this is your first video, I highly suggest you watch some other videos where I go into far more detail about what we’ll be doing.

If there is a SQL Saturday or a Data Saturday, but not both in your area, or some other SQL data-ish event, and you think, hey, Erik Darling, that fella sure knows a lot about those sequels.

I’d like him to come teach the sequels at my event. Let me know what that event is, so I can come do that, because Lord knows I need a break from this.

But with that out of the way, let’s talk about page inquiries and batch mode. This is not another weird, shilly video about paginated reports and Power BI.

I hate those words altogether. Individually, some of them are all right. But as a group, ruthless. Sort of like people.

Individually, some of them are all right. As a group, holy cow. Sweet meteor of death. End this madness. So, in order to talk about this, I need to go back in time.

More years than I want to admit at this point. I could grow a beard then, but it was all this color, not this color. And two of the first blog posts that I ever read were from around 2009 or so by Paul White, because it was a problem that I was trying to solve.

My first SQL Server job was at a market research company, and my initial exposure to SQL Server was doing a lot of file loading of stuff, and eventually I got entrusted with more and more stuff, and part of that was writing reports, and part of that was people being like, I don’t need a 100,000 row report all at once.

I just want to see what the top 100 or 500 or 1,000 rows are. And so I was just like, I don’t know how to do that. And so I started searching around various terms, flailing mercilessly at my keyboard, trying to figure out a good way to do this, and I came across these SQL Server’s central articles by Paul White.

Really only the first one stuck with me at first, because the second one was way over my head at the time. I’ve made peace with the second one over the years, but the first one is really what made the difference.

And not to get nostalgic and probably kind of emotional here, but these articles are really the first thing that kind of showed me that in SQL Server, the way that you write queries and do things differently can really make a big difference to performance, because I had found some other blog posts about doing similar things, but the results that I got from those were just terribly, terribly slow.

So these ones really, really changed stuff for me. And to this day, I still see people struggling with writing paging queries for a lot of reasons.

Like they put the entire select list that they want within the same query that they’re like figuring out the rows that they want.

You know, like one concept that I talk about a lot, it’s maybe not enough, both in my YouTube videos and in my actual for money training is the concept of separating informational columns from relational columns.

Relational columns are the ones that you, you know, filter on like your where clause, you join to, you have your group buys, your order buys. And like typically, like you can find some key column using like that, like those relational aspects of the query and then filter down to just the rows you’re interested in and then match those rows to, you know, the full table with the columns you care about.

So those are like the informational columns, the ones that you’re displaying to people, your relational columns are the ones that you’re doing all the other stuff to. Anything that requires math to do. You don’t need math to return a select list.

You need math to do like cardinality estimation and filtering and joining and grouping and all that other stuff. But you don’t need math for a select list. So think of it that way.

So I see people screw this up a lot. They’ll use join when they aren’t returning columns from the table and like exist would have been fine or not exist would have been fine. Because they use a join and not one of the exist clauses, they like have to put a distinct on it.

They’ll sometimes stick a bunch of logic in a view thinking that that makes life easier. And like one thing that almost everyone overlooks at first is indexing.

Because having good indexes in place to support your page inquiries is pretty gosh darn important. You know, another thing that comes up a lot when we’re talking about page inquiries is, you know, you’ll find these queries that are just absolutely festooned with no lock hints.

They’re just everywhere. And someone will be like, well, I don’t like this idea because someone might see weird results if they go from page one to page two.

And we have to have that query go back to the server. Let’s just cache the whole result in the application. I’m like, good lord. You care about data consistency when someone clicks to page two, but you don’t care about data consistency when you’re getting all the data that you’re going to shovel into the application.

Okay, sport. I got you. So since SQL Server 2012, when offset and fetch were introduced into SQL Server, there have been like basically two, you know, pretty approachable ways of writing paging queries.

I’m going to show you like a pretty good setup for one where within the paging portion of the query, and I forgot where the control key was, I am only getting the primary key of the post table, this ID comms, the clustered primary key.

And then outside of the relational part of the query, I get the informational part of the query by joining back to the post table on that clustered primary key to get the rest of the columns out.

Right? So that’s the first thing that I’m doing there. The problem with using offset fetch in more modern versions of SQL Server, not SQL Server 2012, is that there’s really nothing in there that’s eligible for batch mode.

And I’m going to show you a big difference between this query and another one, but what I need to do is add in a couple of hints here just to push the point across. One, I’m using a recompile hint so that there is nothing about the local variables I’m using to determine page size and the page number.

And the second is that I do have this query running in compat level 160. 150 would also do for this, but 160 is just the highest one. So I always sort prices high to low.

So we’re going to sort compatibility levels high to low too. And we’re going to use that. If we run this query and we look at what happens, we should turn on execution plans because that’s the kind and decent thing to do, isn’t it?

Make sure that we have a query plan to look at. Where would we be if we didn’t have query plans to look at? Imagine if someone said, imagine if someone just took query text and they put it in an email and they said, why is this query slow?

What can I change? And you’re like, why don’t you show me an execution plan? Because maybe an execution plan might tell me what’s slow in the query, not just having to guess it, you know, how many rows might be in these tables or what your indexes might be or any of that other stuff, right?

Good information to have. If we look at the execution plan, this thing takes, well, this is another sort of weird operator time thing around parallel exchanges.

This query actually takes about seven seconds. Why the gather stream says it takes 13 and a half seconds. I don’t know. I don’t write, I didn’t write that code. I’ve never seen that code.

I wish whoever is still at Microsoft and responsible for this code would put a little bit of effort in that, a little bit of elbow grease, get the summer intern, actually, summer is over, I guess, get the fall interns to do some code review on this and maybe start fixing some of this stuff.

But this takes about seven seconds. And offset fetch is what, as a lot of people accurately describe it, it is syntactic sugar for the top expression.

Since top is a SQL Server specific thing and offset fetch is, I guess, an ANSI compliant thing. It’s like every database, every other database does offset and fetch.

SQL Server is the only one that has top in it. It doesn’t have bottom. You can, of course, do top descending order, but no bottom. It’d be cool if SQL Server had a top and a bottom.

I would just get a kick out of typing, like, select bottom 100 or something. That would be fun for me. But anyway, this whole thing takes about seven seconds and we don’t have a good time with it.

And that’s going to be a lot different from another way. And this is the Paul White method. This is the method that you will read about. And these links will, of course, be in the video description.

But if you were to follow the link to optimizing, for some reason, with an S server side paging part one, you would see a query setup similar to this one, though I think mine has the superior formatting at this point.

We can give Paul a pass because that was 15 years ago that he wrote these queries. So perhaps his formatting standards have definitely improved since then in some ways. In some ways he’s questionable.

But, you know, New Zealand’s kind of like Europe. They all have weird tastes and habits. So in this one, instead of using offset fetch, we’re going to use row number.

Right? See, we are row numbering here. And then outside of the row numbering, we are filtering on that row number with some very fancy math, with some very fancy parentheses to get what we want. And just like the last query, we are recompiling to avoid any local variable weirdness.

And we are using the most recent compatibility level to reinforce the fact that batch mode on rowstore is something that can happen for windowing functions, but not something that can happen for top and offset fetch.

If we run this query, you might notice that it was a tad bit snappier. This was not a seven second ordeal for us to run this. This was, this whole thing took, if SSMS will be so kind as to let me pull this up.

This took about 1.1 seconds. Now, the reason why this, this plan looks a little funny is because it is mixed row mode and batch mode.

So some operators that execute in batch mode are only responsible for their time. And some operators that execute in row mode are responsible for their time, plus all their child operators.

So if we kind of come and look over here, we’ll see about half a second here. And that time go up here. Actually, I bet this is batch mode. Yeah.

So this happens in batch mode. Does this happen in batch mode? This does happen in batch mode. Right? So we got two batch modes. So we got five plus five plus zero. But then this gather streams is in row mode. And this gather stream shows us the 1.1, 1, 1.112 seconds.

And then that pretty much carries on through the top, which is row mode. Another, another parallel exchange, which is also row mode. This filter, which is presumably row mode, because it didn’t like, like this time didn’t change from this one to this one.

And then this adaptive join here, which since this is like, this is technically a, like an extension of having batch mode execution happen. The actual join type of the adaptive join is nested loops.

And nested loops does not support batch mode. So we get a row mode nested loops here. Basically what it comes down to is that the whole query at the end of it, this sort is going to be in batch mode.

That’s why the time goes down there. But the whole query takes about 1.1 seconds. This is something that you can get. Oops. This is something that you can get from a paging query that uses a windowing function because the windowing function is eligible for this window aggregate operator where the offset fetch version is not.

So if you’re writing batch inquiries on more modern versions of SQL Server, I do think that the way that you write the query matters a lot more than it used to.

Granted, there were some problems with offset fetch. If, and if like, even with good indexes, if you were like going deep into a result set that maybe didn’t show up as, as readily with using windowing functions, but the way that you write window, the way that you write paging queries makes a big difference now because the, if you use the windowing function version, you can get batch mode on rowstore with offset fetch, you can’t.

Now that would change if we were to create a non clustered columnstore index on the post table, uh, because then, you know, having that clustered columns, that’s not, sorry, that non clustered columnstore data source would make the, whatever operators and the, the plan that are eligible for batch mode, get batch mode, but without that in place.

And I don’t know if you’re going to be even allowed to do that sort of thing. Uh, we would, we, this, this would be the preferred method. So when you’re writing paging queries, uh, and you’re on newer versions of SQL Server, um, you know, there, there are definitely some tricks you can play to make them faster.

One of them being to use the windowing function method. Another one being to add a non clustered columnstore index to the table so that, uh, you get the, you get one index, all of your, you know, dynamic paging queries can use rather than having to have 40 different non clustered rowstore indexes to support every different combination of columns or where clause and order by and everything else.

So with that out of the way, thank you for watching. I hope you enjoyed yourselves. I hope you learned something.

I hope that you will use this, these videos to start doing things more correctly in your SQL Server life. Um, it is, I believe Friday here, at least according to my swatch, uh, so I’m going to, I’m going to figure out some Friday things to do.

And, uh, I will see you in the next video. Uh, as soon as I’ve got sorted out exactly what the next, what I’m going to record next. So, uh, once again, thank you for watching.

Good, good job watching. full, Thank you.

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.