A Little About ROW_NUMBER Filtering Performance in SQL Server
Video Summary
In this video, I dive deep into an intriguing SQL Server performance question that came up during yesterday’s office hours. Specifically, we explore why a `ROW_NUMBER() = 1` query performs differently from a `ROW_NUMBER() <= 1` query by analyzing their execution plans in detail. You'll see how the choice of operators and execution modes can significantly impact performance, with one query finishing in just over five seconds compared to another that completes in less than two seconds. I walk you through the nuances of windowing functions, batch mode vs. row mode operations, and how SQL Server optimizes these queries. Whether you're a seasoned DBA or just starting out, this video offers valuable insights into optimizing your T-SQL code for better performance.
Full Transcript
Erik Darling here with Darling Data and I’m very excited. Very excited today. Hugely excited. Probably the most excited I’ve ever been. I’m going to be answering in longer form a question from yesterday’s office hours about why row number equals one. It has a different performance than row number is less than or equal to one. And of course, we will be looking at query plans. We’ll be looking at performance and we will be answering this question in as much detail as I can muster off the top of my head. So good stuff there, right? All right. I was going to say something else funny here, now I forgot it. See, this is the problem with the top of my head or things just fly off it. It’s like frying an egg. Little bubbles. Anyway, if this sort of stuff is interesting to you and you think, gosh, this Erik Darling guy just might know a thing or two about SQL Server, maybe he could help me with SQL Server. Maybe he could do it. Maybe he can be the one to finally change my life. Well, there are all sorts of ways for me to accomplish that. And if you look down in the video description, you will find all of these ways for me to change your life. You can hire me for consulting by my training. You can hire me for consulting by my training. You can hire me for consulting by my training. You can hire me for consulting by my training. You can hire me for consulting by my training.
You can hire me for consulting by my training. You can hire me for consulting by my training. You can hire me for consulting by my training, oversee this monthly consultation, and then for free, you can ask me office hours questions. And also for free, you can like and subscribe and tell a friend about the magnificence and glory of this channel and its content. And of course, the young and handsome consultant behind the majority of the content therein. If you would like to see a young and handsome consultant live and in person, well, golly and gosh. Won’t I just be a good friend? Here’s an awesome consultant. everywhere you want to be. Dallas, September 15th and 16th. Utrecht, the Netherlands, October 1st and 2nd. And Seattle, the Washington, November 17th through the 21st. These are all past events put on by the magnificent, lovely, also glorious, also magnificent and glorious folks at Redgate who were kind enough to say, hey Eric, we like you a little bit. Why don’t you come talk about these things? And I said, yes, yes, please. Yeah, I’d love to go places, talk about things because you know, there’s only so much time one can spend trapped in a PowerPoint or in an SSMS tab. It starts to jar the brain a little bit. So with that out of the way, let’s talk about this any aggregate thing. Let’s make sure that I go to the right spawn of SSMS. Now, it took me a little, it took me a little while to get this demo to the place where there was a nice noticeable difference in performance between these two queries. And part of it was having a slightly imperfect index key to support the windowing function that we’re going to be using. So that was the first thing. The second thing was getting this index to create in a reasonable amount of time by filtering out all these zeros, because there are a lot of zeros in the comments table.
And then of course, writing the query in a way where there would be a noticeable difference in performance between a query where the row number function is filtered to equals one, and a query where the row number function is filtered to less than or equal to one. All right, so we’re going to, well, we’re not going to create this. We’re not going to go do all this stuff over again, because that’s just silly. We are going to make sure that we have actual execution plans enabled, which we do. We can tell that via the lovely purple highlighting there. That is a nice touch. I do, I do like purple. If Erin Stellato had as much to do with picking the color purple for that highlight as I think she did, my kudos go out to Erin Stellato for her choice in purples, because that is a great purple. There’s a nice, there’s like a grimace purple. And that is, that is the perfect purple for me. That is purple rain to me. So let’s run these queries. Now what these, now just to get rid of the query results, because I hate waiting for query results. And the discard query results thing is a bit too heavy handed for me, because then I have to remember to turn it on and back off and on and back off. It’s all too much. We are going to be dumping the results of these queries into different, slightly different temp tables, because we don’t want to overcrowd a single temp table.
And let’s do this and let’s look at the performance difference. We’re going to execute these and we’re going to wait, oh, around about seven seconds total, I think, if I, if I remember things correctly. There we go. All right, look at all those rows affected. See, sometimes having, having no count off is nice, right? Make sure that the same number of rows got changed. Look at that beautiful symmetry. Wow, we’re so good at this. All right. So that was indeed about seven seconds.
And here is the first query plan. Now you’ll, if you’re familiar at all with SQL Server execution plans, you will immediately recognize that for a query with a windowing function in it, we are missing some potential operators. Usually when there is a windowing function involved, you will see some combination of segment and sequence project. You may even see a window spool for some of them. And here we do not have those. Here all we have is a stream aggregate. What this means is that SQL Server transformed the row numbering into a grouping operation. So we did not actually generate a row number. We just grouped the whole thing. Now there are some rules that SQL Server follows in order for this to happen. You must filter the, I think it’s, you must filter the windowing function to an equality predicate. I think that’s one of them, but more, but other more important stuff is like you can’t be presenting the window function column and the results, right? So if I had the column X dot N in here, that’s what the windowing function is, um, is alias to, uh, then we would not see this transformation get applied. But, uh, you, you can tell from this that we just did a stream aggregate. Now this all took around, around about five and a half seconds, right? 5.356 seconds, which is close enough to five and a half for me. Part of why this query, uh, took the five and a half seconds is that this query happened entirely in row mode, right? So we processed quite a lot of rows all in row mode. We did not use batch mode for this. Part of the other signs that we use batch mode for this, rather that we did not use batch mode for this, are that we have, uh, some parallel exchanges in the plan. Often these are, uh, omitted to some degree when you have a, uh, when you have a query plan that uses batch mode, usually fully omitted. If you have a query plan that is just about fully batch mode, sometimes only partially omitted. If you have a query plan that is a mix of row mode and batch mode. Another good sign is that we have a stream aggregate operator. The stream aggregate operator, of course, does not support batch mode, right? Uh, only the hash, uh, only the hash, uh, aggregate and, uh, join support batch mode. Nested loops and merge joins also do not support batch mode. Now, if we come down here and look at the second query, the first, well, let’s, let’s, let’s actually make this a little bit easier on the eyeballs. What do you say? I say that’s a great idea. Uh, this query finished in 1.6 seconds.
Hmm. It’s a meaningful improvement, isn’t it? I would say so. From about five and a half to about one and a half. That’s, that’s pretty good, right? We saved about four seconds there from, my math is correct. And, uh, if you look at the various operators in here, you might notice that, uh, SQL Server, uh, used, you might, you might have some visual indicators that SQL Server used batch mode. And, uh, uh, the visual indicators are, well, the main, the prime visual indicator is that we have a window aggregate function, right? So the window aggregate is a batch mode operator that SQL Server uses when you have a windowing function and batch mode is chosen as the execution engine for the query.
So if we look at this, uh, the details of this operator, we will see that this executed in batch mode. We will see that this sort executed in batch mode. We will see that the window aggregate, of course, executed in batch mode and even the filter operator filtered in, uh, yeah, even the filter operator filtered in batch mode. So, uh, if you, so to answer the question from yesterday’s office hours, where, uh, why is performance different? Well, this, these are the types of things that you would want to look for in your query plans in order to ascertain why performance is different for you locally.
Like most things, the answer is in the execution plan. So, uh, you should look at them and maybe take, take note of some of the details, take note of some of the operator times. Uh, I realize that there is nothing in the query plan that’s going to tell you that your, uh, windowing function was transformed to an any aggregate, uh, by SQL Server’s query optimizer. But, uh, this is just a piece of knowledge that you will have to stick on the shelf in your brain where you store your SQL Server knowledge and come back to when you see this sort of thing happening. Anyway, I hope that answers your question in a little bit more detail. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something and I will see you in tomorrow’s video where I reckon, I do reckon, I do hereby reckon, uh, we will talk more about learning T-SQL. So, um, hope to see you there.
Anyway, thanks for watching.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.