Using sp_tableoption To Store LOB Data Off Row In SQL Server

Crazy That Medulla Oblongata


Video Summary

In this video, I explore an interesting and underutilized SQL Server option that can significantly impact performance in certain scenarios. Specifically, I demonstrate setting the table option to store large value types out of row, no matter their length, using the `sp_tableoption` stored procedure on the Stack Overflow database's Posts table. By creating a copy of the Posts table and enabling this setting, I observed substantial reductions in read operations against the clustered index when querying data that was previously stored in-row. The performance benefits were evident, with fewer logical reads and reduced wait times for page I/O operations, especially when dealing with large amounts of max data types like `varchar(max)` or `nvarchar(max)`. This video provides insights into how this setting can be leveraged to optimize table designs where large data fields are common.

Full Transcript

Erik Darling here to talk about a sort of fun, underutilized option when it comes to SQL Server. Now, the option that I'm talking about is one that you can set at the table level using `sys.sp_tableoption`. And the setting that I'm going to talk about today is the one that makes SQL Server store large value types out of row, no matter what their length is. What that means is, if you have a table with large data types in it, like max data types, SQL Server will by default store up to about 8,000 bytes of that data in row, if I remember the documentation that I read several hours ago correctly, and only push it off row once it grows past that. Regular large columns, like an nvarchar(4000) or a varchar(8000), will always get stored in row, or something like that. I don't know. It's all a little bit foggy. Last night's a little bit foggy, to be honest. But I wanted to mess around a little bit with this setting, specifically in the Stack Overflow database. The Posts table has this column called Body that's an nvarchar(max), and it's, I mean, it's gigantic. It's where the body of every question and answer lives. So it's a lot of just, you know, long, long rambling internet nonsense. People talking about, I don't know, computers, maybe, I don't know, whatever.
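As a quick aside, you can check whether a table already has this option enabled by looking at the `large_value_types_out_of_row` flag in `sys.tables`. A minimal sketch, assuming the Stack Overflow schema:

```sql
/* 1 = large value types are stored out of row,
   0 = they're stored in row when they fit (the default). */
SELECT t.name,
       t.large_value_types_out_of_row
FROM sys.tables AS t
WHERE t.name = N'Posts';
```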

We can pay attention. But anyway, I think this is an interesting setting because, if you have a lot of LOB-ish data, you can drastically reduce the number of reads that you do against the clustered index, as long as you're not selecting the off-row data. So to do that, I created a copy of the Posts table. Not just this column, I created the entire thing. And I named it PostsOffRow. Then, after that, I set the table option to store large value types out of row, and then I inserted every single row from the regular Posts table into this new table.
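The setup I'm describing can be sketched roughly like this. The `sp_tableoption` parameter names are the documented ones; the exact DDL and the PostsOffRow name are just how I did it here, so treat this as a sketch rather than a script to run blindly:

```sql
/* Create an empty structural copy of Posts. */
SELECT TOP (0) *
INTO dbo.PostsOffRow
FROM dbo.Posts;

/* Force max-type columns off row regardless of length. */
EXEC sys.sp_tableoption
    @TableNamePattern = N'dbo.PostsOffRow',
    @OptionName       = 'large value types out of row',
    @OptionValue      = 'true';

/* Load every row from the original table. */
INSERT dbo.PostsOffRow WITH (TABLOCK)
SELECT *
FROM dbo.Posts;
```

Note that the option only applies to data written after it's set, which is why it goes on before the insert.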

They both have the exact same number of rows in them. There's no weird tomfoolery, magic, witchcraft, I don't know, black arts or whatever they call them. None of that's going on. They're equally sized tables, and we can validate that by running sp_BlitzIndex and looking at size and, I mean, other stuff, I guess. So this top line is the regular Posts table. And if we scroll a little bit over to the right, we'll see, oh, the size column.
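The validation step can be sketched as below. sp_BlitzIndex is from the free First Responder Kit, and these are its standard parameters; the database name here is an assumption, so adjust it for your environment:

```sql
/* Compare size and storage breakdown for both copies.
   @DatabaseName is an assumption; point it at your copy
   of the Stack Overflow database. */
EXEC dbo.sp_BlitzIndex
    @DatabaseName = N'StackOverflow',
    @SchemaName   = N'dbo',
    @TableName    = N'Posts';

EXEC dbo.sp_BlitzIndex
    @DatabaseName = N'StackOverflow',
    @SchemaName   = N'dbo',
    @TableName    = N'PostsOffRow';
```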

The size column will tell us it's about 17.1 million rows, and the table itself is 36 and a half gigs, and about four and a half gigs of it is LOB data, so stored off row. And about 0.1 MB is stored as row-overflow data. I can read today. It's great. I'm having a wonderful time. And if we scroll down a little bit further, we'll have the results for the Posts table with the data stored off row.

And if we do the exact same thing, and scroll over a little bit to the right and look at the results, we'll see that it has the exact same number of rows, just like I said. The table is 34.6 gigs, and 31.8 of those gigs is stored as LOB data. So this is stored not in row, but off row. Cool. Right? Cool. So let's go look at what the performance implications of that are.

Fewer reads? More? Less? I don't know. And we're going to find out in a couple different ways. So first off, I'm going to drop clean buffers. And when that finishes, I'm going to talk for like two seconds, and then I'm going to turn on query plans, or else we will have nothing to look at.

And I'm going to run this simple count query. There's no where clause; there's nothing going on here that's going to cause any weird performance stuff. Now, this is SQL Server 2019, and I do have enough memory, both in my laptop and assigned to the SQL Server instance, to hold the full Posts table in memory. Okay? So I have like 50-ish gigs of memory just for this SQL Server instance.
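The test harness looks roughly like this; the exact query shape is an assumption on my part, but it's nothing fancier than a count:

```sql
/* Start from a cold cache. Dev/test machines only:
   DROPCLEANBUFFERS empties the buffer pool server-wide. */
DBCC DROPCLEANBUFFERS;

/* Surface reads and timings alongside the actual plan. */
SET STATISTICS IO, TIME ON;

SELECT COUNT_BIG(*) AS records
FROM dbo.Posts;
```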

And when this thing runs, it takes about 20 seconds. And that 20 seconds is pretty much all spent reading data from disk into memory. All right, so we had to read all that in. Let's look at two things. One, let's look at the wait stats for this query. Let's go over here, and let's see: we spent a whole lot of time waiting on PAGEIOLATCH_SH.

So that is all reading pages from disk into memory. There are some other, less consequential waits, like memory allocation and SOS_SCHEDULER_YIELD and blah, blah, blah. So we know that we spent the majority of our time waiting on that: reading pages from disk into memory. And even though this happened in batch mode, that doesn't really affect this portion of the operation.

We still have to get stuff from disk into memory. And before you accuse me of having crappy disks, I have excellent disks in this laptop. They're like one-terabyte Opal SSDs, whatever they call them. You can go look them up, the P51 stuff from Lenovo. I have good drives in here.
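If you want to see waits like this for your own session rather than from a plan, one way is `sys.dm_exec_session_wait_stats` (SQL Server 2016 and up). A minimal sketch:

```sql
/* Cumulative waits for the current session,
   biggest time-eaters first. */
SELECT dews.wait_type,
       dews.wait_time_ms,
       dews.waiting_tasks_count
FROM sys.dm_exec_session_wait_stats AS dews
WHERE dews.session_id = @@SPID
ORDER BY dews.wait_time_ms DESC;
```

These numbers accumulate for the life of the session, so run it right after the query you care about.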

And if we go to the properties of the index scan, we can look at Actual I/O Statistics. And if we look at the reads over here, we'll see two numbers of note: actual logical reads and read-ahead reads.

The actual logical reads number is 4,178,042. That's a seven-digit number. So we did about 4.1 million logical reads, and just about the same number of read-ahead reads.

I should learn how to zoom in someday. So we did a bunch of read-aheads and we did a bunch of logical reads. Now, this is only really, really painful when data is not already in memory, at least from what I can tell. If I run this count query again, it's not going to take anywhere near 20 seconds.

Once stuff is already in memory, that count query finishes in just under two seconds. So 1.8 seconds there. The thing is, that's a pretty dramatic departure from what happens when we select a count from the off-row table.

So now we’re going to do that. So we dropped clean buffers. So there’s no data in memory right now.

We're going to start with a cold cache. The first time we did this against the regular table, it took 20 seconds. So let's run this query. Now, that finished in 1.6 seconds.

And if we look at the wait stats for that, we'll come over here, and up at the top we have PAGEIOLATCH_SH again, but instead of 20 seconds, it was 2 seconds.

Another cool thing is if we look at the I/O stats for this, and we look at… Oops.

I didn't expand the right thing. If we look at the logical reads and read-ahead reads, these are now only six-digit numbers. We did not do millions of logical reads.

We did a few hundred thousand, so about 400,000. I want to say that's about 10% of the reads that we did last time, which was like 4.1 million or something.

Now we're down to 370,000, so a little under 10%. We reduced that number by quite a bit because SQL Server didn't have to read through all of that crazy in-row LOB data, LOB-ish data.

And of course, the second time we run this, it will be substantially faster as well. With this data already in memory, it takes about half a second. So we went from about a second and a half down to about half a second just to read the data that was already in memory.

So pretty clearly, that off-row option is helpful for the Posts table. Now, like I was saying before, batch mode does not make any particular difference for this.

So again, if we clear data out of memory, I can run this query where I'm specifically telling SQL Server not to use batch mode at all for any operator in here.
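On SQL Server 2019, one way to keep batch mode entirely out of a plan is the `DISALLOW_BATCH_MODE` hint. A sketch of the no-batch-mode test, with the count query shape being my assumption:

```sql
/* Force every operator in this plan to run in row mode. */
SELECT COUNT_BIG(*) AS records
FROM dbo.Posts
OPTION (USE HINT('DISALLOW_BATCH_MODE'));
```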

In the previous queries, we did use batch mode on the clustered index scan. It’s not going to make any difference for how long it takes SQL Server to read the pages from disk into memory.

This is not something that I think batch mode has a side effect on. So this took just about 20 seconds as well without batch mode, just about half a second over there.

And of course, without batch mode, the second run of this will be pretty quick, still about 1.8-ish seconds, I believe, or two and a half seconds. So batch mode helped a little bit there.

But then if we get rid of everything in memory and we just go back to reading the off-row table, even without batch mode, this is still relatively quick as well, about 1.7 seconds there.

And then a second run of this, once the table is already in memory, is about one second. So, yeah. Where this gets interesting, and I mean, I'm dealing with a pretty specific case here, is when there's a lot of LOB data stored in row, like there is for the Posts table.

Having that data stored in row means that we have, you know, a larger index altogether that we have to read through to get information out.

With the data stored off row, we have a much smaller index, because all of that LOB data lives elsewhere. Now, yes, I have some more testing to do on this, but this is what I could fit into, like, a 10-minute video for you goldfish out there.

But yeah, I have some more testing to do here, to look at what happens when we do need that Body column. What are the performance implications if, you know, we select it, or do something terrible with that data? Because it is a max data type, and it is fairly large.
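For anyone who wants to get a head start on that follow-up test, a hypothetical shape for it might be a query that actually has to touch the LOB data, something like this (untested here, and the aggregate is just an example of forcing Body to be read):

```sql
/* Forces SQL Server to fetch the off-row Body values,
   so we pay the off-row lookup cost on every row. */
SELECT MAX(LEN(p.Body)) AS longest_body
FROM dbo.PostsOffRow AS p;
```

The expectation would be that reads against the off-row copy get a lot more expensive once the query needs the LOB column, but that's exactly what's left to measure.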

So anyway, I'm going to end this video and then go back to tinkering with this fun stuff on a Monday. Thank you for watching, and I'll see you soon-ish, because I have to poke around a little and then, whatever.

Goodbye. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 25% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.



One thought on “Using sp_tableoption To Store LOB Data Off Row In SQL Server”

  1. Have you seen Jeff Moden’s Black Arts Index Maintenance presentation? He uses this feature to make self-defragging tables.

Comments are closed.