Imbalance
In yesterday’s post, we looked at a funny situation where a query that spilled was about 5 seconds faster than one that didn’t.
Here’s what the query looked like:
SELECT x.PostId FROM ( SELECT v.PostId, ROW_NUMBER() OVER ( ORDER BY v.PostId DESC ) AS n FROM dbo.Votes AS v ) AS x WHERE x.n = 1;
Now, I can add more columns in, and the timing will hold up:
SELECT x.Id, x.PostId, x.UserId, x.BountyAmount, x.VoteTypeId, x.CreationDate FROM ( SELECT v.Id, v.PostId, v.UserId, v.BountyAmount, v.VoteTypeId, v.CreationDate, ROW_NUMBER() OVER ( ORDER BY v.PostId DESC ) AS n FROM dbo.Votes AS v ) AS x WHERE x.n = 1;
They both got slower, the non-spill plan by about 2.5s, and the spill plan by about 4.3s.
But the spill plan is still 3s faster. With fewer columns it was 5s faster, but hey.
No one said this was easy.
Fully comparing things from yesterday, when memory is capped at 0.0, the query takes much longer now, with more columns:
To compare the “fast” spills, here’s yesterday and today’s warnings.
With one integer column, we spilled 100k pages.
With five integer columns and one datetime column, we spill 450k pages.
That’s a non-trivial amount. That’s like every column adding 75k pages to the spill.
If you’re really worried about spills: STOP SELECTING SO MANY COLUMNS.
For The Worst
I promised to show you things going quite downhill, and for the spill query to no longer be faster.
To do that, we need a different table.
I’m going to use the Comments table, because it has a column called Text in it, which is an NVARCHAR(700).
Very few comments are 700 characters long. The majority are < 120 or so.
This query looks about like so:
SELECT x.Id, x.CreationDate, x.PostId, x.Score, x.Text, x.UserId FROM ( SELECT c.Id, c.CreationDate, c.PostId, c.Score, c.Text, c.UserId, ROW_NUMBER() OVER ( ORDER BY c.PostId DESC ) AS n FROM dbo.Comments AS c ) AS x WHERE x.n = 1
And the results are… icky.
The top query asks for 9.7GB of RAM. That’s as much as my laptop can give out.
It still spills. Nearly 10GB of memory grant, and it still spills.
If you care about spills: STOP OVERSIZING STRING COLUMNS:
Apparently only spilling 1mm pages is a lot faster than spilling 2.5mm pages.
But still much slower than not spilling string columns.
Who knew?
Matters of Whale
I was using the Stack Overflow 2013 database for that, which is fairly big relative to the 64GB of RAM my laptop has.
If I go back to using the 2010 version, we can get a better comparison, because the first query won’t spill anymore.
Some points to keep in mind here:
- I’m testing with (very fast) local storage
- I don’t have tempdb contention
But still, it seems like spilling out non-string columns is significantly less painful than spilling out string columns.
Ahem.
“Seems.”
I’ll reiterate two points:
- Stop selecting so many columns
- Stop oversizing string columns
In the next two posts, we’ll look at hash match and hash join spills under similar circumstances.
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.