A/B Testing
Let’s look at a pretty simple query against the Votes table:
SELECT v.VoteTypeId, MIN(v.CreationDate) AS min_creation_date, MAX(v.CreationDate) AS max_creation_date, COUNT_BIG(*) AS records FROM dbo.Votes AS v GROUP BY v.VoteTypeId ORDER BY v.VoteTypeId;
There are only two columns involved:
- VoteTypeId, which we’re grouping on
- CreationDate, which we’re finding min and max values for
There are two different ways to approach this:
CREATE INDEX even_steven ON dbo.Votes(VoteTypeId, CreationDate); CREATE INDEX steven_even ON dbo.Votes(CreationDate, VoteTypeId);
But first, let’s look at the query with just the clustered index, which is on the Id column, not mentioned at all in the query.
We’ll call this our baseline.
This takes 2 seconds.
Even Steven
With an index on VoteTypeId, CreationDate, what happens?
It’ll take 5 seconds no matter how many times I run it.
This might sound like a very good index though, because even though we don’t have a where clause looking for VoteTypeIds, we’re grouping by them.
Having CreationDate next in the index key should make it really easy to find a min and max value for each VoteTypeId, because CreationDate will be in order.
And you know what? That sort of works out. We get a Stream Aggregate in the plan without a Sort operator.
But it still sucks: Why?
Steven Even
With this index, we go right back to… What we had before.
We went through all the trouble of adding indexes, to have one be slower, and one not get us any faster than just using the clustered index.
What gives?
Teenage Angst
I’ve been avoiding something a little bit, dear reader. You see, I’m using SQL Server 2019.
The first plan and the third plan — the ones that finished in 2 seconds — they both used batch mode on rowstore. That’s an Enterprise Edition optimizer feature available in compat level 150.
If you were to run this on SQL Server 2017 or earlier, you would find no measurable difference between any one of these queries.
And look, batch mode on row store does represent a good improvement in many cases for large aggregation queries — the type of queries that would benefit from columnstore in general — and maybe in places where you’re unable to use columnstore today because you’re also using triggers, cursors, Replication, or another feature that it disagrees with.
If you suddenly saw a 60% improvement in some of your “big” queries, you’d probably be pretty happy. I’m not saying it comes for free, or that it’s a magickal world where everything is perfect for every query now.
You can only get that if you have PLAN GUIDES FOR EVERY QUERY. H ah aa ah ah ha a.
But let’s consider something else
It only kicked in when our indexes were lacking.
When we had a “good index”, SQL Server chose a plan with no batch-y mode-y at all.
If you’ve carefully crafted some indexes over the years, even a sure shot for the type of query you want to get some batch mode love may not see it.
Remember: It only gets used if your query is tall enough to ride.
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.