Optimizer Optimizes
Sometimes, the optimizer can take a query with a complex where clause, and turn it into two queries.
This only happens up to a certain point in complexity, and only if you have really specific indexes to allow these kinds of plan choices.
Here’s a haphazard query:
SELECT COUNT(*) AS records FROM dbo.Posts AS p WHERE ( p.PostTypeId = 1 AND p.AcceptedAnswerId <> 0 AND p.CommentCount > 5 AND p.CommunityOwnedDate IS NULL AND p.FavoriteCount > 0 ) OR ( p.PostTypeId = 2 AND p.CommentCount > 1 AND p.LastEditDate IS NULL AND p.Score > 5 AND p.ParentId = 0 ) AND (p.ClosedDate IS NULL);
There’s a [bunch of predicates], an OR, then a [bunch of predicates]. Since there’s some shared spaced, we can create an okay general index.
It’s pretty wide, and it may not be the kind of index I’d normally create, unless I really had to.
CREATE INDEX whatever ON dbo.Posts (PostTypeId, CommentCount, ParentId) INCLUDE(AcceptedAnswerId, FavoriteCount, LastEditDate, Score, ClosedDate, CommunityOwnedDate);
It covers every column we’re using. It’s a lot. But I had to do it to show you this.
The optimizer took each separate group of predicates, and turned it into a separate index access, with a union operator.
It’s like if you wrote two count queries, and then counted the results of both.
But With A Twist
Let’s tweak the where clause a little bit.
SELECT COUNT(*) AS records FROM dbo.Posts AS p WHERE ( p.PostTypeId = 1 AND p.AcceptedAnswerId <> 0 AND p.CommentCount > 5 OR p.CommunityOwnedDate IS NULL --This is an OR now AND p.FavoriteCount > 0 ) OR ( p.PostTypeId = 2 AND p.CommentCount > 1 AND p.LastEditDate IS NULL OR p.Score > 5 -- This is an OR now AND p.ParentId = 0 ) AND (p.ClosedDate IS NULL)
We don’t get the two seeks anymore. We get one big scan.
Is One Better?
The two seek plan has this profile:
Table 'Posts'. Scan count 10, logical reads 30678 Table 'Worktable'. Scan count 0, logical reads 0 Table 'Workfile'. Scan count 0, logical reads 0 SQL Server Execution Times: CPU time = 439 ms, elapsed time = 108 ms.
Here’s the scan plan profile:
Table 'Posts'. Scan count 5, logical reads 127472 SQL Server Execution Times: CPU time = 4624 ms, elapsed time = 1617 ms.
In this case, the index union optimization works in our favor.
We can push the optimizer towards a plan like that by breaking up complicated where clauses.
SELECT COUNT(*) FROM ( SELECT 1 AS x FROM dbo.Posts AS p WHERE ( p.PostTypeId = 1 AND p.AcceptedAnswerId <> 0 AND p.CommentCount > 5 AND p.CommunityOwnedDate IS NULL AND p.FavoriteCount > 0 ) UNION ALL SELECT 1 AS x FROM dbo.Posts AS p WHERE ( p.PostTypeId = 2 AND p.CommentCount > 1 AND p.LastEditDate IS NULL AND p.Score > 5 AND p.ParentId = 0 ) AND (p.ClosedDate IS NULL) ) AS x
Et voila!
Which has this profile:
Table 'Posts'. Scan count 2, logical reads 30001 SQL Server Execution Times: CPU time = 329 ms, elapsed time = 329 ms.
Beat My Guest
The optimizer is full of all sorts of cool tricks.
The better your indexes are, and the more clearly you write your queries, the more of those tricks you might see it start using
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.
One thought on “How SQL Server Can Handle Complex Query Predicates (Sometimes)”
Comments are closed.