Footnote
I have two queries. They return the same number of rows.
The only difference is one column in the select list.
This query has the Id column, which is the primary key and clustered index of the Posts table.
SELECT DISTINCT
p.Id, p.AcceptedAnswerId, p.AnswerCount, p.Body, p.ClosedDate,
p.CommentCount, p.CommunityOwnedDate, p.CreationDate,
p.FavoriteCount, p.LastActivityDate, p.LastEditDate,
p.LastEditorDisplayName, p.LastEditorUserId, p.OwnerUserId,
p.ParentId, p.PostTypeId, p.Score, p.Tags, p.Title, p.ViewCount
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
ON p.Id = v.PostId
WHERE p.PostTypeId = 2
AND p.CreationDate >= '20131225'
ORDER BY p.Id;
The query plan for it looks like this:

Notice that no operator in this plan performs any kind of aggregation.
There’s no Hash Match Aggregate, no Stream Aggregate, no Distinct Sort, NADA!
It runs for ~1.9 seconds to return about 25k rows.
Lessen
Watch how much changes when we remove that Id column from the select list.
SELECT DISTINCT
p.AcceptedAnswerId, p.AnswerCount, p.Body, p.ClosedDate,
p.CommentCount, p.CommunityOwnedDate, p.CreationDate,
p.FavoriteCount, p.LastActivityDate, p.LastEditDate,
p.LastEditorDisplayName, p.LastEditorUserId, p.OwnerUserId,
p.ParentId, p.PostTypeId, p.Score, p.Tags, p.Title, p.ViewCount
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
ON p.Id = v.PostId
WHERE p.PostTypeId = 2
AND p.CreationDate >= '20131225';
This is what the query plan now looks like:

Zooming in a bit…

After we Scan the Posts table, we sort about 47k rows.
After the join to Votes, we aggregate data twice. There are two Stream Aggregate operators.
What do we sort?

We Sort every column in the table by every column in the table.
In other words, we order by every column we’ve selected.
What do we aggregate?
Everything. Twice.
What Does It All Mean?
When selecting distinct rows, it can be beneficial to include a column that the optimizer can guarantee is unique in the set of selected columns. Think of a primary key, or another column with a uniqueness constraint on it.
Without that, you can end up doing a lot of extra work to create a distinct result set.
Of course, there are times when that changes the logic of the query.
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.
