SQL Server Query Performance Using Union vs Union All (Sometimes)

Navel Academy

Thanks for watching!

Video Summary

In this video, I delve into a fascinating debate: when is union better or worse than union all? Often maligned due to its requirement for distinct rows, the union operator can indeed be cumbersome, especially without proper indexing. However, I demonstrate that there are scenarios where using union can significantly improve query performance by avoiding unnecessary sorting and spooling operations. Through a practical example from my recent SQLBits session, I show how tuning with union instead of union all can drastically reduce execution time—from 16 seconds to just three seconds in one case. This video is not about promoting vintage seltzer but rather encouraging viewers to consider the context before choosing between these operators, emphasizing that responsible query optimization requires understanding the nuances and specific needs of your data.

Full Transcript

Howdy folks, Erik Darling here with Erik Darling Data, which only makes sense. I’m drinking this lovely vintage seltzer. This YouTube video is not sponsored by vintage seltzer. I’m here to ask an interesting question, and that is, is union always better, or always better, always worse than union all. See, union gets kind of a bad reputation on account of it has to come up with a distinct set of rows and columns, or fields and records, if you’re into that sort of thing. But, and that can be painful. So when we think about union, we have a ton of columns, we have a ton of records, rows, or whatever you want to call them, who’s he, what’s it’s gadgets and ding-dongs. It can be pretty painful to come up with a unique set of those, especially when you’re there’s no supporting indexing, get some string columns in there, stuff can get pretty nasty. And that’s where union all can kind of be better because the SQL Server is not wasting time trying to uniqueify that result set. It’s just saying, you’re all welcome, come on, hang out, come into the light, some duplicates, repetitives, nulls, anything you want, throw it in there.

But, sometimes, getting a unique set of values is a good idea. And I’m going to show you one example of that. Now I’ve got two queries here. Now this is kind of a small deviation of a query that I presented in my indexing session at SQLBits. And the query doesn’t do anything terribly interesting, but I rewrote it a little bit. And this one here uses a union operator. And this one here uses a union all operator. And I’m going to show you something kind of interesting. I’m going to run, which one do I want to do first? You know what, I’m going to run the union all query first. And I’m going to kick that off. I’ve got query plans turned on, I hope. Otherwise, I’m going to waste about 15 seconds of your life. And while this runs, and runs, and runs, and runs, and kind of just keeps going. Oh, 16 seconds. Yeah, all right. Splendid. Now I’m going to run the union query.

And, ooh, that felt better to me. Three seconds. Not bad. Go us. We tuned that query with the magnificent use of union over union all. So what was the difference between these two queries? Well, let’s look at the union all plan first. I’m going to go over into the execution plan. And we’re going to concentrate on this bottom part, because believe me, this bottom part is where SQL Server did the majority of the work.

And let’s first look at this sort operator. And what I want to do is hit F4 there to bring up the properties window. Before we get to that, if we look at the sort, we can see that we spilled two threads. Spill level two. Oh, wrong button. There we go. And we spilled about 1,287 pages to disk.

Not a lot. Not terrible. If we look at the actual time statistics, we can see that we spent a little under a second on the sort. Where we spent the rest of our time was in this index. Sorry, this table spool.

So if we look at the actual time statistics here, a full 10. Oh, come on. Zoom it. There we go. A full 10 of our 16 seconds was spent spooling data about.

Spooled all over the place. Spooled once forth and beyond to the grave. Spooled.

It’s a long time to spend spooling data. If we look at kind of what we did for work in here. Well, we did about 243,000 executions. Right down there.

We did, if you add up the rebinds and the rewinds, that’ll add up to that number down there. And we spooled through above. That’s a lot.

Throw some commas in there, about 10 million rows. All right. Cool. So that’s our union all plan. If we look over at the union plan, right? That was three seconds, remember? Three wonderful seconds.

This is about the best you get out of me. It’s about, I mean, by that I mean that’s about as quickly as I can tune a query down to. Don’t think that there is any innuendo there. Sick people.

But if we look at the actual time stats here on the sort, we do about the same. But we have a slightly different sort over here. In this plan, we have a distinct sort. So this is where SQL Server decided to go and make our data set unique.

This is the distinct sort. If we look back over at the other plan, we just have, oh, where’d you go? Why’d you run away from me?

We have a regular sort. There is no distinctification going on in this sort operator. So let’s go back over here and let’s look at what happened in this query plan. So now let’s look at this table spool.

And it’s kind of interesting that if we look at what the spool did, it’s the same as the other one. We have 243, 205. And if we add up these, 40938, 202, 266, it adds up to the same thing.

But we spooled a lot less rows through this one. Yeah. Far fewer rows ended up going through that because we had a distinct result set from the union of those two tables rather than everything.

So sometimes, sometimes, we can save a lot of time in the query by getting a distinct result set. Now, in the session that I presented, I gave you a different way of tuning the query to get it down pretty quickly. Again, about three seconds.

So that’s why they call me three seconds, Eric. That’s why I tune queries down to. Anyway, sometimes, union can be better than union all.

Not always. Sometimes. It’s up to you as a, what do you call it? What’s a good word? Responsible.

Yeah. As a responsible query tuner to do this kind of investigation into your queries and why they’re slow. Anyway, I’m almost at the seven minute mark and that’s two minutes longer than I wanted to go. So thank you for watching and I’ll see you next time.