Learn T-SQL With Erik: UNION and UNION ALL

Learn T-SQL With Erik: UNION and UNION ALL


Video Summary

In this video, I dive into the world of T-SQL by exploring `UNION` and `UNION ALL`, two powerful set operators that often confuse SQL Server developers. I walk you through practical examples using tables filled with datums (or numbers), demonstrating how `UNION` returns unique values from both result sets while `UNION ALL` preserves all rows, including duplicates. We also delve into the query plans generated by these commands, showing how SQL Server processes and deduplicates data to optimize performance. Whether you’re new to T-SQL or looking to deepen your understanding of set operators, this video offers valuable insights that will enhance your skills. If you want to explore more, consider purchasing my presale course for just $250—significantly cheaper than the full price after summer!

Full Transcript

All right, it’s time to learn T-SQL with Erik. That’s me. This is the teaser content for my learn T-SQL course. Again, this is all in the presale price of 250 bucks. It’ll go up to 500 bucks when the advanced material is written. So that’ll be after the summer. So if you buy it now, you’ll spend less money than if you buy it later, which is a very, very good deal. Right? Just like, just because. Anyway, in this video, we’re going to talk about union and union all because everyone sort of gets this weird and wrong and there’s nothing worse than being weird or wrong. Nevermind being weird and wrong altogether. So let’s, let’s play with some tables here, shall we? Let’s, let’s create a couple of tables with some datum in them, right? Datums, you know, whatever, whatever you want to call them. Ah, that’s a good sign. Connection is, connection is not recovered, possibly recovered. What happened? I don’t know. I probably restarted my VM. I don’t know. Don’t, don’t, don’t mind me.

Anyway, let’s look, let’s actually turn on query plans here and let’s look at what datums are in these tables. And we’ll see that in the first table, we have the numbers one through five, but there are duplicates, right? There’s two ones and two twos and two threes and two fours and two fives and a rogue zoom it being weird on me, right? But we have doubles in here. And then in this table, we have nearly the same data, except we have added two sixes to the end. Oh dear me. What are we going to do? What are we going to do with these extra, these spare slack sixes running around?

Well, the fun thing about union is that union will return unique values from both tables, right? But, but only, it will be fully deduplicated down to just, just unique rows. So if we run this query and we look at the results, we just get the numbers one through six, right? The numbers one through five were in both tables, but we deduplicated that down to just one, two, three, four, five. And then in the second table, we had two sixes, which got narrowed down to just six. So there doesn’t have to be a match between the two tables to union or union all, right?

Like that, like matching the rows doesn’t matter. It’s all just about like returning unique rows from both result sets. Now, if we change, we actually just look at the query plan for this real quick. We’ll see that we scanned each one of those tables, right? They’re just tiny little temp tables. So there’s not really much worth putting an index on.

And then we concatenate both result sets together. And because we have 10 rows here and 12 rows here, then we end up with 22 rows here. But then we have this distinct sort kick in and that brings us down to six rows, right? So this is where the deduplication of everything happens from both inputs.

We can change those queries a little bit, right? And this is logically the same query, right? If we say select distinct union and then another select distinct, union would already do the work of making both result sets distinct. But we’re going to see a slightly different query plan here.

If we run this and we look at the results, we’ll see that we got the same thing back, right? We got the numbers 1, 2, 3, 4, 5, 6, just like last time. But the way SQL Server went and did this is a little bit different.

Notice here that we have two distinct sorts where SQL Server actually before we went and did this merge join union, right? And this is where SQL Server broke things down to just six rows. So we start with 10 rows here. We start with 12 rows here.

We make each of the result sets unique for itself, right? We get down to five here and we get down to six here. But then this just returns the six unique from between those two, right?

So remember, the first one is going to be 1, 2, 3, 4, 5 after we distinctify it. The second one is going to be 1, 2, 3, 4, 5, 6 after we distinctify it. So when it goes into the merge union, it’s 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6.

And then the merge union takes care of the rest. How you see, how this query plan shows up is going to depend on what your data going into things looks like. If you are selecting a primary key or if you are like another unique column or you put like a literal value in there, you may not see the early sort distinct or other aggregation that would remove duplicates or like do any grouping on the first result set.

So like this really depends on what the data looks like before when like when you apply that when you say select distinct from it, it really depends on what that data looks like and what guarantees SQL Server has about that data. So you might not see this, but if you do, it might be kind of a reasonable performance benefit to think of this because rather than making the final operator do all the distinctification, right? Making like using union to make each thing unique, you might get like we’ve talked, like I’ve talked about in a million videos.

Sometimes early aggregation pays off because you break up the work of doing aggregates that SQL Server would do later over a much larger result set. So sometimes this can pay off from a performance point of view, even if logically both of these like either doing union or select distinct union select distinct is logically the same as select union select, right? We don’t need the distincts because the union implies that everything will be unique.

Sometimes the distinct or like group by or something else going before we do the union would be useful from a performance point of view. And of course, union all does not do anything to remove duplicates. If we run this query, we will get everything back, right?

I’ll fight like from both tables, four ones, four twos, four threes on down until we get down to two sixes, right? So that is everything, which of course is different from saying select distinct union all select distinct, which would just bring us down to two of each because we remove one from each result set. But then the union preserves duplicates coming out of here.

So we end up with one, one, two, two, three, three, four, four, five, five, six, six. And we can sort of validate that looking at the query plan where we go from 10 rows here and 12 rows here down to five rows here and six rows here. But then we have this concatenation operator that just brings the two results together and gives us 11 total rows, which is, of course, one, one, two, two, three, three, four, four, five, five, six.

Anyway, not a whole lot to talk about with union and union all, at least in the teaser material. So if you want to learn more, you should buy the full course, which would give you every opportunity to learn more. Anyway, thank you for watching.

I hope you enjoyed yourselves. I hope you learned something and I will see you in the next video where we will. I think we’re going to continue talking about these set operators. I think the next one is we got intersect, except is not distinct from. We got some fun stuff to talk about.

So I will see you over in those videos next. All right. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 25% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.