Learn T-SQL With Erik: Getting Fancy With CASE Expressions
Video Summary
In this video, I delve into the world of case expressions in T-SQL, dispelling any confusion around their terminology by clarifying that they are indeed “case expressions” and not “case statements.” I walk through a simple count query example to illustrate how using subqueries can lead to multiple table scans, significantly impacting performance. To address this issue, I demonstrate how to use case expressions to selectively apply aggregates, ensuring more efficient execution plans and reducing the number of times the table is accessed. The video also explores the nuances between using `COUNT` and `SUM` with case expressions, emphasizing the importance of correctly handling null values for accurate results. Throughout the discussion, I highlight common pitfalls when defining ranges in these expressions to avoid data exclusion or double counting issues, providing practical advice on how to write robust and efficient queries.
Full Transcript
Erik Darling here with Darling Data, and in today’s video we are going to continue on talking about, this time, case expressions. I suppose it’s good to get it out of the way, first and foremost, that they are case expressions, they are not case statements. If you run into anyone in the wild who calls it a case statement, you can very, very safely disregard their opinion about basically anything and everything that they might say. So it is kind of a nice litmus test for a person. Call it a case statement, out you go. Anyway, let’s say that we have a, just a simple count query like this. But the problem with this count query is that we need to count all different sort of reputations in different ways.
You might end up, if you kind of knew it at T-SQL, writing a query that looks a little bit like this, right? Where, you know, for some reason you’re going to be able to write a query, and you’re like, well, I’ll just, I know, I know it’ll work. So I’ll just write some sub queries that count for reputation equals one, and then reputation between one and 9,000, and then reputation over 9,000. So this, this will work. This is perfectly fine.
But if we look at, if we run it, for the small table that we’re working with, this is fast enough. But if we look at the execution plan, we will see that we hit the user’s table once, twice, three times. And each time we hit the user’s table, oh, the tooltip went away. We read the whole thing, and, you know, we have sort of a not good time, right? So if your queries are big and more complex and slow for each one of these sub queries, you will have a 3x slowdown for your query, rather than just hitting the table once.
Now, what you can do is you can use case expressions to apply aggregates selectively. So if you do some, but you have to be a little bit careful with it, like, like which aggregate you choose. Like up here, we’re, we’re really, like, we’re just getting a count of all the users whose reputation falls into, you know, these various sort of segments. If you use count like this, you’re going to get wrong results back.
Right? Because we know when we ran this query for reputation equals one, we get 1,090,000 or so users back. If we run this query, what we’re going to get back is 2.46 million. Why? Because count is counting the ones, and the zeros. Count doesn’t exclude zeros. It’s like, what’s a value? So it gets counted. If you wanted to use count accurately for this, then you would have to, then you would have to suppress non-one reputations with null.
And then you could do this and get the right number of rows back. We get that 1,090,000. A lot of developers find it a little bit more intuitive to use sum rather than count, right? Because with sum, you’re saying if reputation equals one, then add one to it. If reputation doesn’t equal one, doesn’t equal one, then add zero to it. And that gives you a little bit easier time getting the correct count back.
But what that’s leading up to is that you can do something even more interesting with both sum and count. But sum is usually a little bit more common to do this with, is where you don’t have to just supply a static value for your then or else expressions. You can actually do like a column value. So let’s just say we wanted to sum up everyone’s reputation for 2013.
We could say where creation date is greater than equal to 2013-0101. And if that’s true, then we sum the reputation column. And if that’s not true, then we add in a zero, right? So you can do this and you can replace a column with, you can sum up a column doing that. Now, what that’s leading up to is that when you want to do that across a range of things, there’s a very easy way to do that to hit the entire users table, but only hit it once and just use case with the selective, selective aggregates to get you what you want.
The only thing that I’d say here is just when you’re writing stuff like this, be really careful, right? Because if you don’t write these, the ranges correctly. Like this, this query is very simple. So this query is hard to mess up. But as you write more complicated expressions for these things, you have to be very careful that you don’t write overlapping expressions, at least where you don’t intend to.
So like, it’d be like very easy to make a mental error and just say, like accidentally say less than 9,000 here and greater than 9,000 here. And what we would miss everyone with a reputation of exactly 9,000, right? So we would leave them completely out of the results. Same thing would be like if we accidentally did greater than or equal to 9,000 here, we would be double counting 9,000s across both of these.
So just when you’re writing these things, just take time, take your time and be careful about like when you’re setting up these ranges that they don’t overlap or exclude things accidentally. But this can be a very handy way of getting you a like full result set like we saw without having to write repetitive sub queries without having to hit the user’s table multiple times. This just does this just as one big scan of the user’s table makes life a lot easier and more simple, especially if the query that you need to pull those things from gets big and complicated, right?
This is just a very simple single table. If you have like a big long set of joins and you’re joining in views and functions, who knows what else then you know, like just having to hit that once rather than three, four more times that can be a lot easier on your server. Anyway, that’s just a little bit about case expressions and sort of more interesting ways of using them.
Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I will see you in the next video where we will talk about using case expressions defensively in queries to avoid errors and other strange issues. So, hope you’ll join me there. Anyway, thank you for watching. Goodbye.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 25% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.