Learn T-SQL With Erik: SELECT FROM WHERE?
Video Summary
In this video, I delve into the fundamentals of writing SQL queries by focusing on how to select from a table using a WHERE clause—a critical skill for any data professional. As part of my Learn T-SQL with Erik course, which is currently in its early stages, I provide an essential introduction that ensures viewers understand the basics before diving deeper. The video covers key concepts such as the difference between writing and executing queries, explaining why certain elements like aliases can’t be referenced in specific parts of a query due to how SQL Server processes them. This teaser gives a taste of what’s to come, setting the stage for more detailed exploration into query optimization, logical processing, and effective database design strategies that will empower you to write better T-SQL code. The link to purchase this video during its pre-sale phase is in the description; by buying now, you can be among the first to access all the comprehensive content as it unfolds.
Full Transcript
Erik Darling here with Darling Data, and today’s video is going to be a small amount of teaser material from my Learn T-SQL with Erik course from the first module where I walk through the basics of selecting from a table with a where clause. Fun stuff, but we all have to begin somewhere, and this is about as close to the beginning as we can. as you can get. And this course does need to make sure that everyone, by the time we get to the end, knows absolutely everything about writing good queries. So that’s where we’re going to start. The link to purchase this video during the pre-sale era is down in the video description, so if you would like to purchase this video, you would like to see the full course content when and as it becomes available. You can purchase now and you can be on the ground floor for when all of the good stuff, the full course material starts getting presented. Again, this is just a small chunk of it. So the first thing to understand, and it’s a fundamental concept for understanding why some queries work and some queries don’t, and why you have to write some queries differently.
from other queries is that the way that you sit down and write a query is a bit different from the way that SQL Server goes and interprets that query when it gets executed. You, of course, start with your select, and there’s a whole bunch of stuff that you can do with the select clause. Then, you know, you can, you know, you can, well, I guess, semi-optionally have a from clause, and there’s a whole bunch of stuff that you can do within your from clause. And then you can have your where clause and your group by, and you’re, and if you have group by, you might even add in a having clause.
And then you have your order by. Now the presentation order by down here is very important because without this presentation order by, we do not have any guaranteed result ordering. This is a problem that I find in many client queries. And this is a question that I see people asking to this day where it’s something like, if every time I run this query, it returns data in the same order. So, do I always need an order by? The answer is, well, yes. If you want that data to always, if you want that query to always return that data in that order, without an order, without presentation order by, might not always do that.
I do talk about why more in depth in some other modules, but for now, just know that that is the case. And one should not rely purely on a small handful of maybe observed behaviors, because those observed behaviors may change as your database changes. And of course, the way that SQL Server starts putting your queries together for execution, it is slightly different.
SQL Server starts with the from, moves on to the where, moves on to the group by, moves on to the having, then down towards the end, it finally gets to select and then order by. And then, well, I did, I did leave this out the first time, but because I hope to God, you never actually have to work with XML or JSON. But if for some reason you do just know that, that happens way at the end. SQL Server is like, I’ll deal with you last.
You mess. All right. But yeah, so SQL Server, the way that it interprets a query is different from the way that your query is written. That’s why there are some elements of your query that are not going to be eligible or accessible in other parts of the query. By that, pretty specifically, I mean that, you know, like if you have some sort of expression in your select list, you can have it in your order by, but you like you can access that alias in your order by, but you can’t access that alias in your where clause or your group by or your having.
Like, I think Oracle recently introduced something where you can now reference aliases in the group by, which I think is very cool because it saves you a lot of like, like perhaps like repasting some complex expressions in your select list in your group by clause. But the main point of writing queries is we tell our database the stuff that we want. And it’s the database’s job to go and give us that stuff, go get it for us.
So the select list tells the database which columns we want. The from clause tells the database which tables those columns are in. The where clause tells the database which rows to include or exclude in the query.
Group by will tell us which rows to produce aggregates with. And the order by tells the engine which way to sort results. In other words, your job is to shape the result set and it’s the database’s job to retrieve it.
So if I were looking to expand the darling data empire and I were saying, hey, database of all these cool people who, you know, show up to events or take my training. All of these wonderful, wonderful, brilliant, beautiful, bright attendees. If I were to say, hey, I need to make some money.
I need some need to put some change in my pocket. I would want to tell the database to give me company contact information for 100 people. Right. You want to send out that big email.
Right. You don’t want to send out a bunch of individual emails. That’s a big, not very, not very wise. I find who primarily use SQL Server who have the highest consulting budgets. And that query would look a little bit like this.
Or I select the top 100. Right. Just limit it to those 100 results. Now SQL Server uses top. Other database engines might use limit.
Some of them might only use offset fetch. But SQL Server has this wonderful proprietary top command. Not sure what Microsoft was coding for there, but we have it anyway. And then I would want to get some company names and some contact names and some email addresses.
And I’d probably actually want to see a number on the consulting budget before I decided what to do here. Because without seeing that, you might sort by the top 100 descending. And you might find someone who has like a 50 cent consulting budget.
And well, that’s just not enough change for the pockets these days, is it? Not in this economy. Not in this economy.
Not in this economy. But then I would limit the results to people who primarily use SQL Server. And my presentation order by. Right, you don’t put the word presentation in there. Just the order by is sufficient.
Then I would order by consulting budget descending. And that would be the data that I want, shape the way I want it, and the order that I want it in. Queries are really just descriptions of the things that you want to see.
It’s sort of like how indexes contain data, right? You create an index and the data that you, the way that you define that index defines what data lives in that index, whether it’s key columns or included columns. And then you get statistics, like you get a histogram on that data that describes what data is, like exists in that index. So databases contain data and we describe the data that we want to see with queries, right? So all the stuff that we do to write that query is just a description, right? It is not an order, right? It is not a, like, it does not tell the database how to do anything. It just tells us what the end result should be. The better job you do of describing the results you want, the fewer surprises you’ll have later when results show up. There’s a lot that goes into this. This is a very loaded statement and that’s what the rest of the course is for, to help you describe your results as accurately as possible. There are going to be certain things that your query is going to be at the mercy of as far as database design goes, though, right? There’s a lot that can go on here that can make or break a query. Data types being one of them, the amount of times that I have seen data incorrectly typed in a table, whether it’s a, you know, I mean, it’s usually something else that is stored in a string column, right? So it’s like a date or like an integer, not like a phone number because phone numbers make sense to store as strings because they might have parentheses or dashes or you might call 1-900-ERIC-RULES. It has letters in it. Someone might have like an extension that they put in, right? So there’s like reasons that to not store phone numbers as integers, but there are things that like that should be stored as the correct data type that I often see are not. Other things that can help your queries when they run, if your, if data is unique, either in a column or in some group of columns, some composite key of columns, you should tell the database this. If your data is closely related to data in another table, enough so that it would, it would, it would best be described with a foreign key relationship, you should do that.
If you have some domain knowledge of what is in your tables, where either the, the values that should, that are allowed to exist in there, then you should put constraints on your tables to enforce that. Likewise, if data should never be null, then a default constraint is a very good idea because that way, in most cases, you can get around nulls ever showing up in there. And of course, your, your database will function best when you have useful indexes to help your queries locate the data they care about, have data in the order that they care about, all sorts of things like that.
Now, the actual material, there’s a bunch of awesome stuff in here. And this teaser video, it’s left to the imagination. I know, such a tease. So when we think about logical query processing, there’s fundamentally no difference between the way the queries, like how the way the queries that I’m going to show you are implemented by the optimizer. And it’s really hard to do spell checks in comments, but I’m going to fix this one. So don’t worry about that. So again, your job is to shape the data. And it’s the engine’s job to get the data. Now, because of the way that logical query processing works, you might notice this nice little red squiggle here. That’s because total display names is an alias for this count big aggregate function, which means because this is in the select list, we can’t reference it in the where clause because to SQL Server, the select list hasn’t happened yet, right? Only the, only like we have the from, we have the where, we have the group by, but select and order by have not happened, have not occurred to SQL Server yet. So this query is going to give us an error. All right. So like there’s, there’s no running this one. Now this is an aggregate, so we don’t want to put a where clause on that. That’s where we use having, but let’s turn on actual execution plans.
Now you can either push this button here at the top of SQL Server management studio, this one right here, or you can hit control and M on your keyboard, assuming that you have a rather standard keyboard. And there are two other ways that we could write this query, depending on how we’re feeling.
It’s not going to make a big difference to performance in this case, how these things, how we do it. It’s, but there are going to be physically implemented by SQL Server in the same way, right? So this query where we basically successfully do what we tried to do above, and we use having to filter out rows produced by the grouped table aggregate, right? Where count big is greater than two greater than or equal to 2000, which is what we were looking for in the where clause here that was invalid, or we can add another layer to the query. So the reason why the logical query processing thing is important is because every time you add in a new from that process kind of starts over, right? SQL Server has to like say, oh, well, there’s a from here. What happens in this from?
Oh, there’s a from out here that we’re froming this from and then like you can really from yourself into a really interesting query, the more layers you add to it. But the point here is that this from is the inner scope of the query because we have these lovely parentheses giving this query a little hug.
And this query has its own little alias out here. And then we can refer to everything inside of that inner scope in this outer scope, right? So the from here gives us a different scope of from to get our data from, right? So if we run these two queries and we look at the execution plans, we’re going to get the same results back and we’re going to get the same query plan. Now performance here is no big deal, but they both do roughly the same thing. They both scan the clustered index. They both use a hash aggregate to collapse the rows, right? By display name. And then we have a filter operator.
And that filter operator is going to be the same for both of them, or rather have the same meaning for both of them. In this one, we have this expression 1001 greater than equal to 2000. That’s the count, that’s the having in the count big, because remember, we don’t know which rows are going to have that count that meets our requirement until we’ve done the aggregate. So we have to group all our data, and then we can filter it. But it’s the same thing in here, where we have a predicate on expression 1001 is greater than or equal to 2000. So this filter happens after we’ve collapsed our data down, and we’ve done our count, and we know which grouped rows are going to meet our greater than or equal to 2000 requirement. So there are two ways to write this query, but the way that the query gets physically implemented is the same, right? So again, it’s our job to describe the query, the data that we want, and it’s the engine’s job to go and get that data and come up with a plan to get that data that works according to our logic. So what I want you to understand here is that the results of queries don’t just have to be returned out to some client, right? Like they don’t have to go out to SSMS, they don’t have to go to your application, they don’t have to, you know, end up like, you know, wherever on some web portal screen.
The results of a query can be passed to another scope of that query. You can even put the results of a query into a temporary object and then use something else to talk to that temporary object. But it’s almost sort of like the pipeline concept of things where each from is sort of like its own little pipeline where data comes from here, and you can pipeline it into another from. So you can really build big queries that do exactly what you want. I’m not suggesting from a performance standpoint that big monolithic queries are always a good idea or even ever a good idea. There is a lot of wisdom in writing smaller chunks of queries that do use temporary objects, but we’ll get into that a little bit more detail later. A lot of the performance tuning I do is on big giant monolithic queries that did not feel like maybe worked out well when the database was small and needed the money. And as the database got bigger and money was no object, that query got worse and worse and slower and slower.
So what when you’re writing a query, there are many things that might seem just downright impossible with just writing a select from where maybe some joins and stuff that when you start introducing outer and inner scopes or perhaps using CTE, if not derived tables, they can make writing a query that seems impossible very, very simple and easy. Anyway, thank you for watching.
I hope you enjoyed yourselves. I hope you learned something. And again, this, the link to purchase this content is in the video description. So if you’re interested in buying this course while it’s on sale, you can get to it now. Anyway, thank you for watching. Goodbye.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.