Learn T-SQL With Erik: DISTINCT Isn’t Always Evil

Learn T-SQL With Erik: DISTINCT Isn’t Always Evil


Video Summary

In this video, I delve into the nuances of using `DISTINCT` in T-SQL queries, addressing common misconceptions and providing practical examples to help you understand when and why `DISTINCT` is necessary. I start by explaining how `DISTINCT` can sometimes get a bad rap due to its impact on performance with large result sets, but emphasize that it’s not always harmful or avoidable. The video covers scenarios where `DISTINCT` is crucial for getting accurate query results, such as in join operations and when dealing with many-to-many relationships. I also discuss alternative methods like using `EXISTS` to achieve similar outcomes without the performance overhead of `DISTINCT`. This content is part of my comprehensive T-SQL course, which you can pre-order now at a special price of $250; after completion, the price will increase to $500. If you’re attending the PASS Data Community Summit in Seattle this November and are interested in these topics, be sure to check out the companion material for my pre-cons with Kendra Little-Leader.

Full Transcript

Erik Darling here with Darling Data and we are going to, in this video, which is again more sort of preview content from my T-SQL course, we’re going to talk about Distinct. Now, if you would like to get in on the pre-sale price for this video, down in the video description there is a link. You can buy it right now for $250. That price, once all the material is finished, will be going up to $500. And of course, this is companion material for the pre-cons that Kendra Little-Leader has. And I will be, I don’t know, I don’t know, I don’t know, I don’t know, I don’t know, I don’t know, I don’t know, I don’t know, I don’t know. I don’t know, I don’t know, I don’t know, I don’t know, I don’t know, I don’t know, there’s lots of P’s you could use in there. At PASS Data Community Summit, this November in Seattle. So if you are attending those pre-cons, you can, of course, get, you’ll, of course, get this content free with your admission there. Otherwise, the link is down below. So this is a short excerpt from my material on Distinct.

And I hope you enjoy it. Anyway, when it was like, like, like, I wanted to write specifically about Distinct, because there are times when it’s like, not harmful, and when it is necessary. And like, all the time, like, every time I open LinkedIn, it’s like, the first three things I see are like, this terrible LLM generated posts, where it’s like, top 10 things to avoid in SQL, and it’s like, fire emoji, avoid, avoid select star, and you’re like, here we go. And it’s rocket ship emoji, don’t use distinct. And you’re like, hmm, but, but some, but, and there’s never like, an alternative.

All right, there’s never like, like, here’s what, here’s a better way of doing anything. It’s always just like, don’t do this stuff later. You’re like, oh, great. Oh, wonderful. You really put your heart and soul into that one. Good, good, good job there. But what we’re going to talk about in this video is sort of like why Distinct kind of ends up sneaking into things. And when you might actually need to distinctify some stuff in order to make SQL, make, get query, get the correct query results back. Who knows, right? Sometimes you got to make something distinct.

There are a lot more examples in the full content, but this is, so this is just something to give you an idea of what’s in there, right? A little, this little, what’s that, what’s that you’re cooking? What’s in the kitchen? Good stuff. Come for dinner. Whatever. Anyway, let’s say that, you know, you’re, you’re writing a query and you want to see everyone, all the users with a reputation of one that have posted an answer. Remember in the inner join, outer join video, we talked about different join, like logical semantic stuff that preserves rows on whatever side of the thing.

And so you write a query that looks something like this, right? Where we’re selecting ID and display name from the users table. And we’re just joining to the post table to see, hey, did they post anything? Right? Did they post the right kind of thing in there? Oh, maybe they did. Maybe they didn’t. But we’re going to find out now.

The thing is when you run this query, you get like a lot of duplicates in here, right? You can see good old, a mean, a meanie. That’s duplicate rows. And, you know, as you go through old farmer, Dave, a couple in there and you realize that, well, you know, we got like a hundred thousand rows back from this thing. If we run this and get the actual execution plan, about 99,000 rows come out of this and we realize we got a lot of duplicates in there.

And we might not want a lot of duplicates in there. So what we do is slap an old distinct on it. Now, the reason why distinct gets sort of a bad rap is because it is like doing a big group buy on all the columns that you are selecting. Right? Maybe just distinct. Right? It doesn’t happen at the same place when you, in like the logical query processing part of the query is grouped by.

It happens like way at the end and like down, like if you remember the order that SQL processes queries in, distinct happens way down at the end with the select. So it’s like a group buy after you’ve already like potentially grouped by other stuff. Like there’s very little reason to mix distinct in with a group buy as well.

So like, let’s say we just slap a distinct on this one, which is no different from saying group buy ID display name. And when we look at the execution plan, rather than 99,000 rows coming back, it’s only 79,000. So we got like 20,000 some odd rows from the results by putting a distinct on this.

Now, every, now every combination of ID and display name that we see in here is going to be unique. We see that now we only have one Amin Amini and there’s going to be one Farmer Dave and everyone else. We just have one of them. They are now, this is now fully unique.

Now, the reason why this happens a lot is because people’s view of SQL in general, like as a language, how it fundamentally works is often full of shortcomings, right? The goal of this course is to fill those shortcomings in, right? There’s a need in the market, stuff about the economy. I don’t know.

But, um, the, like a lot of the times people will use a join, uh, some, some form of join syntax to figure out if rows exist or don’t exist. And that’s not really the best way to do this sort of thing. So, uh, if Zoomit will wake up, hello, Zoomit, come on, paid good money for you.

Just kidding, Mark Rassinovich. I downloaded wares. Uh, just kidding, it’s free. Um, so with a join, if you have a one-to-many relationship, the way you’d be doing the, from the user’s table to the post table, right?

Because one user could have many posts, joins preserve that, right? So we get all the rows back from a join that match, right? It doesn’t, doesn’t matter if there are one or 10 million, all those rows come back if they match, right?

The joins preserve that sort of duplication stuff, the one-to-many relationship. Because, like, you might, you know, want it for some queries, right? There might be something that you actually need, like, all of the matches for.

Like, if you were summing up the total score for a user, you would need all of their scores. You wouldn’t want just one of their scores. It wouldn’t make sense, right? You need all their scores to sum up a user’s total score from the post table.

If you just need to figure out if data is there or not there, the best way to go about that is to use exists or not exists. Because these either find a row or they don’t. If you find a row, it existed.

If you don’t find a row, it not existed. But you don’t preserve one-to-many relationships. You just find a relationship or don’t find a relationship. And that’s what meets the logical requirements of your query, right?

You don’t mess around with a whole bunch of other stuff. So, I love summing is now mad that apparently I downloaded it for free. But one way that you can get around having to use distinct is to use instead of a join, just use exists. Because this means that we’ll only get one row back when there’s a match.

So, like, for this, this just says, hey, if this user is over here, send them to the results, right? It’s not like if this user is over here, get all their results, right? Get all their posts.

So, if we run this and we look at the execution plan, we will have gotten the same 79,000 rows back without having to slap a distinct on it, which would make the LLMs of LinkedIn happy, right? I wish there was a way to just report those people. No semicolon.

I should report myself. Anyway, there are times when distinct is the correct thing to do. Now, let’s take a query like this, which is sort of a many-to-many relationship. So, we’re selecting from the post table and we’re doing a left join to the comments table, right?

And we’re just looking at this for one single user. Now, since this is a left join to the comments table, if we wanted to get a count of how many comments a user had, we could do this in one of three ways. We could either just do a count, which would be wrong.

We could do a count of just the ID column in the comments table. Now, remember, count doesn’t count nulls. It only counts present values, right?

But it will count duplicate present values, right? We talked about this in the case expression video a bit. And then the other way of doing this is to say, give me a count of just distinct IDs from the comments table. Now, this is important because ID is the clustered primary key of the comments table.

So, if there are duplicates in here, it’s not because we have some, like, error in our table and we have corrupt data and we have duplicates in our primary key. That’s not the case. It’s just the left join has a, like, it’s a many-to-many relationship between posts and comments.

Right? Like, one user can have a whole mess of comments. So, this is one of those things.

I guess it’s one-to-many relationships and it’s just one user. But one user can have lots of comments, but they have, but there’s also, but, like, the owner user ID in the post table is not unique. Right?

Because one user can have lots of posts. So, it’s, like, one, it is a many-to-many in that regard. So, if we were to run this query and look at what these different counts produce, we’d pretty quickly see that we do need distinct here in order to get correct results. So, let’s run this.

And so, it’ll take a second to run because we’re doing a lot of work here. But let’s look at the difference in what these counts produce. Right?

When we just do count or when we do a count of the comment IDs, right? That’s these two up here. Right? We have the count big and the count big CID. We get gigantic numbers back.

Right? But when we do a count of distinct IDs here, we get a much smaller number back. 46,737. I’m not sure how to pronounce this number.

We’re going to, there are no commas in there. We’re just going to be terrified. But we can see where these counts came from. At least the two big wrong ones. Right?

If we look at the execution plan, look at the number of rows that that join produces. That is the same number that we got from the count star and the count CID. Right?

So, the 27,901 rows in the post table that match owner user ID 22656 and the 46,737 rows in the comments table, those get joined together and they produce 1304009036. It’s a nine, 10 digit number.

I might have run out of fingers if I did that. Right? So, that produces a big result set. Right? It’s not quite a Cartesian product, but it’s a big number. It produces a big number of rows.

Right? Because they all match. There’s a many to many relationship. It’s like 27,901 times 46,737. It’s a lot of rows. Right?

Big number of rows. But the select distinct on the ID column over here produces the correct number of rows. And we can validate that by just hitting the comments table and getting a count for that specific user ID.

So, if we run this and we look at how many rows come back from that, we’ll see user ID 22656 does indeed have 46,737 rows in there. So, while I totally get the advice to avoid distinct, you know, it comes from a good place because, you know, distinct does get stuck on very big queries that select a lot of columns.

A lot of them are probably strings. And, like, doing that sort of distinctification of really, really big result sets can be painful. But there are plenty of times when using distinct, A, provides you correct results.

And, B, you know, not something we’re going to explore today. But there are times when using distinct in a query gives the optimizer some guarantees around a unique set of data being produced by that query.

And that can actually help the optimizer quite a bit to produce better query plans. And we’re actually, you know, using distinct to produce fewer rows from something that’s part of your query can be very useful at times.

So, don’t throw distinct out completely. Distinct can be useful. But, you know, there are times when, you know, just doing a select distinct on something can cause, you know, some performance pains.

But, you know, if you test it with and without distinct and it’s not that much different, maybe, I don’t know, maybe it’s okay. Maybe the LLMs aren’t always right.

Shocking, I know. Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And we will talk about something else wonderful and T-SQL-y in the next video. Anyway, goodbye.

Bye-bye. Dalai. Bye-bye. Bye-bye. Thank you.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: Solving Problems With APPLY

Learn T-SQL With Erik: Solving Problems With APPLY


Video Summary

In this video, I delve into the beginner section of my upcoming T-SQL course, focusing on a fundamental yet powerful feature: APPLY. Starting with an engaging introduction to APPLY as a unique blend between JOINs and subqueries, I walk through common issues encountered when trying to solve top-end-per-group problems using CTEs or derived joins. By demonstrating how APPLY elegantly handles these scenarios, I aim to demystify its syntax and application in real-world queries. Through practical examples, including the Stack Overflow database, I show how APPLY can be used to find the top three badges earned by users with a reputation over 750,000, illustrating why it’s such an essential tool for any SQL Server professional. The video also touches on alternative syntax and performance considerations, providing a comprehensive yet accessible overview of this powerful T-SQL feature.

Full Transcript

Erik Darling here with Darling Data, and we’re going to continue on with some of the preview content from my upcoming T-SQL course. I would like to thank everyone who has pre-purchased it so far. Your support, of course, means quite a bit to me. And my chief goal right now is to not disappoint you. Again, this content is going to be part of a larger paid course. If you would like to pre-purchase it, there is a link down in the video description for you to do that. The price is currently $250. That will be going up to $500 once the course is fully published. And this is companion content to the T-SQL pre-conferences that Kendra Little and I are doing at PASS Data Community Summit in Seattle this November. And so if you are attending those, don’t buy it because you will get access to the content. as part of your admission to those pre-conferences. So hold your horses for that. I mean, I’m going to tell you not to. If you decide to just because you’re like, Erik Darling, we love you. You’re worth $250. Cool, but you don’t have to. Save your money. Go drinking. Today’s video is going to be about apply. Now, this is the beginner section of the content. So there’s not going to be the advanced level of stuff that you would maybe normally find in content about apply that that is coming later. What we’re going to do today is solve a little problem with apply and sort of dissect how apply works. Now, we’ll also look at an alternative to the apply syntax because sometimes looking at alternative syntax helps people understand what we’re doing with apply a little bit better.

But apply is one of my favorite pieces of t-SQL syntax. Sort of generally. I was going to see if zoom it’s awake. It is wonderful. Apply is sort of like if a drive join in a subquery had a baby. It’s the beautiful bouncing baby apply. In other dialects of SQL, postgres, Oracle, some other fancy ones. The cross-apply syntax is fairly well equivalent to the lateral join syntax over there. Apparently, it was in the SQL standard, but it wasn’t fully codified in the SQL standard as the lateral join syntax until after Microsoft had added apply to SQL Server, So apply is not the like anti-SQL standard form of way of doing it, but it is the sort of equivalent idea. Now, apply does have two flavors. There is cross-apply, which is an inner join equivalent, and then there’s outer apply, which is an outer join equivalent.

And the problem that we’re going to solve today is probably the easiest one to help most folks grasp what apply is good at. And that is the sort of top end per group problem. What we’re going to look at is in the context of the stack overflow database is a few of the higher reputation users in the three most recent badges that they’ve earned for being really good at overflowing stacks. But in your work, you might look at this in other ways, like maybe the top three selling products per category, like bed and bath, home and garden, kitchen, sneakers, whatever product categories you have, you want to find the top three selling products for that. Things like that.

So we’re going to look at sort of some query patterns that don’t work for that and then how apply works for that. So if you were to try this with a CTE, and I’m not saying that it’s not doable with a CTE, just you can’t do it like this. And looking at this sort of help set up the apply syntax a little bit.

So if we were to run this query, which in here we’re looking, we’re getting the top three badges, you notice that this line right here is quoted out. And that’s because in the context of this CTE, the users table, the alias for the users table, the reference for the users table down here is not visible to it yet. We can’t access the users table from the CTE up here.

We can reference it after we join to the CTE down here, but it doesn’t get us the result that we want. So like if we just run this and like we see what the query produces, we just produce three random badges, right, for three random users. These are just the three most recent badges earned period, right?

So this doesn’t get us what we want. We want the top three badges for the users that qualify for our query down here. So this query produces no results, right? We don’t get anything back because no one with a reputation over 750,000 was in the top three badges that got earned period for the badges table, right?

This is just like the whole thing. These people weren’t in there. The other part of it that doesn’t work is if we tried to do a similar thing with a derived join. You notice I have this where clause similarly quoted out in here because like the users table, the reference to the users table, isn’t available inside of the derived join.

We can’t see it until after the join is complete and we do the on clause. You end up with the exact same problem in here, right? Selecting the top three most recent badges, none of those users have a reputation over 750,000, right?

So like this returns no results as well. This is why you can’t just go stick things like top into a CTE or a derived join. Like expect it to match the results of the other stuff, right?

Like it works for some things, but not for this particular task. So if we use apply, what we’re able to do is a see like that reference to the users table inside of the apply. So this does end up getting a little bit more spread out than it would normally be, but it’s only because I tried to put some helpful commentary in there.

Where inside we have the we select from the users table and then we take then we write our cross apply, right? So the cross apply is right after that. Like I said, the cross apply is like an inner join.

So it comes after that from and in here we can select the top three badges and we can make sure because we’re using top now would be the same story with offset fetch to right? Like you could just as easily use offset fetch instead of top here, just like offset zero rows fetch next three rows. But in here we’re using top because it’s faster to write.

No other reason. That most people in T-SQL will write top just sort of like muscle memory, right? Top, whatever.

So but with top we need order by and because this order by on the date column, the date column is not unique, right? We might have duplicates in here. We do need to have our unique tiebreaker here.

This ID column. The ID is the clustered primary key of the badges table. So this is our unique tiebreaker in case there are any duplicates in the date column, right? That’s a critical thing that you need when you are ordering by a date, when you’re ordering by a column that might have duplicates in it because it’s considered non-deterministic, right?

You get different results without that unique tiebreaker. But inside of the apply, we are able to see that outer reference to the users table here, right? We can see this inside of the apply, which means that we, when we run this query, we’ll actually get back the results that we want, which are the users with a rep So we have a total of over 750,000, right?

Let’s validate that looking in here. Everyone here, the lowest one is 754,947. And we get, as long as there are three badges, like they’re like, as long as they’ve earned three badges, you’ll see three badges for each person.

Now, this is the sort of interesting thing about apply is that, you know, you can say you want like the top however many, but if that however many don’t exist, you only get back like, the, you only get back however many results actually do exist. So like, let’s say we said top 10, but one person that only earned nine badges, we would only get nine rows for them.

We can’t invent a row to send back for them. And we’re not going to send back this a random duplicate row because that, that would be insane. Right?

So like you get the top however many are available, like for your, like to, until your top is either met or we run out of rows to find. So this give, this does give us the correct results, which is wonderful. So now let’s dissect a little bit more how apply works so that everyone can understand the majesty of this query.

Right? So, um, the other queries didn’t work because again, it’s sort of like you were saying, uh, select the top three badges of whatever, but like just changing this query to hit the users table and do a simple in to figure out who. Uh, like which users earned just the top three most recent badges ever in here.

You’ll see that we get back three kind of random people and none of them have a reputation even close to 750,000. Right? Maybe they do now.

Maybe they’re a fantastic, like just amazing stack overflow database users at this point. I don’t know. I don’t follow their careers. I’ll be honest with you. I don’t, don’t care that much, but like, like those three people were not in, did not meet that 7,000.

So we’re going to have a 750,000 reputation goal. So the way that apply works is sort of like if we take that, like the initial starting query, right? Like let’s say that we know we want to find like what, like our basic starting point is we want to select.

We want to select these columns from the users table. And all that we care about are users who have a reputation of over 750,000. And when we supply our results back to, you know, whatever application web front end, whatever, whatever you want to call it.

So we want those results ordered by reputation descending. And then for any, if there are any duplicates, the ID column descending, right? We need that unique tiebreaker to make sure the results come back in a sort of reliable, reliable order every single time.

These are, these are the people we get back. So these six rows. All right.

We get, of course, John Skeet. We get Balassie. I’m not sure who that is. We get Darren Dimitrov. Sounds very smart. We get Bon C. I’ve seen that name around. We get Hans Passant. Some sort of German French mass mashup there, perhaps, perhaps a Belgian.

I don’t know. You can’t tell. And then we have Marc Gravel right here rounding out the list with 754,947. So the, this is our starting query, and these are the results from our starting query.

The way apply works is sort of like for every row that your starting query produces, apply the result of the cross apply query to that row. So that’s why we get three rows back for John Skeet and three rows back for Balassie and so on down the list, because we take, we apply that top three query to the results of this query. And just to sort of go a little bit further with that, you know, again, like this is the, this is the query inside of the cross apply just without the correlation in here.

So what we need to say is for each user that qualifies for our where clause, apply this query result to that. Now you could absolutely do this with a derived join or a CTE. You just have to write the query a little bit differently.

Another common approach to this is to use the row number window function and to get a result, like generate a row number over that result and then filter on the row number after that result has been, after that result has been produced. So in here we’re saying where the, remember the alias for the row number inside of the derived join is n, right? That’s this thing right here.

So we’re saying where n is less than or equal to three. Now, just to kind of make this a little bit easier to understand, I’m just going to edit this query really quickly. And let’s grab, let’s look at this again.

And so these are our top six users. We’re just going to grab this number 22656. And then in here, we’re just going to change this for a minute where b.userid equals 22656. And we’re just going to run this result on its own because there’s no correlation inside of a derived join.

Again, you can’t access the stuff outside of a derived join within the derived join. So there’s no correlation to the user’s table inside of this. So we can just run this without an error and we can just see what the results are for one user, right?

So if we run this, we’re going to get back this result. And this is, these are going to be the top three most recent badges for this particular user ID, right? We had a good answer, a nice answer, and a great answer.

What a fantastic answer. And we have that ordered by date descending in here. And of course, we have this ordered by ID. If there were, I don’t think, no, I don’t see any duplicate dates in there, but, if there were any duplicate dates in here, then the ID would act as the tiebreaker to differentiate between those duplicate rows.

Like I said, you could, like with a duplicate, when you order by a column with duplicates in it, the result is considered non-deterministic. So what we, we get the same result back for this one. We just take on a slightly different query format with it.

So let’s get rid of this where clause, because we don’t need this anymore. That was purely a display model query. And let’s run this. And you’ll see that we get back the same thing.

Well, now we have this, this column in here, right? And one, two, three, one, two, three, over and over again. So I guess this could be useful if you wanted to quickly eyeball anyone who may not have had, that may not have had a top three badges in there.

Maybe someone who only earned two badges. They somehow accumulated that much reputation without ever doing anything really spectacular, I guess. But we, we get all the same sort of information.

We get the same sort of set of information back from that. So that’s a bit about how apply works. Hopefully it helps you understand the syntax for it and at least one application for apply that you can, where you can use it to solve interesting query riddles without breaking your back with and typing a whole lot more than you might if you start using windowing functions. There are some performance considerations around this stuff, but you know, there’s only so much time in the, in the preview material for me to spend on things.

So you’re just, you’re just going to have to wait for the full video on that, but you only get the full video if you buy the training. But anyway, thank you for watching. I hope you enjoyed yourselves.

I hope you learned something and I will see you in the next video where we will do some more of this fun T-SQL learning stuff. Anyway, thank you for watching. I’m out of here. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SQL Server Performance Office Hours Episode 13

SQL Server Performance Office Hours Episode 13


What is the effect of having a few queries using Read Committed isolation level where the database is set to RCSI? Will those dumb Read Committed queries block any others?
Do you have any suggestions for optimizing data retrieval using Full Text Search (FullTextMatch TVF)? We’ve tried cutting down the query to reduce the dataset using other parameters before the FTS bit and also breaking out FTS across multiple columns to do one at a time, but anything else? Also, the memory used by FullTextMatch is hard to define. Should we reduce the size of the BufferPool to give more to FTS or the other way around? Thank you for coming to my Ted Talk on FTS.
If I were to want to kill all the table partitions by fire, what brand of flamethrower do you recommend?
Do you know of any documentation that makes it very obvious what Max Server Memory controls? It’s changed across versions. I think it used to just be the buffer pool?
I am writing C# classes and building CLRs for string comparisons, then creating a TVF for execution. The TVF is creating row estimates of exactly 1000x my row actuals (2 rows creates a “2 of 2000” for example). No query hints seem to resolve this, how would you work to control these estimates from my compiled C#?

To ask your questions, head over here.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: INNER vs OUTER JOIN Filtering Logic

Learn T-SQL With Erik: INNER vs OUTER JOIN Filtering Logic


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: WHERE vs Having

Learn T-SQL With Erik: WHERE vs Having


Video Summary

In this video, I delve into a practical example to explain the differences between using `WHERE` and `HAVING` clauses in SQL Server queries. Erik Darling from Darling Data shares insights on how these clauses can be used effectively or misused when filtering data. By walking through an interesting query involving user display names and reputation scores from the Stack Overflow database, I demonstrate why applying a `WHERE` clause might not always yield the desired results and introduce conditional aggregates as a solution to preserve the integrity of grouped data while still filtering out unwanted rows. This video is a sneak peek into my upcoming T-SQL course, which will be available at a presale price of $250 until it launches in full, priced at $500. If you’re interested in learning more advanced SQL techniques and strategies from experienced instructors like myself and Kendra, make sure to check out the link in the video description for early access.

Full Transcript

Erik Darling here with Darling Data and today’s video is going to be another little sort of teaser-oo for my upcoming T-SQL course. If you are interested in getting in on the presale price for this course, link down in the video description to do that. It is currently $250. When the course is all completed, it will be going up to $500. And of course, this course, of course, of course, of course, is a horse. This course material is going to be $500. This course material is going to be in large part a representation of the material that Kendra and I are presenting at our pre-cons at PASS Data Community Summit in November. So, like the stuff that you learn there will be the stuff that is in this course. And if you attend our pre-cons in Seattle, I will give you access to this material as part of that ticket. So, you will get it for free if you show up there. If you’re not going to be there, well, the link to purchase it is below. There are no further discounts on this. So, don’t ask. It’s a lot of work. But in today’s video, we’re going to talk about where versus having a little bit. And because they’re both ways of filtering data, right? So, like you have the where clause, which filters out rows from query results, right? It applies a filter and it says you rose, you made it or you didn’t make it.

And then you have the having clause. And the having clause introduces filtering on a grouped table. And by grouped table, I mean you have a query with a group by expression in it. And you might want to figure out if the result of some aggregate on that group table meets some criteria. So, an easy way to show you that is to use this first query here. We’ll just wait for ZoomIt to wake up. ZoomIt apparently is just taking a very long nap and doesn’t want to do anything for me here.

So, this is an example of a grouped query. We have a group by on the display name column, right? We are also selecting the display name column. But we’re asking SQL Server to filter out the results by only display names. Now, this is a sort of a funny quirk about the Stack Overflow database and it’s kind of an amusing query because display names are non-unique.

Users can, multiple users can have the same display name. There are even two John Skeets in the Stack Overflow database for some strange reason. But it’s just sort of, so it’s sort of a funny query where we’re just taking all the display names and we are looking for that average reputation, average downvotes, average upvotes, and then the count of total users that have that display name. And we’re sort of putting them on teams together, right? So, like team display name, whatever your display name is.

So, if your display name has more than 2,000 participants, more than 2,000 team members, that’s where we can start caring about your numbers for these. So, this is our original query and ZoomIt has once again gone to sleep. So, if we run this, we’ll get some results back.

And let’s pretend that after running this query, we have sort of a practical problem. We have these nine rows, right? The nine rows is sort of important to the problem we’re going to solve. But the thing that our teams are mad about is the fact that their reputations, their average reputation, is being brought low by teammates with low reputation.

People with a reputation of, let’s say, one. Let’s say these low reputation users are really screwing things up. So, what we might be tempted to do is apply a where clause additionally to this query, right?

We’re still going to, we still need to figure out who has a total team size of greater than or equal to 2,000. But now we’re just going to say, well, we’d only, we’re going to screen out all the people with a reputation of one. And hopefully, this will make our, this will bring up the averages for these teams, right?

Because we want to make sure that these teams still have good averages. The problem we run into is that this significantly cuts down on the results. We only have five rows now.

We actually, we actually destroyed four teams by getting rid of, by filtering out reputation equals one users. These are the original results. So, we’re missing Matt, we’re missing Mark.

I don’t know. This is like weird, like apostle names or something. But like, you know, the averages did go up for these, right? Like these score, these average reputations are higher, but we no longer have this full list of teams.

And that’s not a very good sign, is it? We have done something wrong with our query. And we need to figure out a way to run this query and get the same nine rows back.

But without these reputations suffering, right? Like these reputations, sorry, these reputations suffering. These reputations have already suffered, right?

And granted, like the average downvotes and upvotes are probably a bit different too. But the total users is different because we have screened out all these reputation equals one people. So, the where cloud is the wrong way to do this.

Now, there’s one way that you can write this query to give you the, to preserve the nine rows, but still not consider the low reputation equals one people, is to write a conditional aggregate. Now, if we, I’m going to zoom in on this.

And I have, I have, I have formatted this in a way to make it hopefully very clear what we’re doing here. Like this average downvotes is very easy to figure out. It’s just an average on downvotes.

So, I’ve, I’ve spread this out a bit to make it very clear what we’re doing. And what we have in here is a case expression. Case expressions are wonderful things, right?

You can do all sorts of fun stuff with them. And one thing that you can do with them is stick them in an aggregate like this. So, what we’re going to say is if your reputation equals one, then we are going to, we’re going to substitute that value with a null, right?

So, average isn’t a, null isn’t a value. So, you can’t average nulls, right? It’s not a, not an averageable thing. And if your rep, so that’s that.

And then our bailout on the case expression is if your reputation is not equal to one, or rather higher than one, right? Because there’s no, no reputation lower than one in the Stack Overflow database. One is as low as you can go.

There’s not a check constraint on that in the 2013 copy. But in real life, your reputation can’t go below one, right? Doesn’t matter, doesn’t matter how bad you screw up, your reputation doesn’t go below one.

I guess otherwise it would be like, almost like the opposite of the high reputation contest, sort of the Stack Overflow thing, sort of gamifies for people answering questions and stuff. You’re able to say, I want to see how low I can get this.

I don’t know how many bad answers. Like, actually like rewarding bad behavior if they did that. So, if we run this query now, we will get, so like, so just to like make sure that we’re on the same page here. This query had the where clause where reputation is greater than one in it.

I’ve taken that out of this one, right? We have no where clause here. All we have in here is the having clause. So, say, we’re grouped by as, sorry, we’re count big as greater than or equal to 200, greater than or equal to 2,000 after we group by display name, right?

And if we run this, we will get our same nine rows back, but our reputation, right? Our reputation will be okay, right? So, all of the numbers, all of the other numbers are going to match up.

There’s not a great way to sync this up, but you can see that the numbers up here and the numbers in the upvotes and downvotes columns, these all match up nicely, right? These are all the same, but the numbers in our reputation column, our average reputation column, these are all, these have gone up.

Now, the order is a little bit different here. The reason why the order is a little bit different here is because once we’ve screened out the reputation equals one people, some of the averages have sort of changed places a little bit, right?

Like, there’s a little bit of difference in the order of these, where if we compare one to the other, Matt is still on top, Alex is still second, David is still third, Mark is still fourth, but Chris and Michael have switched places, and I think actually that’s about it there.

Yeah. So, this, like, that did change the scores a little bit, because the average reputations switched around because of, like, some display names had way, had more reputation equals one people that got, where the reputation got replaced with null, so that the scores, like, the order changed a little bit, so someone might be mad in there, someone might want to, I don’t know, start kicking some reputation one people off their team, maybe.

But this is one way to get around the problems you might run into when you are trying to sort of manipulate data that is being grouped, and stuff like the where clause and the having clause is not an appropriate way to filter things out. Sometimes you do need to filter out rows with the where clause, and sometimes you do need to figure out, filter out rows from the group table with the having clause.

Because other times, you can apply a little bit of filtering inside of your aggregates in order to change what actually gets aggregated in there. So, that’s about it for this one.

In the actual course material, there is, of course, a lot more other stuff that goes along with this lesson, but this is just sort of one of the more practical examples of what the things that I’m going to be teaching you. If you like this, again, you can buy it down there.

The videos will, the actual full-length course videos will start dropping soon. They’re out in tech review right now, kind of getting spruced up and fixed up. And, you know, I am like rehearsing them and stuff to make sure that my delivery is sound and solid for when I give these to you.

But they will be dropping very shortly. And I hope you actually buy them and watch them and like them and all that stuff. So, thank you for watching.

I hope you enjoyed yourselves. I hope you learned something. And I will see you in the next video where we will talk about some more of the course material. So, anyway, that’s about it for me. Goodbye.

Goodbye. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: SELECT FROM WHERE?

Learn T-SQL With Erik: SELECT FROM WHERE?


Video Summary

In this video, I delve into the fundamentals of writing SQL queries by focusing on how to select from a table using a WHERE clause—a critical skill for any data professional. As part of my Learn T-SQL with Erik course, which is currently in its early stages, I provide an essential introduction that ensures viewers understand the basics before diving deeper. The video covers key concepts such as the difference between writing and executing queries, explaining why certain elements like aliases can’t be referenced in specific parts of a query due to how SQL Server processes them. This teaser gives a taste of what’s to come, setting the stage for more detailed exploration into query optimization, logical processing, and effective database design strategies that will empower you to write better T-SQL code. The link to purchase this video during its pre-sale phase is in the description; by buying now, you can be among the first to access all the comprehensive content as it unfolds.

Full Transcript

Erik Darling here with Darling Data, and today’s video is going to be a small amount of teaser material from my Learn T-SQL with Erik course from the first module where I walk through the basics of selecting from a table with a where clause. Fun stuff, but we all have to begin somewhere, and this is about as close to the beginning as we can. as you can get. And this course does need to make sure that everyone, by the time we get to the end, knows absolutely everything about writing good queries. So that’s where we’re going to start. The link to purchase this video during the pre-sale era is down in the video description, so if you would like to purchase this video, you would like to see the full course content when and as it becomes available. You can purchase now and you can be on the ground floor for when all of the good stuff, the full course material starts getting presented. Again, this is just a small chunk of it. So the first thing to understand, and it’s a fundamental concept for understanding why some queries work and some queries don’t, and why you have to write some queries differently.

from other queries is that the way that you sit down and write a query is a bit different from the way that SQL Server goes and interprets that query when it gets executed. You, of course, start with your select, and there’s a whole bunch of stuff that you can do with the select clause. Then, you know, you can, you know, you can, well, I guess, semi-optionally have a from clause, and there’s a whole bunch of stuff that you can do within your from clause. And then you can have your where clause and your group by, and you’re, and if you have group by, you might even add in a having clause.

And then you have your order by. Now the presentation order by down here is very important because without this presentation order by, we do not have any guaranteed result ordering. This is a problem that I find in many client queries. And this is a question that I see people asking to this day where it’s something like, if every time I run this query, it returns data in the same order. So, do I always need an order by? The answer is, well, yes. If you want that data to always, if you want that query to always return that data in that order, without an order, without presentation order by, might not always do that.

I do talk about why more in depth in some other modules, but for now, just know that that is the case. And one should not rely purely on a small handful of maybe observed behaviors, because those observed behaviors may change as your database changes. And of course, the way that SQL Server starts putting your queries together for execution, it is slightly different.

SQL Server starts with the from, moves on to the where, moves on to the group by, moves on to the having, then down towards the end, it finally gets to select and then order by. And then, well, I did, I did leave this out the first time, but because I hope to God, you never actually have to work with XML or JSON. But if for some reason you do just know that, that happens way at the end. SQL Server is like, I’ll deal with you last.

You mess. All right. But yeah, so SQL Server, the way that it interprets a query is different from the way that your query is written. That’s why there are some elements of your query that are not going to be eligible or accessible in other parts of the query. By that, pretty specifically, I mean that, you know, like if you have some sort of expression in your select list, you can have it in your order by, but you like you can access that alias in your order by, but you can’t access that alias in your where clause or your group by or your having.

Like, I think Oracle recently introduced something where you can now reference aliases in the group by, which I think is very cool because it saves you a lot of like, like perhaps like repasting some complex expressions in your select list in your group by clause. But the main point of writing queries is we tell our database the stuff that we want. And it’s the database’s job to go and give us that stuff, go get it for us.

So the select list tells the database which columns we want. The from clause tells the database which tables those columns are in. The where clause tells the database which rows to include or exclude in the query.

Group by will tell us which rows to produce aggregates with. And the order by tells the engine which way to sort results. In other words, your job is to shape the result set and it’s the database’s job to retrieve it.

So if I were looking to expand the darling data empire and I were saying, hey, database of all these cool people who, you know, show up to events or take my training. All of these wonderful, wonderful, brilliant, beautiful, bright attendees. If I were to say, hey, I need to make some money.

I need some need to put some change in my pocket. I would want to tell the database to give me company contact information for 100 people. Right. You want to send out that big email.

Right. You don’t want to send out a bunch of individual emails. That’s a big, not very, not very wise. I find who primarily use SQL Server who have the highest consulting budgets. And that query would look a little bit like this.

Or I select the top 100. Right. Just limit it to those 100 results. Now SQL Server uses top. Other database engines might use limit.

Some of them might only use offset fetch. But SQL Server has this wonderful proprietary top command. Not sure what Microsoft was coding for there, but we have it anyway. And then I would want to get some company names and some contact names and some email addresses.

And I’d probably actually want to see a number on the consulting budget before I decided what to do here. Because without seeing that, you might sort by the top 100 descending. And you might find someone who has like a 50 cent consulting budget.

And well, that’s just not enough change for the pockets these days, is it? Not in this economy. Not in this economy.

Not in this economy. But then I would limit the results to people who primarily use SQL Server. And my presentation order by. Right, you don’t put the word presentation in there. Just the order by is sufficient.

Then I would order by consulting budget descending. And that would be the data that I want, shape the way I want it, and the order that I want it in. Queries are really just descriptions of the things that you want to see.

It’s sort of like how indexes contain data, right? You create an index and the data that you, the way that you define that index defines what data lives in that index, whether it’s key columns or included columns. And then you get statistics, like you get a histogram on that data that describes what data is, like exists in that index. So databases contain data and we describe the data that we want to see with queries, right? So all the stuff that we do to write that query is just a description, right? It is not an order, right? It is not a, like, it does not tell the database how to do anything. It just tells us what the end result should be. The better job you do of describing the results you want, the fewer surprises you’ll have later when results show up. There’s a lot that goes into this. This is a very loaded statement and that’s what the rest of the course is for, to help you describe your results as accurately as possible. There are going to be certain things that your query is going to be at the mercy of as far as database design goes, though, right? There’s a lot that can go on here that can make or break a query. Data types being one of them, the amount of times that I have seen data incorrectly typed in a table, whether it’s a, you know, I mean, it’s usually something else that is stored in a string column, right? So it’s like a date or like an integer, not like a phone number because phone numbers make sense to store as strings because they might have parentheses or dashes or you might call 1-900-ERIC-RULES. It has letters in it. Someone might have like an extension that they put in, right? So there’s like reasons that to not store phone numbers as integers, but there are things that like that should be stored as the correct data type that I often see are not. Other things that can help your queries when they run, if your, if data is unique, either in a column or in some group of columns, some composite key of columns, you should tell the database this. If your data is closely related to data in another table, enough so that it would, it would, it would best be described with a foreign key relationship, you should do that.

If you have some domain knowledge of what is in your tables, where either the, the values that should, that are allowed to exist in there, then you should put constraints on your tables to enforce that. Likewise, if data should never be null, then a default constraint is a very good idea because that way, in most cases, you can get around nulls ever showing up in there. And of course, your, your database will function best when you have useful indexes to help your queries locate the data they care about, have data in the order that they care about, all sorts of things like that.

Now, the actual material, there’s a bunch of awesome stuff in here. And this teaser video, it’s left to the imagination. I know, such a tease. So when we think about logical query processing, there’s fundamentally no difference between the way the queries, like how the way the queries that I’m going to show you are implemented by the optimizer. And it’s really hard to do spell checks in comments, but I’m going to fix this one. So don’t worry about that. So again, your job is to shape the data. And it’s the engine’s job to get the data. Now, because of the way that logical query processing works, you might notice this nice little red squiggle here. That’s because total display names is an alias for this count big aggregate function, which means because this is in the select list, we can’t reference it in the where clause because to SQL Server, the select list hasn’t happened yet, right? Only the, only like we have the from, we have the where, we have the group by, but select and order by have not happened, have not occurred to SQL Server yet. So this query is going to give us an error. All right. So like there’s, there’s no running this one. Now this is an aggregate, so we don’t want to put a where clause on that. That’s where we use having, but let’s turn on actual execution plans.

Now you can either push this button here at the top of SQL Server management studio, this one right here, or you can hit control and M on your keyboard, assuming that you have a rather standard keyboard. And there are two other ways that we could write this query, depending on how we’re feeling.

It’s not going to make a big difference to performance in this case, how these things, how we do it. It’s, but there are going to be physically implemented by SQL Server in the same way, right? So this query where we basically successfully do what we tried to do above, and we use having to filter out rows produced by the grouped table aggregate, right? Where count big is greater than two greater than or equal to 2000, which is what we were looking for in the where clause here that was invalid, or we can add another layer to the query. So the reason why the logical query processing thing is important is because every time you add in a new from that process kind of starts over, right? SQL Server has to like say, oh, well, there’s a from here. What happens in this from?

Oh, there’s a from out here that we’re froming this from and then like you can really from yourself into a really interesting query, the more layers you add to it. But the point here is that this from is the inner scope of the query because we have these lovely parentheses giving this query a little hug.

And this query has its own little alias out here. And then we can refer to everything inside of that inner scope in this outer scope, right? So the from here gives us a different scope of from to get our data from, right? So if we run these two queries and we look at the execution plans, we’re going to get the same results back and we’re going to get the same query plan. Now performance here is no big deal, but they both do roughly the same thing. They both scan the clustered index. They both use a hash aggregate to collapse the rows, right? By display name. And then we have a filter operator.

And that filter operator is going to be the same for both of them, or rather have the same meaning for both of them. In this one, we have this expression 1001 greater than equal to 2000. That’s the count, that’s the having in the count big, because remember, we don’t know which rows are going to have that count that meets our requirement until we’ve done the aggregate. So we have to group all our data, and then we can filter it. But it’s the same thing in here, where we have a predicate on expression 1001 is greater than or equal to 2000. So this filter happens after we’ve collapsed our data down, and we’ve done our count, and we know which grouped rows are going to meet our greater than or equal to 2000 requirement. So there are two ways to write this query, but the way that the query gets physically implemented is the same, right? So again, it’s our job to describe the query, the data that we want, and it’s the engine’s job to go and get that data and come up with a plan to get that data that works according to our logic. So what I want you to understand here is that the results of queries don’t just have to be returned out to some client, right? Like they don’t have to go out to SSMS, they don’t have to go to your application, they don’t have to, you know, end up like, you know, wherever on some web portal screen.

The results of a query can be passed to another scope of that query. You can even put the results of a query into a temporary object and then use something else to talk to that temporary object. But it’s almost sort of like the pipeline concept of things where each from is sort of like its own little pipeline where data comes from here, and you can pipeline it into another from. So you can really build big queries that do exactly what you want. I’m not suggesting from a performance standpoint that big monolithic queries are always a good idea or even ever a good idea. There is a lot of wisdom in writing smaller chunks of queries that do use temporary objects, but we’ll get into that a little bit more detail later. A lot of the performance tuning I do is on big giant monolithic queries that did not feel like maybe worked out well when the database was small and needed the money. And as the database got bigger and money was no object, that query got worse and worse and slower and slower.

So what when you’re writing a query, there are many things that might seem just downright impossible with just writing a select from where maybe some joins and stuff that when you start introducing outer and inner scopes or perhaps using CTE, if not derived tables, they can make writing a query that seems impossible very, very simple and easy. Anyway, thank you for watching.

I hope you enjoyed yourselves. I hope you learned something. And again, this, the link to purchase this content is in the video description. So if you’re interested in buying this course while it’s on sale, you can get to it now. Anyway, thank you for watching. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: Course Introduction Material

Learn T-SQL With Erik: Course Introduction Material


Video Summary

In this video, I introduce my T-SQL course aimed at SQL developers, data scientists, engineers, and analysts looking to enhance their T-SQL skills or those who are just getting started. I walk through the setup process, including the need for SQL Server 2022 Developer Edition and the latest version of Management Studio, as well as how to download and use sample databases like Stack Overflow 2010 or 2013. Additionally, I provide a preview of some introductory material that covers common T-SQL conventions, basic commands, and data storage fundamentals, all designed to help learners get up to speed quickly. This introduction is just the beginning; stay tuned for more in-depth content starting with the basics of the `SELECT FROM WHERE` query structure.

Full Transcript

Erik Darling here with Darling Data, and we’re going to spend some time, you and I, where I’m going to be presenting small portions of the material from my T-SQL course here on YouTube. It is nowhere, going to be nowhere near the full breadth of material that you will find in the paid course, but it will be enough to hopefully lure you in. It will be hopefully a wiggly enough worm to lure you in to purchase it from me so that you can get the the full goodness from it. So of course, no learning experience is complete without a little bit of an introduction to talk about what course you’re taking. You know, like that. The course introduction slide. Beautiful. And of course introducing your instructor. So this is me. This is my very impressive resume. This is my MBA. This is just about a full encapsulation of all the details of my life, at least the important and pertinent ones to what we’re doing here, learning T-SQL together. This course is, of course, aimed at people who are trying to get into or who are practicing, but maybe not so meaningfully practicing their T-SQL skills. So if you’re a SQL developer, whether it is, you know, you’re developing, OLTP applications, or you are developing other types of things, you could be a data scientist, data engineer, data analyst, but it doesn’t matter because this course covers all sorts of query writing stuff. It doesn’t matter if it’s transactional or if it’s like reporting OLAP stuff.

I’m going to hit everything in here. So whatever your job title is, if you work with Microsoft SQL Server with T-SQL, this course is absolutely for you to learn from me. It’s amazing the way that works out, isn’t it? Something for everyone. The goals of the course are to teach you sound fundamentals, right? To make sure that all the little gaps and cracks and whatever you think you know are filled in. To advance your skillset meaningfully, really step you up so that when you’re writing queries or procedures or creating stuff, you do a real good job at it.

You know, you can make very informed choices as to the things, the structure of your database and your queries and everything else. And you will not be to blame when things go terrible. I’m also here to bust some very pervasive SQL myths because there are a lot of them floating around out there. Probably one of the biggest sources of SQL stupidity on the planet is LinkedIn, which is really just a playground for morons to post LLM nonsense in.

Granted, there are plenty of SQL bloggers out there who are incapable of an original thought that an LLM didn’t spit out for them. But we don’t need to pay too much attention to them, do we? Because we can recognize that and we know better than to traffic their websites. So this course, again, just to sort of recap a little bit, is going to be recording the content that will be presented at PASS.

Kendra and I are doing two days of T-SQL courses. So my goal is to have this entire thing cover the material presented there, plus all the stuff that was sort of on the cutting room floor. Right. Like, you know, two days of pre-cons, you’re looking at about six and a half hours of teach time each. This is going to be way more than that. So this is like all that stuff, plus a whole lot of other stuff that probably can’t fit in there.

The course up until it goes live will be available for two hundred and fifty dollars. And then it will go up to five hundred dollars once everything is said and done and complete. Videos will start dropping for this after May 10th.

I realize it says May 10th, but May 10th is Saturday. So it’s really after May 10th. I just ran out of room. So what I want to show you in this video is some of the starting material that like the from the overview material that you will you will get with the with the full course.

Starting off with some instructions about how to get to the sample data. Data and databases and the tools and things that you will need in order to run queries the way that I am showing you. So you will need SQL Server 2022 Developer Edition.

There is no harm in you getting 2022 Developer Edition and practicing there. Considering that SQL Server 2025 will probably be dropping sometime towards the end of this year, probably five, six months from now. It’s not a bad idea to get acquainted with the current newest version of SQL Server.

And of course, the latest version of Management Studio. I am right now using the SSMS 21 preview. That’s how I have this lovely dark mode.

But you don’t necessarily have to do that if you prefer a lighter mode of Management Studio. The demo databases that you can use for either of these are the Stack Overflow 2010 or 2013 database. They are you can download them from, of course, from these links.

The small version expands to about 10 gigs. The larger version expands to about 50 gigs. So depending on the size of your laptop or VM or wherever you’re going to be running these exercises and doing these practice things, that’s where you should do that. You should you should get a database that will fit reasonably into that hardware.

Now, to extract these, you will either want 7-zip or Windows 11 because Windows 11 can work with 7-zip zippy files or whatever. But 7-zip is very good at compacting things down. So that’s why 7-zip gets used here.

When you unzip everything, you will get four data files and one log file. It is not a backup file. So there’s no like restore database command. It is an attach database thing.

So you need to move your data and log files to whatever wherever your SQL Server is installed. And then you will use the attach database dialog to select the first MDF file and it will detect everything else that it needs to put in there. Along with the sort of get you set up instructions, there are a few other files that I thought would be useful to the folks looking at this because, you know, like me knowing T-SQL, I maybe make you or rather anyone who knows T-SQL may make a lot of assumptions about what other people know about T-SQL, but they might not actually know.

You might say some words, then they’re like, I know those are words, but I don’t know what they mean. So there is a file of what I consider to be common T-SQL conventions that I will refer to throughout the course. There will probably be a few things to get added in here once I get to some of the other stuff.

But for now, it covers logical query processing. It covers, you know, temporary tables, table variables, derived tables, just some basic syntax so that when you see query patterns, you understand. Exactly what they are.

And if I say some words that you don’t, you know, you don’t know the meaning of you can, you can refer to this file and you can find exactly what I’m talking about in here. So there is a very good sort of compendium of common terms and expressions and what they mean in here. There’s also some common T-SQL usage stuff that I thought would be useful.

So just sort of teaching like just some basic commands, mathematical operators, comparison operators, stuff like that. Common functions that you might use for things. So all sorts of good stuff in there.

Some newer functions when working with strings. So this stuff over here. And some information about how you can search. Some searching stuff, which is useful.

Some stuff when you’re talking about working with dates, right? All fun things to do in there. Aggregation functions and, of course, window functions as well. There is also this file here on data storage, which is just a rather brief overview of how data is stored in SQL Server.

So there’s kind of a lot of readie text in here for you to go over and absorb, assuming you need it. And then there’s an example database. I don’t use this database in the course of the training.

I just wanted to have a separate sort of set of commands that would show you common SQL commands like create database, use database, creating a schema and granting permissions, creating tables with different things in them, like different constraints and whatnot. And then, of course, how to like add constraints to tables and add indexes to tables. There are some filtered indexes in here.

There’s all sorts of like just basic command things to help you learn fundamentals of creating tables and indexes and whatnot in SQL Server. So there’s some fun stuff in here just to get people acquainted with SQL Server who are not, or rather T-SQL, who are not maybe completely familiar with it. So just, you know, very basic things to help sort of like serve as a reference if you run into anything you don’t understand or you want to learn how to do something that maybe you don’t know the command for off the top of your head.

So this is not meant to be a copy of like the full documentation for everything you can do in a database, just kind of the most common stuff that like you might have to do day to day and the most common terms and things that I’m going to be referring to during the course of me teaching you these things. So that’s the course intro material. Nothing too crazy in there, but I think it’s a pretty good starting place for folks who are completely unaware or rather like getting basic start starting material for T-SQL.

Anyway, that’s that. After this video, we’re going to do some teaser material from the actual like learning portion of it. And I think this is a pretty good starting place for most folks.

Even if you are pretty good with T-SQL, you can probably learn a thing or two from these as well. Anyway, thank you for watching. Hope you enjoyed yourselves.

I hope you learned something. And I will see you in the next video where we will talk about the very starting point of most of your queries, which is the old select from where combo. So we’ll get into that then.

All right. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SQL Server Performance Office Hours Episode 12

SQL Server Performance Office Hours Episode 12


What are your top recommended books on SQL server performance tuning for someone planning to explore this field for the first time?
How are you Eric? I have a stored procedure that reads data from tables with (option recompile). Initially, the execution plan indicated that some tables lacked statistics, displaying a warning sign. After updating the statistics, the warning sign disappeared. However, the procedure always times out on its first run each day, but subsequent calls are successful. Please advise.
I suspect some tables are not getting accurate stats with the default sampling rate. How would you recommend confirming this and finding a better value to set for persisted_sample_percent if I’m right?
If I remove a very large non-clustered index what is the impact on the transaction log?
You seem to default to turning on SORT_IN_TEMPDB. Why?

To ask your questions, head over here.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

PASS Precons and New T-SQL Course Announcement!

PASS Precons and New T-SQL Course Announcement!


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Is OUTPUT Broken In SQL Server?

Is OUTPUT Broken In SQL Server?


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.