Learn T-SQL With Erik: Solving Problems With APPLY

Learn T-SQL With Erik: Solving Problems With APPLY


Video Summary

In this video, I delve into the beginner section of my upcoming T-SQL course, focusing on a fundamental yet powerful feature: APPLY. Starting with an engaging introduction to APPLY as a unique blend between JOINs and subqueries, I walk through common issues encountered when trying to solve top-end-per-group problems using CTEs or derived joins. By demonstrating how APPLY elegantly handles these scenarios, I aim to demystify its syntax and application in real-world queries. Through practical examples, including the Stack Overflow database, I show how APPLY can be used to find the top three badges earned by users with a reputation over 750,000, illustrating why it’s such an essential tool for any SQL Server professional. The video also touches on alternative syntax and performance considerations, providing a comprehensive yet accessible overview of this powerful T-SQL feature.

Full Transcript

Erik Darling here with Darling Data, and we’re going to continue on with some of the preview content from my upcoming T-SQL course. I would like to thank everyone who has pre-purchased it so far. Your support, of course, means quite a bit to me. And my chief goal right now is to not disappoint you. Again, this content is going to be part of a larger paid course. If you would like to pre-purchase it, there is a link down in the video description for you to do that. The price is currently $250. That will be going up to $500 once the course is fully published. And this is companion content to the T-SQL pre-conferences that Kendra Little and I are doing at PASS Data Community Summit in Seattle this November. And so if you are attending those, don’t buy it because you will get access to the content. as part of your admission to those pre-conferences. So hold your horses for that. I mean, I’m going to tell you not to. If you decide to just because you’re like, Erik Darling, we love you. You’re worth $250. Cool, but you don’t have to. Save your money. Go drinking. Today’s video is going to be about apply. Now, this is the beginner section of the content. So there’s not going to be the advanced level of stuff that you would maybe normally find in content about apply that that is coming later. What we’re going to do today is solve a little problem with apply and sort of dissect how apply works. Now, we’ll also look at an alternative to the apply syntax because sometimes looking at alternative syntax helps people understand what we’re doing with apply a little bit better.

But apply is one of my favorite pieces of t-SQL syntax. Sort of generally. I was going to see if zoom it’s awake. It is wonderful. Apply is sort of like if a drive join in a subquery had a baby. It’s the beautiful bouncing baby apply. In other dialects of SQL, postgres, Oracle, some other fancy ones. The cross-apply syntax is fairly well equivalent to the lateral join syntax over there. Apparently, it was in the SQL standard, but it wasn’t fully codified in the SQL standard as the lateral join syntax until after Microsoft had added apply to SQL Server, So apply is not the like anti-SQL standard form of way of doing it, but it is the sort of equivalent idea. Now, apply does have two flavors. There is cross-apply, which is an inner join equivalent, and then there’s outer apply, which is an outer join equivalent.

And the problem that we’re going to solve today is probably the easiest one to help most folks grasp what apply is good at. And that is the sort of top end per group problem. What we’re going to look at is in the context of the stack overflow database is a few of the higher reputation users in the three most recent badges that they’ve earned for being really good at overflowing stacks. But in your work, you might look at this in other ways, like maybe the top three selling products per category, like bed and bath, home and garden, kitchen, sneakers, whatever product categories you have, you want to find the top three selling products for that. Things like that.

So we’re going to look at sort of some query patterns that don’t work for that and then how apply works for that. So if you were to try this with a CTE, and I’m not saying that it’s not doable with a CTE, just you can’t do it like this. And looking at this sort of help set up the apply syntax a little bit.

So if we were to run this query, which in here we’re looking, we’re getting the top three badges, you notice that this line right here is quoted out. And that’s because in the context of this CTE, the users table, the alias for the users table, the reference for the users table down here is not visible to it yet. We can’t access the users table from the CTE up here.

We can reference it after we join to the CTE down here, but it doesn’t get us the result that we want. So like if we just run this and like we see what the query produces, we just produce three random badges, right, for three random users. These are just the three most recent badges earned period, right?

So this doesn’t get us what we want. We want the top three badges for the users that qualify for our query down here. So this query produces no results, right? We don’t get anything back because no one with a reputation over 750,000 was in the top three badges that got earned period for the badges table, right?

This is just like the whole thing. These people weren’t in there. The other part of it that doesn’t work is if we tried to do a similar thing with a derived join. You notice I have this where clause similarly quoted out in here because like the users table, the reference to the users table, isn’t available inside of the derived join.

We can’t see it until after the join is complete and we do the on clause. You end up with the exact same problem in here, right? Selecting the top three most recent badges, none of those users have a reputation over 750,000, right?

So like this returns no results as well. This is why you can’t just go stick things like top into a CTE or a derived join. Like expect it to match the results of the other stuff, right?

Like it works for some things, but not for this particular task. So if we use apply, what we’re able to do is a see like that reference to the users table inside of the apply. So this does end up getting a little bit more spread out than it would normally be, but it’s only because I tried to put some helpful commentary in there.

Where inside we have the we select from the users table and then we take then we write our cross apply, right? So the cross apply is right after that. Like I said, the cross apply is like an inner join.

So it comes after that from and in here we can select the top three badges and we can make sure because we’re using top now would be the same story with offset fetch to right? Like you could just as easily use offset fetch instead of top here, just like offset zero rows fetch next three rows. But in here we’re using top because it’s faster to write.

No other reason. That most people in T-SQL will write top just sort of like muscle memory, right? Top, whatever.

So but with top we need order by and because this order by on the date column, the date column is not unique, right? We might have duplicates in here. We do need to have our unique tiebreaker here.

This ID column. The ID is the clustered primary key of the badges table. So this is our unique tiebreaker in case there are any duplicates in the date column, right? That’s a critical thing that you need when you are ordering by a date, when you’re ordering by a column that might have duplicates in it because it’s considered non-deterministic, right?

You get different results without that unique tiebreaker. But inside of the apply, we are able to see that outer reference to the users table here, right? We can see this inside of the apply, which means that we, when we run this query, we’ll actually get back the results that we want, which are the users with a rep So we have a total of over 750,000, right?

Let’s validate that looking in here. Everyone here, the lowest one is 754,947. And we get, as long as there are three badges, like they’re like, as long as they’ve earned three badges, you’ll see three badges for each person.

Now, this is the sort of interesting thing about apply is that, you know, you can say you want like the top however many, but if that however many don’t exist, you only get back like, the, you only get back however many results actually do exist. So like, let’s say we said top 10, but one person that only earned nine badges, we would only get nine rows for them.

We can’t invent a row to send back for them. And we’re not going to send back this a random duplicate row because that, that would be insane. Right?

So like you get the top however many are available, like for your, like to, until your top is either met or we run out of rows to find. So this give, this does give us the correct results, which is wonderful. So now let’s dissect a little bit more how apply works so that everyone can understand the majesty of this query.

Right? So, um, the other queries didn’t work because again, it’s sort of like you were saying, uh, select the top three badges of whatever, but like just changing this query to hit the users table and do a simple in to figure out who. Uh, like which users earned just the top three most recent badges ever in here.

You’ll see that we get back three kind of random people and none of them have a reputation even close to 750,000. Right? Maybe they do now.

Maybe they’re a fantastic, like just amazing stack overflow database users at this point. I don’t know. I don’t follow their careers. I’ll be honest with you. I don’t, don’t care that much, but like, like those three people were not in, did not meet that 7,000.

So we’re going to have a 750,000 reputation goal. So the way that apply works is sort of like if we take that, like the initial starting query, right? Like let’s say that we know we want to find like what, like our basic starting point is we want to select.

We want to select these columns from the users table. And all that we care about are users who have a reputation of over 750,000. And when we supply our results back to, you know, whatever application web front end, whatever, whatever you want to call it.

So we want those results ordered by reputation descending. And then for any, if there are any duplicates, the ID column descending, right? We need that unique tiebreaker to make sure the results come back in a sort of reliable, reliable order every single time.

These are, these are the people we get back. So these six rows. All right.

We get, of course, John Skeet. We get Balassie. I’m not sure who that is. We get Darren Dimitrov. Sounds very smart. We get Bon C. I’ve seen that name around. We get Hans Passant. Some sort of German French mass mashup there, perhaps, perhaps a Belgian.

I don’t know. You can’t tell. And then we have Marc Gravel right here rounding out the list with 754,947. So the, this is our starting query, and these are the results from our starting query.

The way apply works is sort of like for every row that your starting query produces, apply the result of the cross apply query to that row. So that’s why we get three rows back for John Skeet and three rows back for Balassie and so on down the list, because we take, we apply that top three query to the results of this query. And just to sort of go a little bit further with that, you know, again, like this is the, this is the query inside of the cross apply just without the correlation in here.

So what we need to say is for each user that qualifies for our where clause, apply this query result to that. Now you could absolutely do this with a derived join or a CTE. You just have to write the query a little bit differently.

Another common approach to this is to use the row number window function and to get a result, like generate a row number over that result and then filter on the row number after that result has been, after that result has been produced. So in here we’re saying where the, remember the alias for the row number inside of the derived join is n, right? That’s this thing right here.

So we’re saying where n is less than or equal to three. Now, just to kind of make this a little bit easier to understand, I’m just going to edit this query really quickly. And let’s grab, let’s look at this again.

And so these are our top six users. We’re just going to grab this number 22656. And then in here, we’re just going to change this for a minute where b.userid equals 22656. And we’re just going to run this result on its own because there’s no correlation inside of a derived join.

Again, you can’t access the stuff outside of a derived join within the derived join. So there’s no correlation to the user’s table inside of this. So we can just run this without an error and we can just see what the results are for one user, right?

So if we run this, we’re going to get back this result. And this is, these are going to be the top three most recent badges for this particular user ID, right? We had a good answer, a nice answer, and a great answer.

What a fantastic answer. And we have that ordered by date descending in here. And of course, we have this ordered by ID. If there were, I don’t think, no, I don’t see any duplicate dates in there, but, if there were any duplicate dates in here, then the ID would act as the tiebreaker to differentiate between those duplicate rows.

Like I said, you could, like with a duplicate, when you order by a column with duplicates in it, the result is considered non-deterministic. So what we, we get the same result back for this one. We just take on a slightly different query format with it.

So let’s get rid of this where clause, because we don’t need this anymore. That was purely a display model query. And let’s run this. And you’ll see that we get back the same thing.

Well, now we have this, this column in here, right? And one, two, three, one, two, three, over and over again. So I guess this could be useful if you wanted to quickly eyeball anyone who may not have had, that may not have had a top three badges in there.

Maybe someone who only earned two badges. They somehow accumulated that much reputation without ever doing anything really spectacular, I guess. But we, we get all the same sort of information.

We get the same sort of set of information back from that. So that’s a bit about how apply works. Hopefully it helps you understand the syntax for it and at least one application for apply that you can, where you can use it to solve interesting query riddles without breaking your back with and typing a whole lot more than you might if you start using windowing functions. There are some performance considerations around this stuff, but you know, there’s only so much time in the, in the preview material for me to spend on things.

So you’re just, you’re just going to have to wait for the full video on that, but you only get the full video if you buy the training. But anyway, thank you for watching. I hope you enjoyed yourselves.

I hope you learned something and I will see you in the next video where we will do some more of this fun T-SQL learning stuff. Anyway, thank you for watching. I’m out of here. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SQL Server Performance Office Hours Episode 13

SQL Server Performance Office Hours Episode 13


What is the effect of having a few queries using Read Committed isolation level where the database is set to RCSI? Will those dumb Read Committed queries block any others?
Do you have any suggestions for optimizing data retrieval using Full Text Search (FullTextMatch TVF)? We’ve tried cutting down the query to reduce the dataset using other parameters before the FTS bit and also breaking out FTS across multiple columns to do one at a time, but anything else? Also, the memory used by FullTextMatch is hard to define. Should we reduce the size of the BufferPool to give more to FTS or the other way around? Thank you for coming to my Ted Talk on FTS.
If I were to want to kill all the table partitions by fire, what brand of flamethrower do you recommend?
Do you know of any documentation that makes it very obvious what Max Server Memory controls? It’s changed across versions. I think it used to just be the buffer pool?
I am writing C# classes and building CLRs for string comparisons, then creating a TVF for execution. The TVF is creating row estimates of exactly 1000x my row actuals (2 rows creates a “2 of 2000” for example). No query hints seem to resolve this, how would you work to control these estimates from my compiled C#?

To ask your questions, head over here.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: INNER vs OUTER JOIN Filtering Logic

Learn T-SQL With Erik: INNER vs OUTER JOIN Filtering Logic


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: WHERE vs Having

Learn T-SQL With Erik: WHERE vs Having


Video Summary

In this video, I delve into a practical example to explain the differences between using `WHERE` and `HAVING` clauses in SQL Server queries. Erik Darling from Darling Data shares insights on how these clauses can be used effectively or misused when filtering data. By walking through an interesting query involving user display names and reputation scores from the Stack Overflow database, I demonstrate why applying a `WHERE` clause might not always yield the desired results and introduce conditional aggregates as a solution to preserve the integrity of grouped data while still filtering out unwanted rows. This video is a sneak peek into my upcoming T-SQL course, which will be available at a presale price of $250 until it launches in full, priced at $500. If you’re interested in learning more advanced SQL techniques and strategies from experienced instructors like myself and Kendra, make sure to check out the link in the video description for early access.

Full Transcript

Erik Darling here with Darling Data and today’s video is going to be another little sort of teaser-oo for my upcoming T-SQL course. If you are interested in getting in on the presale price for this course, link down in the video description to do that. It is currently $250. When the course is all completed, it will be going up to $500. And of course, this course, of course, of course, of course, is a horse. This course material is going to be $500. This course material is going to be in large part a representation of the material that Kendra and I are presenting at our pre-cons at PASS Data Community Summit in November. So, like the stuff that you learn there will be the stuff that is in this course. And if you attend our pre-cons in Seattle, I will give you access to this material as part of that ticket. So, you will get it for free if you show up there. If you’re not going to be there, well, the link to purchase it is below. There are no further discounts on this. So, don’t ask. It’s a lot of work. But in today’s video, we’re going to talk about where versus having a little bit. And because they’re both ways of filtering data, right? So, like you have the where clause, which filters out rows from query results, right? It applies a filter and it says you rose, you made it or you didn’t make it.

And then you have the having clause. And the having clause introduces filtering on a grouped table. And by grouped table, I mean you have a query with a group by expression in it. And you might want to figure out if the result of some aggregate on that group table meets some criteria. So, an easy way to show you that is to use this first query here. We’ll just wait for ZoomIt to wake up. ZoomIt apparently is just taking a very long nap and doesn’t want to do anything for me here.

So, this is an example of a grouped query. We have a group by on the display name column, right? We are also selecting the display name column. But we’re asking SQL Server to filter out the results by only display names. Now, this is a sort of a funny quirk about the Stack Overflow database and it’s kind of an amusing query because display names are non-unique.

Users can, multiple users can have the same display name. There are even two John Skeets in the Stack Overflow database for some strange reason. But it’s just sort of, so it’s sort of a funny query where we’re just taking all the display names and we are looking for that average reputation, average downvotes, average upvotes, and then the count of total users that have that display name. And we’re sort of putting them on teams together, right? So, like team display name, whatever your display name is.

So, if your display name has more than 2,000 participants, more than 2,000 team members, that’s where we can start caring about your numbers for these. So, this is our original query and ZoomIt has once again gone to sleep. So, if we run this, we’ll get some results back.

And let’s pretend that after running this query, we have sort of a practical problem. We have these nine rows, right? The nine rows is sort of important to the problem we’re going to solve. But the thing that our teams are mad about is the fact that their reputations, their average reputation, is being brought low by teammates with low reputation.

People with a reputation of, let’s say, one. Let’s say these low reputation users are really screwing things up. So, what we might be tempted to do is apply a where clause additionally to this query, right?

We’re still going to, we still need to figure out who has a total team size of greater than or equal to 2,000. But now we’re just going to say, well, we’d only, we’re going to screen out all the people with a reputation of one. And hopefully, this will make our, this will bring up the averages for these teams, right?

Because we want to make sure that these teams still have good averages. The problem we run into is that this significantly cuts down on the results. We only have five rows now.

We actually, we actually destroyed four teams by getting rid of, by filtering out reputation equals one users. These are the original results. So, we’re missing Matt, we’re missing Mark.

I don’t know. This is like weird, like apostle names or something. But like, you know, the averages did go up for these, right? Like these score, these average reputations are higher, but we no longer have this full list of teams.

And that’s not a very good sign, is it? We have done something wrong with our query. And we need to figure out a way to run this query and get the same nine rows back.

But without these reputations suffering, right? Like these reputations, sorry, these reputations suffering. These reputations have already suffered, right?

And granted, like the average downvotes and upvotes are probably a bit different too. But the total users is different because we have screened out all these reputation equals one people. So, the where cloud is the wrong way to do this.

Now, there’s one way that you can write this query to give you the, to preserve the nine rows, but still not consider the low reputation equals one people, is to write a conditional aggregate. Now, if we, I’m going to zoom in on this.

And I have, I have, I have formatted this in a way to make it hopefully very clear what we’re doing here. Like this average downvotes is very easy to figure out. It’s just an average on downvotes.

So, I’ve, I’ve spread this out a bit to make it very clear what we’re doing. And what we have in here is a case expression. Case expressions are wonderful things, right?

You can do all sorts of fun stuff with them. And one thing that you can do with them is stick them in an aggregate like this. So, what we’re going to say is if your reputation equals one, then we are going to, we’re going to substitute that value with a null, right?

So, average isn’t a, null isn’t a value. So, you can’t average nulls, right? It’s not a, not an averageable thing. And if your rep, so that’s that.

And then our bailout on the case expression is if your reputation is not equal to one, or rather higher than one, right? Because there’s no, no reputation lower than one in the Stack Overflow database. One is as low as you can go.

There’s not a check constraint on that in the 2013 copy. But in real life, your reputation can’t go below one, right? Doesn’t matter, doesn’t matter how bad you screw up, your reputation doesn’t go below one.

I guess otherwise it would be like, almost like the opposite of the high reputation contest, sort of the Stack Overflow thing, sort of gamifies for people answering questions and stuff. You’re able to say, I want to see how low I can get this.

I don’t know how many bad answers. Like, actually like rewarding bad behavior if they did that. So, if we run this query now, we will get, so like, so just to like make sure that we’re on the same page here. This query had the where clause where reputation is greater than one in it.

I’ve taken that out of this one, right? We have no where clause here. All we have in here is the having clause. So, say, we’re grouped by as, sorry, we’re count big as greater than or equal to 200, greater than or equal to 2,000 after we group by display name, right?

And if we run this, we will get our same nine rows back, but our reputation, right? Our reputation will be okay, right? So, all of the numbers, all of the other numbers are going to match up.

There’s not a great way to sync this up, but you can see that the numbers up here and the numbers in the upvotes and downvotes columns, these all match up nicely, right? These are all the same, but the numbers in our reputation column, our average reputation column, these are all, these have gone up.

Now, the order is a little bit different here. The reason why the order is a little bit different here is because once we’ve screened out the reputation equals one people, some of the averages have sort of changed places a little bit, right?

Like, there’s a little bit of difference in the order of these, where if we compare one to the other, Matt is still on top, Alex is still second, David is still third, Mark is still fourth, but Chris and Michael have switched places, and I think actually that’s about it there.

Yeah. So, this, like, that did change the scores a little bit, because the average reputations switched around because of, like, some display names had way, had more reputation equals one people that got, where the reputation got replaced with null, so that the scores, like, the order changed a little bit, so someone might be mad in there, someone might want to, I don’t know, start kicking some reputation one people off their team, maybe.

But this is one way to get around the problems you might run into when you are trying to sort of manipulate data that is being grouped, and stuff like the where clause and the having clause is not an appropriate way to filter things out. Sometimes you do need to filter out rows with the where clause, and sometimes you do need to figure out, filter out rows from the group table with the having clause.

Because other times, you can apply a little bit of filtering inside of your aggregates in order to change what actually gets aggregated in there. So, that’s about it for this one.

In the actual course material, there is, of course, a lot more other stuff that goes along with this lesson, but this is just sort of one of the more practical examples of what the things that I’m going to be teaching you. If you like this, again, you can buy it down there.

The videos will, the actual full-length course videos will start dropping soon. They’re out in tech review right now, kind of getting spruced up and fixed up. And, you know, I am like rehearsing them and stuff to make sure that my delivery is sound and solid for when I give these to you.

But they will be dropping very shortly. And I hope you actually buy them and watch them and like them and all that stuff. So, thank you for watching.

I hope you enjoyed yourselves. I hope you learned something. And I will see you in the next video where we will talk about some more of the course material. So, anyway, that’s about it for me. Goodbye.

Goodbye. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: SELECT FROM WHERE?

Learn T-SQL With Erik: SELECT FROM WHERE?


Video Summary

In this video, I delve into the fundamentals of writing SQL queries by focusing on how to select from a table using a WHERE clause—a critical skill for any data professional. As part of my Learn T-SQL with Erik course, which is currently in its early stages, I provide an essential introduction that ensures viewers understand the basics before diving deeper. The video covers key concepts such as the difference between writing and executing queries, explaining why certain elements like aliases can’t be referenced in specific parts of a query due to how SQL Server processes them. This teaser gives a taste of what’s to come, setting the stage for more detailed exploration into query optimization, logical processing, and effective database design strategies that will empower you to write better T-SQL code. The link to purchase this video during its pre-sale phase is in the description; by buying now, you can be among the first to access all the comprehensive content as it unfolds.

Full Transcript

Erik Darling here with Darling Data, and today’s video is going to be a small amount of teaser material from my Learn T-SQL with Erik course from the first module where I walk through the basics of selecting from a table with a where clause. Fun stuff, but we all have to begin somewhere, and this is about as close to the beginning as we can. as you can get. And this course does need to make sure that everyone, by the time we get to the end, knows absolutely everything about writing good queries. So that’s where we’re going to start. The link to purchase this video during the pre-sale era is down in the video description, so if you would like to purchase this video, you would like to see the full course content when and as it becomes available. You can purchase now and you can be on the ground floor for when all of the good stuff, the full course material starts getting presented. Again, this is just a small chunk of it. So the first thing to understand, and it’s a fundamental concept for understanding why some queries work and some queries don’t, and why you have to write some queries differently.

from other queries is that the way that you sit down and write a query is a bit different from the way that SQL Server goes and interprets that query when it gets executed. You, of course, start with your select, and there’s a whole bunch of stuff that you can do with the select clause. Then, you know, you can, you know, you can, well, I guess, semi-optionally have a from clause, and there’s a whole bunch of stuff that you can do within your from clause. And then you can have your where clause and your group by, and you’re, and if you have group by, you might even add in a having clause.

And then you have your order by. Now the presentation order by down here is very important because without this presentation order by, we do not have any guaranteed result ordering. This is a problem that I find in many client queries. And this is a question that I see people asking to this day where it’s something like, if every time I run this query, it returns data in the same order. So, do I always need an order by? The answer is, well, yes. If you want that data to always, if you want that query to always return that data in that order, without an order, without presentation order by, might not always do that.

I do talk about why more in depth in some other modules, but for now, just know that that is the case. And one should not rely purely on a small handful of maybe observed behaviors, because those observed behaviors may change as your database changes. And of course, the way that SQL Server starts putting your queries together for execution, it is slightly different.

SQL Server starts with the from, moves on to the where, moves on to the group by, moves on to the having, then down towards the end, it finally gets to select and then order by. And then, well, I did, I did leave this out the first time, but because I hope to God, you never actually have to work with XML or JSON. But if for some reason you do just know that, that happens way at the end. SQL Server is like, I’ll deal with you last.

You mess. All right. But yeah, so SQL Server, the way that it interprets a query is different from the way that your query is written. That’s why there are some elements of your query that are not going to be eligible or accessible in other parts of the query. By that, pretty specifically, I mean that, you know, like if you have some sort of expression in your select list, you can have it in your order by, but you like you can access that alias in your order by, but you can’t access that alias in your where clause or your group by or your having.

Like, I think Oracle recently introduced something where you can now reference aliases in the group by, which I think is very cool because it saves you a lot of like, like perhaps like repasting some complex expressions in your select list in your group by clause. But the main point of writing queries is we tell our database the stuff that we want. And it’s the database’s job to go and give us that stuff, go get it for us.

So the select list tells the database which columns we want. The from clause tells the database which tables those columns are in. The where clause tells the database which rows to include or exclude in the query.

Group by will tell us which rows to produce aggregates with. And the order by tells the engine which way to sort results. In other words, your job is to shape the result set and it’s the database’s job to retrieve it.

So if I were looking to expand the darling data empire and I were saying, hey, database of all these cool people who, you know, show up to events or take my training. All of these wonderful, wonderful, brilliant, beautiful, bright attendees. If I were to say, hey, I need to make some money.

I need some need to put some change in my pocket. I would want to tell the database to give me company contact information for 100 people. Right. You want to send out that big email.

Right. You don’t want to send out a bunch of individual emails. That’s a big, not very, not very wise. I find who primarily use SQL Server who have the highest consulting budgets. And that query would look a little bit like this.

Or I select the top 100. Right. Just limit it to those 100 results. Now SQL Server uses top. Other database engines might use limit.

Some of them might only use offset fetch. But SQL Server has this wonderful proprietary top command. Not sure what Microsoft was coding for there, but we have it anyway. And then I would want to get some company names and some contact names and some email addresses.

And I’d probably actually want to see a number on the consulting budget before I decided what to do here. Because without seeing that, you might sort by the top 100 descending. And you might find someone who has like a 50 cent consulting budget.

And well, that’s just not enough change for the pockets these days, is it? Not in this economy. Not in this economy.

Not in this economy. But then I would limit the results to people who primarily use SQL Server. And my presentation order by. Right, you don’t put the word presentation in there. Just the order by is sufficient.

Then I would order by consulting budget descending. And that would be the data that I want, shape the way I want it, and the order that I want it in. Queries are really just descriptions of the things that you want to see.

It’s sort of like how indexes contain data, right? You create an index and the data that you, the way that you define that index defines what data lives in that index, whether it’s key columns or included columns. And then you get statistics, like you get a histogram on that data that describes what data is, like exists in that index. So databases contain data and we describe the data that we want to see with queries, right? So all the stuff that we do to write that query is just a description, right? It is not an order, right? It is not a, like, it does not tell the database how to do anything. It just tells us what the end result should be. The better job you do of describing the results you want, the fewer surprises you’ll have later when results show up. There’s a lot that goes into this. This is a very loaded statement and that’s what the rest of the course is for, to help you describe your results as accurately as possible. There are going to be certain things that your query is going to be at the mercy of as far as database design goes, though, right? There’s a lot that can go on here that can make or break a query. Data types being one of them, the amount of times that I have seen data incorrectly typed in a table, whether it’s a, you know, I mean, it’s usually something else that is stored in a string column, right? So it’s like a date or like an integer, not like a phone number because phone numbers make sense to store as strings because they might have parentheses or dashes or you might call 1-900-ERIC-RULES. It has letters in it. Someone might have like an extension that they put in, right? So there’s like reasons that to not store phone numbers as integers, but there are things that like that should be stored as the correct data type that I often see are not. Other things that can help your queries when they run, if your, if data is unique, either in a column or in some group of columns, some composite key of columns, you should tell the database this. If your data is closely related to data in another table, enough so that it would, it would, it would best be described with a foreign key relationship, you should do that.

If you have some domain knowledge of what is in your tables, where either the, the values that should, that are allowed to exist in there, then you should put constraints on your tables to enforce that. Likewise, if data should never be null, then a default constraint is a very good idea because that way, in most cases, you can get around nulls ever showing up in there. And of course, your, your database will function best when you have useful indexes to help your queries locate the data they care about, have data in the order that they care about, all sorts of things like that.

Now, the actual material, there’s a bunch of awesome stuff in here. And this teaser video, it’s left to the imagination. I know, such a tease. So when we think about logical query processing, there’s fundamentally no difference between the way the queries, like how the way the queries that I’m going to show you are implemented by the optimizer. And it’s really hard to do spell checks in comments, but I’m going to fix this one. So don’t worry about that. So again, your job is to shape the data. And it’s the engine’s job to get the data. Now, because of the way that logical query processing works, you might notice this nice little red squiggle here. That’s because total display names is an alias for this count big aggregate function, which means because this is in the select list, we can’t reference it in the where clause because to SQL Server, the select list hasn’t happened yet, right? Only the, only like we have the from, we have the where, we have the group by, but select and order by have not happened, have not occurred to SQL Server yet. So this query is going to give us an error. All right. So like there’s, there’s no running this one. Now this is an aggregate, so we don’t want to put a where clause on that. That’s where we use having, but let’s turn on actual execution plans.

Now you can either push this button here at the top of SQL Server management studio, this one right here, or you can hit control and M on your keyboard, assuming that you have a rather standard keyboard. And there are two other ways that we could write this query, depending on how we’re feeling.

It’s not going to make a big difference to performance in this case, how these things, how we do it. It’s, but there are going to be physically implemented by SQL Server in the same way, right? So this query where we basically successfully do what we tried to do above, and we use having to filter out rows produced by the grouped table aggregate, right? Where count big is greater than two greater than or equal to 2000, which is what we were looking for in the where clause here that was invalid, or we can add another layer to the query. So the reason why the logical query processing thing is important is because every time you add in a new from that process kind of starts over, right? SQL Server has to like say, oh, well, there’s a from here. What happens in this from?

Oh, there’s a from out here that we’re froming this from and then like you can really from yourself into a really interesting query, the more layers you add to it. But the point here is that this from is the inner scope of the query because we have these lovely parentheses giving this query a little hug.

And this query has its own little alias out here. And then we can refer to everything inside of that inner scope in this outer scope, right? So the from here gives us a different scope of from to get our data from, right? So if we run these two queries and we look at the execution plans, we’re going to get the same results back and we’re going to get the same query plan. Now performance here is no big deal, but they both do roughly the same thing. They both scan the clustered index. They both use a hash aggregate to collapse the rows, right? By display name. And then we have a filter operator.

And that filter operator is going to be the same for both of them, or rather have the same meaning for both of them. In this one, we have this expression 1001 greater than equal to 2000. That’s the count, that’s the having in the count big, because remember, we don’t know which rows are going to have that count that meets our requirement until we’ve done the aggregate. So we have to group all our data, and then we can filter it. But it’s the same thing in here, where we have a predicate on expression 1001 is greater than or equal to 2000. So this filter happens after we’ve collapsed our data down, and we’ve done our count, and we know which grouped rows are going to meet our greater than or equal to 2000 requirement. So there are two ways to write this query, but the way that the query gets physically implemented is the same, right? So again, it’s our job to describe the query, the data that we want, and it’s the engine’s job to go and get that data and come up with a plan to get that data that works according to our logic. So what I want you to understand here is that the results of queries don’t just have to be returned out to some client, right? Like they don’t have to go out to SSMS, they don’t have to go to your application, they don’t have to, you know, end up like, you know, wherever on some web portal screen.

The results of a query can be passed to another scope of that query. You can even put the results of a query into a temporary object and then use something else to talk to that temporary object. But it’s almost sort of like the pipeline concept of things where each from is sort of like its own little pipeline where data comes from here, and you can pipeline it into another from. So you can really build big queries that do exactly what you want. I’m not suggesting from a performance standpoint that big monolithic queries are always a good idea or even ever a good idea. There is a lot of wisdom in writing smaller chunks of queries that do use temporary objects, but we’ll get into that a little bit more detail later. A lot of the performance tuning I do is on big giant monolithic queries that did not feel like maybe worked out well when the database was small and needed the money. And as the database got bigger and money was no object, that query got worse and worse and slower and slower.

So what when you’re writing a query, there are many things that might seem just downright impossible with just writing a select from where maybe some joins and stuff that when you start introducing outer and inner scopes or perhaps using CTE, if not derived tables, they can make writing a query that seems impossible very, very simple and easy. Anyway, thank you for watching.

I hope you enjoyed yourselves. I hope you learned something. And again, this, the link to purchase this content is in the video description. So if you’re interested in buying this course while it’s on sale, you can get to it now. Anyway, thank you for watching. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: Course Introduction Material

Learn T-SQL With Erik: Course Introduction Material


Video Summary

In this video, I introduce my T-SQL course aimed at SQL developers, data scientists, engineers, and analysts looking to enhance their T-SQL skills or those who are just getting started. I walk through the setup process, including the need for SQL Server 2022 Developer Edition and the latest version of Management Studio, as well as how to download and use sample databases like Stack Overflow 2010 or 2013. Additionally, I provide a preview of some introductory material that covers common T-SQL conventions, basic commands, and data storage fundamentals, all designed to help learners get up to speed quickly. This introduction is just the beginning; stay tuned for more in-depth content starting with the basics of the `SELECT FROM WHERE` query structure.

Full Transcript

Erik Darling here with Darling Data, and we’re going to spend some time, you and I, where I’m going to be presenting small portions of the material from my T-SQL course here on YouTube. It is nowhere, going to be nowhere near the full breadth of material that you will find in the paid course, but it will be enough to hopefully lure you in. It will be hopefully a wiggly enough worm to lure you in to purchase it from me so that you can get the the full goodness from it. So of course, no learning experience is complete without a little bit of an introduction to talk about what course you’re taking. You know, like that. The course introduction slide. Beautiful. And of course introducing your instructor. So this is me. This is my very impressive resume. This is my MBA. This is just about a full encapsulation of all the details of my life, at least the important and pertinent ones to what we’re doing here, learning T-SQL together. This course is, of course, aimed at people who are trying to get into or who are practicing, but maybe not so meaningfully practicing their T-SQL skills. So if you’re a SQL developer, whether it is, you know, you’re developing, OLTP applications, or you are developing other types of things, you could be a data scientist, data engineer, data analyst, but it doesn’t matter because this course covers all sorts of query writing stuff. It doesn’t matter if it’s transactional or if it’s like reporting OLAP stuff.

I’m going to hit everything in here. So whatever your job title is, if you work with Microsoft SQL Server with T-SQL, this course is absolutely for you to learn from me. It’s amazing the way that works out, isn’t it? Something for everyone. The goals of the course are to teach you sound fundamentals, right? To make sure that all the little gaps and cracks and whatever you think you know are filled in. To advance your skillset meaningfully, really step you up so that when you’re writing queries or procedures or creating stuff, you do a real good job at it.

You know, you can make very informed choices as to the things, the structure of your database and your queries and everything else. And you will not be to blame when things go terrible. I’m also here to bust some very pervasive SQL myths because there are a lot of them floating around out there. Probably one of the biggest sources of SQL stupidity on the planet is LinkedIn, which is really just a playground for morons to post LLM nonsense in.

Granted, there are plenty of SQL bloggers out there who are incapable of an original thought that an LLM didn’t spit out for them. But we don’t need to pay too much attention to them, do we? Because we can recognize that and we know better than to traffic their websites. So this course, again, just to sort of recap a little bit, is going to be recording the content that will be presented at PASS.

Kendra and I are doing two days of T-SQL courses. So my goal is to have this entire thing cover the material presented there, plus all the stuff that was sort of on the cutting room floor. Right. Like, you know, two days of pre-cons, you’re looking at about six and a half hours of teach time each. This is going to be way more than that. So this is like all that stuff, plus a whole lot of other stuff that probably can’t fit in there.

The course up until it goes live will be available for two hundred and fifty dollars. And then it will go up to five hundred dollars once everything is said and done and complete. Videos will start dropping for this after May 10th.

I realize it says May 10th, but May 10th is Saturday. So it’s really after May 10th. I just ran out of room. So what I want to show you in this video is some of the starting material that like the from the overview material that you will you will get with the with the full course.

Starting off with some instructions about how to get to the sample data. Data and databases and the tools and things that you will need in order to run queries the way that I am showing you. So you will need SQL Server 2022 Developer Edition.

There is no harm in you getting 2022 Developer Edition and practicing there. Considering that SQL Server 2025 will probably be dropping sometime towards the end of this year, probably five, six months from now. It’s not a bad idea to get acquainted with the current newest version of SQL Server.

And of course, the latest version of Management Studio. I am right now using the SSMS 21 preview. That’s how I have this lovely dark mode.

But you don’t necessarily have to do that if you prefer a lighter mode of Management Studio. The demo databases that you can use for either of these are the Stack Overflow 2010 or 2013 database. They are you can download them from, of course, from these links.

The small version expands to about 10 gigs. The larger version expands to about 50 gigs. So depending on the size of your laptop or VM or wherever you’re going to be running these exercises and doing these practice things, that’s where you should do that. You should you should get a database that will fit reasonably into that hardware.

Now, to extract these, you will either want 7-zip or Windows 11 because Windows 11 can work with 7-zip zippy files or whatever. But 7-zip is very good at compacting things down. So that’s why 7-zip gets used here.

When you unzip everything, you will get four data files and one log file. It is not a backup file. So there’s no like restore database command. It is an attach database thing.

So you need to move your data and log files to whatever wherever your SQL Server is installed. And then you will use the attach database dialog to select the first MDF file and it will detect everything else that it needs to put in there. Along with the sort of get you set up instructions, there are a few other files that I thought would be useful to the folks looking at this because, you know, like me knowing T-SQL, I maybe make you or rather anyone who knows T-SQL may make a lot of assumptions about what other people know about T-SQL, but they might not actually know.

You might say some words, then they’re like, I know those are words, but I don’t know what they mean. So there is a file of what I consider to be common T-SQL conventions that I will refer to throughout the course. There will probably be a few things to get added in here once I get to some of the other stuff.

But for now, it covers logical query processing. It covers, you know, temporary tables, table variables, derived tables, just some basic syntax so that when you see query patterns, you understand. Exactly what they are.

And if I say some words that you don’t, you know, you don’t know the meaning of you can, you can refer to this file and you can find exactly what I’m talking about in here. So there is a very good sort of compendium of common terms and expressions and what they mean in here. There’s also some common T-SQL usage stuff that I thought would be useful.

So just sort of teaching like just some basic commands, mathematical operators, comparison operators, stuff like that. Common functions that you might use for things. So all sorts of good stuff in there.

Some newer functions when working with strings. So this stuff over here. And some information about how you can search. Some searching stuff, which is useful.

Some stuff when you’re talking about working with dates, right? All fun things to do in there. Aggregation functions and, of course, window functions as well. There is also this file here on data storage, which is just a rather brief overview of how data is stored in SQL Server.

So there’s kind of a lot of readie text in here for you to go over and absorb, assuming you need it. And then there’s an example database. I don’t use this database in the course of the training.

I just wanted to have a separate sort of set of commands that would show you common SQL commands like create database, use database, creating a schema and granting permissions, creating tables with different things in them, like different constraints and whatnot. And then, of course, how to like add constraints to tables and add indexes to tables. There are some filtered indexes in here.

There’s all sorts of like just basic command things to help you learn fundamentals of creating tables and indexes and whatnot in SQL Server. So there’s some fun stuff in here just to get people acquainted with SQL Server who are not, or rather T-SQL, who are not maybe completely familiar with it. So just, you know, very basic things to help sort of like serve as a reference if you run into anything you don’t understand or you want to learn how to do something that maybe you don’t know the command for off the top of your head.

So this is not meant to be a copy of like the full documentation for everything you can do in a database, just kind of the most common stuff that like you might have to do day to day and the most common terms and things that I’m going to be referring to during the course of me teaching you these things. So that’s the course intro material. Nothing too crazy in there, but I think it’s a pretty good starting place for folks who are completely unaware or rather like getting basic start starting material for T-SQL.

Anyway, that’s that. After this video, we’re going to do some teaser material from the actual like learning portion of it. And I think this is a pretty good starting place for most folks.

Even if you are pretty good with T-SQL, you can probably learn a thing or two from these as well. Anyway, thank you for watching. Hope you enjoyed yourselves.

I hope you learned something. And I will see you in the next video where we will talk about the very starting point of most of your queries, which is the old select from where combo. So we’ll get into that then.

All right. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SQL Server Performance Office Hours Episode 12

SQL Server Performance Office Hours Episode 12


What are your top recommended books on SQL server performance tuning for someone planning to explore this field for the first time?
How are you Eric? I have a stored procedure that reads data from tables with (option recompile). Initially, the execution plan indicated that some tables lacked statistics, displaying a warning sign. After updating the statistics, the warning sign disappeared. However, the procedure always times out on its first run each day, but subsequent calls are successful. Please advise.
I suspect some tables are not getting accurate stats with the default sampling rate. How would you recommend confirming this and finding a better value to set for persisted_sample_percent if I’m right?
If I remove a very large non-clustered index what is the impact on the transaction log?
You seem to default to turning on SORT_IN_TEMPDB. Why?

To ask your questions, head over here.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

PASS Precons and New T-SQL Course Announcement!

PASS Precons and New T-SQL Course Announcement!


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Is OUTPUT Broken In SQL Server?

Is OUTPUT Broken In SQL Server?


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

DevOps Devour Hour – New York City Lunch & Learn – May 23, 2025

DevOps Devour Hour – New York City Lunch & Learn – May 23, 2025


Video Summary

In this video, I’m Erik Darling from Darling Data, and I’m excited to share some insider details about an upcoming event that’s going to be a fantastic opportunity for database enthusiasts. On the 23rd of May, I’ll be joining forces with Kendra Little and Steve Jones at Redgate’s New York City Lunch and Learn. We’ll dive into topics ranging from performance tuning to cloud migrations and zero-downtime deployments, making it an absolute must-attend event for anyone looking to enhance their database skills or just network with like-minded professionals. The best part? There’s even a rumor of a happy hour planned after the sessions, so you can dance the night away while continuing the conversation. If you’re in the area—or just want an excuse to visit New York City—this event is definitely worth your time. Don’t miss out; register now by clicking on the link provided in the video description at the very top!

Full Transcript

Erik Darling here with Darling Data. We have a very special promotional video today for an event that I’m going to be at in May, a little bit later in May. It is a Redgate sponsored event, New York City Lunch and Learn. That makes it convenient enough for me. The nice folks at Redgate have been kind enough to give me a reason to live. Well, maybe not that, but at least they’ve given me a reason to leave the house on a Friday. Okay? And that’s good enough for me. It’s going to be at a space called Industrious. That’s at 730 3rd Ave in New York City. It’s on the second floor, so maybe the view won’t be so spectacular outside, but hopefully the event will be absolutely spectacular inside. Now, it’s going to be myself, Kendra Little, and Steve Jones talking about database stuff. I’m going to be doing a session on performance tuning. Kendra Little is going to be doing a session on migrating from one cloud to another from Azure to Amazon, and Steve Jones is going to be doing a session on zero downtime database deployments. So that’s going to be absolutely magical. And then I’ve also heard rumors of a happy hour that will be taking place after the event from 3 o’clock on. So, you know, get your dancing shoes ready.

Again, that’s Friday, May the 23rd. The link to RSVP will be in the video description at the very tippity top. Usually that’s where I reserve my space for my own benefit, but because this is a wonderful Red Gate event, we will be pushing my stuff down a little bit lower and putting the registration link right at the tippity top. So, if you are in New York City for whatever reason or just maybe in the surrounding area and you’re looking for an excuse to get into the city, maybe catch a baseball game or something, then swing on by here beforehand. Maybe get a tax write-off or, you know, be able to convince your boss that it’s for work and it’ll be a grand old time. Anyway, thank you for watching and hopefully see you there.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.