SQL Server Performance Office Hours Episode 13

SQL Server Performance Office Hours Episode 13


What is the effect of having a few queries using Read Committed isolation level where the database is set to RCSI? Will those dumb Read Committed queries block any others?
Do you have any suggestions for optimizing data retrieval using Full Text Search (FullTextMatch TVF)? We’ve tried cutting down the query to reduce the dataset using other parameters before the FTS bit and also breaking out FTS across multiple columns to do one at a time, but anything else? Also, the memory used by FullTextMatch is hard to define. Should we reduce the size of the BufferPool to give more to FTS or the other way around? Thank you for coming to my Ted Talk on FTS.
If I were to want to kill all the table partitions by fire, what brand of flamethrower do you recommend?
Do you know of any documentation that makes it very obvious what Max Server Memory controls? It’s changed across versions. I think it used to just be the buffer pool?
I am writing C# classes and building CLRs for string comparisons, then creating a TVF for execution. The TVF is creating row estimates of exactly 1000x my row actuals (2 rows creates a “2 of 2000” for example). No query hints seem to resolve this, how would you work to control these estimates from my compiled C#?

To ask your questions, head over here.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: INNER vs OUTER JOIN Filtering Logic

Learn T-SQL With Erik: INNER vs OUTER JOIN Filtering Logic


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: WHERE vs Having

Learn T-SQL With Erik: WHERE vs Having


Video Summary

In this video, I delve into a practical example to explain the differences between using `WHERE` and `HAVING` clauses in SQL Server queries. Erik Darling from Darling Data shares insights on how these clauses can be used effectively or misused when filtering data. By walking through an interesting query involving user display names and reputation scores from the Stack Overflow database, I demonstrate why applying a `WHERE` clause might not always yield the desired results and introduce conditional aggregates as a solution to preserve the integrity of grouped data while still filtering out unwanted rows. This video is a sneak peek into my upcoming T-SQL course, which will be available at a presale price of $250 until it launches in full, priced at $500. If you’re interested in learning more advanced SQL techniques and strategies from experienced instructors like myself and Kendra, make sure to check out the link in the video description for early access.

Full Transcript

Erik Darling here with Darling Data and today’s video is going to be another little sort of teaser-oo for my upcoming T-SQL course. If you are interested in getting in on the presale price for this course, link down in the video description to do that. It is currently $250. When the course is all completed, it will be going up to $500. And of course, this course, of course, of course, of course, is a horse. This course material is going to be $500. This course material is going to be in large part a representation of the material that Kendra and I are presenting at our pre-cons at PASS Data Community Summit in November. So, like the stuff that you learn there will be the stuff that is in this course. And if you attend our pre-cons in Seattle, I will give you access to this material as part of that ticket. So, you will get it for free if you show up there. If you’re not going to be there, well, the link to purchase it is below. There are no further discounts on this. So, don’t ask. It’s a lot of work. But in today’s video, we’re going to talk about where versus having a little bit. And because they’re both ways of filtering data, right? So, like you have the where clause, which filters out rows from query results, right? It applies a filter and it says you rose, you made it or you didn’t make it.

And then you have the having clause. And the having clause introduces filtering on a grouped table. And by grouped table, I mean you have a query with a group by expression in it. And you might want to figure out if the result of some aggregate on that group table meets some criteria. So, an easy way to show you that is to use this first query here. We’ll just wait for ZoomIt to wake up. ZoomIt apparently is just taking a very long nap and doesn’t want to do anything for me here.

So, this is an example of a grouped query. We have a group by on the display name column, right? We are also selecting the display name column. But we’re asking SQL Server to filter out the results by only display names. Now, this is a sort of a funny quirk about the Stack Overflow database and it’s kind of an amusing query because display names are non-unique.

Users can, multiple users can have the same display name. There are even two John Skeets in the Stack Overflow database for some strange reason. But it’s just sort of, so it’s sort of a funny query where we’re just taking all the display names and we are looking for that average reputation, average downvotes, average upvotes, and then the count of total users that have that display name. And we’re sort of putting them on teams together, right? So, like team display name, whatever your display name is.

So, if your display name has more than 2,000 participants, more than 2,000 team members, that’s where we can start caring about your numbers for these. So, this is our original query and ZoomIt has once again gone to sleep. So, if we run this, we’ll get some results back.

And let’s pretend that after running this query, we have sort of a practical problem. We have these nine rows, right? The nine rows is sort of important to the problem we’re going to solve. But the thing that our teams are mad about is the fact that their reputations, their average reputation, is being brought low by teammates with low reputation.

People with a reputation of, let’s say, one. Let’s say these low reputation users are really screwing things up. So, what we might be tempted to do is apply a where clause additionally to this query, right?

We’re still going to, we still need to figure out who has a total team size of greater than or equal to 2,000. But now we’re just going to say, well, we’d only, we’re going to screen out all the people with a reputation of one. And hopefully, this will make our, this will bring up the averages for these teams, right?

Because we want to make sure that these teams still have good averages. The problem we run into is that this significantly cuts down on the results. We only have five rows now.

We actually, we actually destroyed four teams by getting rid of, by filtering out reputation equals one users. These are the original results. So, we’re missing Matt, we’re missing Mark.

I don’t know. This is like weird, like apostle names or something. But like, you know, the averages did go up for these, right? Like these score, these average reputations are higher, but we no longer have this full list of teams.

And that’s not a very good sign, is it? We have done something wrong with our query. And we need to figure out a way to run this query and get the same nine rows back.

But without these reputations suffering, right? Like these reputations, sorry, these reputations suffering. These reputations have already suffered, right?

And granted, like the average downvotes and upvotes are probably a bit different too. But the total users is different because we have screened out all these reputation equals one people. So, the where cloud is the wrong way to do this.

Now, there’s one way that you can write this query to give you the, to preserve the nine rows, but still not consider the low reputation equals one people, is to write a conditional aggregate. Now, if we, I’m going to zoom in on this.

And I have, I have, I have formatted this in a way to make it hopefully very clear what we’re doing here. Like this average downvotes is very easy to figure out. It’s just an average on downvotes.

So, I’ve, I’ve spread this out a bit to make it very clear what we’re doing. And what we have in here is a case expression. Case expressions are wonderful things, right?

You can do all sorts of fun stuff with them. And one thing that you can do with them is stick them in an aggregate like this. So, what we’re going to say is if your reputation equals one, then we are going to, we’re going to substitute that value with a null, right?

So, average isn’t a, null isn’t a value. So, you can’t average nulls, right? It’s not a, not an averageable thing. And if your rep, so that’s that.

And then our bailout on the case expression is if your reputation is not equal to one, or rather higher than one, right? Because there’s no, no reputation lower than one in the Stack Overflow database. One is as low as you can go.

There’s not a check constraint on that in the 2013 copy. But in real life, your reputation can’t go below one, right? Doesn’t matter, doesn’t matter how bad you screw up, your reputation doesn’t go below one.

I guess otherwise it would be like, almost like the opposite of the high reputation contest, sort of the Stack Overflow thing, sort of gamifies for people answering questions and stuff. You’re able to say, I want to see how low I can get this.

I don’t know how many bad answers. Like, actually like rewarding bad behavior if they did that. So, if we run this query now, we will get, so like, so just to like make sure that we’re on the same page here. This query had the where clause where reputation is greater than one in it.

I’ve taken that out of this one, right? We have no where clause here. All we have in here is the having clause. So, say, we’re grouped by as, sorry, we’re count big as greater than or equal to 200, greater than or equal to 2,000 after we group by display name, right?

And if we run this, we will get our same nine rows back, but our reputation, right? Our reputation will be okay, right? So, all of the numbers, all of the other numbers are going to match up.

There’s not a great way to sync this up, but you can see that the numbers up here and the numbers in the upvotes and downvotes columns, these all match up nicely, right? These are all the same, but the numbers in our reputation column, our average reputation column, these are all, these have gone up.

Now, the order is a little bit different here. The reason why the order is a little bit different here is because once we’ve screened out the reputation equals one people, some of the averages have sort of changed places a little bit, right?

Like, there’s a little bit of difference in the order of these, where if we compare one to the other, Matt is still on top, Alex is still second, David is still third, Mark is still fourth, but Chris and Michael have switched places, and I think actually that’s about it there.

Yeah. So, this, like, that did change the scores a little bit, because the average reputations switched around because of, like, some display names had way, had more reputation equals one people that got, where the reputation got replaced with null, so that the scores, like, the order changed a little bit, so someone might be mad in there, someone might want to, I don’t know, start kicking some reputation one people off their team, maybe.

But this is one way to get around the problems you might run into when you are trying to sort of manipulate data that is being grouped, and stuff like the where clause and the having clause is not an appropriate way to filter things out. Sometimes you do need to filter out rows with the where clause, and sometimes you do need to figure out, filter out rows from the group table with the having clause.

Because other times, you can apply a little bit of filtering inside of your aggregates in order to change what actually gets aggregated in there. So, that’s about it for this one.

In the actual course material, there is, of course, a lot more other stuff that goes along with this lesson, but this is just sort of one of the more practical examples of what the things that I’m going to be teaching you. If you like this, again, you can buy it down there.

The videos will, the actual full-length course videos will start dropping soon. They’re out in tech review right now, kind of getting spruced up and fixed up. And, you know, I am like rehearsing them and stuff to make sure that my delivery is sound and solid for when I give these to you.

But they will be dropping very shortly. And I hope you actually buy them and watch them and like them and all that stuff. So, thank you for watching.

I hope you enjoyed yourselves. I hope you learned something. And I will see you in the next video where we will talk about some more of the course material. So, anyway, that’s about it for me. Goodbye.

Goodbye. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: SELECT FROM WHERE?

Learn T-SQL With Erik: SELECT FROM WHERE?


Video Summary

In this video, I delve into the fundamentals of writing SQL queries by focusing on how to select from a table using a WHERE clause—a critical skill for any data professional. As part of my Learn T-SQL with Erik course, which is currently in its early stages, I provide an essential introduction that ensures viewers understand the basics before diving deeper. The video covers key concepts such as the difference between writing and executing queries, explaining why certain elements like aliases can’t be referenced in specific parts of a query due to how SQL Server processes them. This teaser gives a taste of what’s to come, setting the stage for more detailed exploration into query optimization, logical processing, and effective database design strategies that will empower you to write better T-SQL code. The link to purchase this video during its pre-sale phase is in the description; by buying now, you can be among the first to access all the comprehensive content as it unfolds.

Full Transcript

Erik Darling here with Darling Data, and today’s video is going to be a small amount of teaser material from my Learn T-SQL with Erik course from the first module where I walk through the basics of selecting from a table with a where clause. Fun stuff, but we all have to begin somewhere, and this is about as close to the beginning as we can. as you can get. And this course does need to make sure that everyone, by the time we get to the end, knows absolutely everything about writing good queries. So that’s where we’re going to start. The link to purchase this video during the pre-sale era is down in the video description, so if you would like to purchase this video, you would like to see the full course content when and as it becomes available. You can purchase now and you can be on the ground floor for when all of the good stuff, the full course material starts getting presented. Again, this is just a small chunk of it. So the first thing to understand, and it’s a fundamental concept for understanding why some queries work and some queries don’t, and why you have to write some queries differently.

from other queries is that the way that you sit down and write a query is a bit different from the way that SQL Server goes and interprets that query when it gets executed. You, of course, start with your select, and there’s a whole bunch of stuff that you can do with the select clause. Then, you know, you can, you know, you can, well, I guess, semi-optionally have a from clause, and there’s a whole bunch of stuff that you can do within your from clause. And then you can have your where clause and your group by, and you’re, and if you have group by, you might even add in a having clause.

And then you have your order by. Now the presentation order by down here is very important because without this presentation order by, we do not have any guaranteed result ordering. This is a problem that I find in many client queries. And this is a question that I see people asking to this day where it’s something like, if every time I run this query, it returns data in the same order. So, do I always need an order by? The answer is, well, yes. If you want that data to always, if you want that query to always return that data in that order, without an order, without presentation order by, might not always do that.

I do talk about why more in depth in some other modules, but for now, just know that that is the case. And one should not rely purely on a small handful of maybe observed behaviors, because those observed behaviors may change as your database changes. And of course, the way that SQL Server starts putting your queries together for execution, it is slightly different.

SQL Server starts with the from, moves on to the where, moves on to the group by, moves on to the having, then down towards the end, it finally gets to select and then order by. And then, well, I did, I did leave this out the first time, but because I hope to God, you never actually have to work with XML or JSON. But if for some reason you do just know that, that happens way at the end. SQL Server is like, I’ll deal with you last.

You mess. All right. But yeah, so SQL Server, the way that it interprets a query is different from the way that your query is written. That’s why there are some elements of your query that are not going to be eligible or accessible in other parts of the query. By that, pretty specifically, I mean that, you know, like if you have some sort of expression in your select list, you can have it in your order by, but you like you can access that alias in your order by, but you can’t access that alias in your where clause or your group by or your having.

Like, I think Oracle recently introduced something where you can now reference aliases in the group by, which I think is very cool because it saves you a lot of like, like perhaps like repasting some complex expressions in your select list in your group by clause. But the main point of writing queries is we tell our database the stuff that we want. And it’s the database’s job to go and give us that stuff, go get it for us.

So the select list tells the database which columns we want. The from clause tells the database which tables those columns are in. The where clause tells the database which rows to include or exclude in the query.

Group by will tell us which rows to produce aggregates with. And the order by tells the engine which way to sort results. In other words, your job is to shape the result set and it’s the database’s job to retrieve it.

So if I were looking to expand the darling data empire and I were saying, hey, database of all these cool people who, you know, show up to events or take my training. All of these wonderful, wonderful, brilliant, beautiful, bright attendees. If I were to say, hey, I need to make some money.

I need some need to put some change in my pocket. I would want to tell the database to give me company contact information for 100 people. Right. You want to send out that big email.

Right. You don’t want to send out a bunch of individual emails. That’s a big, not very, not very wise. I find who primarily use SQL Server who have the highest consulting budgets. And that query would look a little bit like this.

Or I select the top 100. Right. Just limit it to those 100 results. Now SQL Server uses top. Other database engines might use limit.

Some of them might only use offset fetch. But SQL Server has this wonderful proprietary top command. Not sure what Microsoft was coding for there, but we have it anyway. And then I would want to get some company names and some contact names and some email addresses.

And I’d probably actually want to see a number on the consulting budget before I decided what to do here. Because without seeing that, you might sort by the top 100 descending. And you might find someone who has like a 50 cent consulting budget.

And well, that’s just not enough change for the pockets these days, is it? Not in this economy. Not in this economy.

Not in this economy. But then I would limit the results to people who primarily use SQL Server. And my presentation order by. Right, you don’t put the word presentation in there. Just the order by is sufficient.

Then I would order by consulting budget descending. And that would be the data that I want, shape the way I want it, and the order that I want it in. Queries are really just descriptions of the things that you want to see.

It’s sort of like how indexes contain data, right? You create an index and the data that you, the way that you define that index defines what data lives in that index, whether it’s key columns or included columns. And then you get statistics, like you get a histogram on that data that describes what data is, like exists in that index. So databases contain data and we describe the data that we want to see with queries, right? So all the stuff that we do to write that query is just a description, right? It is not an order, right? It is not a, like, it does not tell the database how to do anything. It just tells us what the end result should be. The better job you do of describing the results you want, the fewer surprises you’ll have later when results show up. There’s a lot that goes into this. This is a very loaded statement and that’s what the rest of the course is for, to help you describe your results as accurately as possible. There are going to be certain things that your query is going to be at the mercy of as far as database design goes, though, right? There’s a lot that can go on here that can make or break a query. Data types being one of them, the amount of times that I have seen data incorrectly typed in a table, whether it’s a, you know, I mean, it’s usually something else that is stored in a string column, right? So it’s like a date or like an integer, not like a phone number because phone numbers make sense to store as strings because they might have parentheses or dashes or you might call 1-900-ERIC-RULES. It has letters in it. Someone might have like an extension that they put in, right? So there’s like reasons that to not store phone numbers as integers, but there are things that like that should be stored as the correct data type that I often see are not. Other things that can help your queries when they run, if your, if data is unique, either in a column or in some group of columns, some composite key of columns, you should tell the database this. If your data is closely related to data in another table, enough so that it would, it would, it would best be described with a foreign key relationship, you should do that.

If you have some domain knowledge of what is in your tables, where either the, the values that should, that are allowed to exist in there, then you should put constraints on your tables to enforce that. Likewise, if data should never be null, then a default constraint is a very good idea because that way, in most cases, you can get around nulls ever showing up in there. And of course, your, your database will function best when you have useful indexes to help your queries locate the data they care about, have data in the order that they care about, all sorts of things like that.

Now, the actual material, there’s a bunch of awesome stuff in here. And this teaser video, it’s left to the imagination. I know, such a tease. So when we think about logical query processing, there’s fundamentally no difference between the way the queries, like how the way the queries that I’m going to show you are implemented by the optimizer. And it’s really hard to do spell checks in comments, but I’m going to fix this one. So don’t worry about that. So again, your job is to shape the data. And it’s the engine’s job to get the data. Now, because of the way that logical query processing works, you might notice this nice little red squiggle here. That’s because total display names is an alias for this count big aggregate function, which means because this is in the select list, we can’t reference it in the where clause because to SQL Server, the select list hasn’t happened yet, right? Only the, only like we have the from, we have the where, we have the group by, but select and order by have not happened, have not occurred to SQL Server yet. So this query is going to give us an error. All right. So like there’s, there’s no running this one. Now this is an aggregate, so we don’t want to put a where clause on that. That’s where we use having, but let’s turn on actual execution plans.

Now you can either push this button here at the top of SQL Server management studio, this one right here, or you can hit control and M on your keyboard, assuming that you have a rather standard keyboard. And there are two other ways that we could write this query, depending on how we’re feeling.

It’s not going to make a big difference to performance in this case, how these things, how we do it. It’s, but there are going to be physically implemented by SQL Server in the same way, right? So this query where we basically successfully do what we tried to do above, and we use having to filter out rows produced by the grouped table aggregate, right? Where count big is greater than two greater than or equal to 2000, which is what we were looking for in the where clause here that was invalid, or we can add another layer to the query. So the reason why the logical query processing thing is important is because every time you add in a new from that process kind of starts over, right? SQL Server has to like say, oh, well, there’s a from here. What happens in this from?

Oh, there’s a from out here that we’re froming this from and then like you can really from yourself into a really interesting query, the more layers you add to it. But the point here is that this from is the inner scope of the query because we have these lovely parentheses giving this query a little hug.

And this query has its own little alias out here. And then we can refer to everything inside of that inner scope in this outer scope, right? So the from here gives us a different scope of from to get our data from, right? So if we run these two queries and we look at the execution plans, we’re going to get the same results back and we’re going to get the same query plan. Now performance here is no big deal, but they both do roughly the same thing. They both scan the clustered index. They both use a hash aggregate to collapse the rows, right? By display name. And then we have a filter operator.

And that filter operator is going to be the same for both of them, or rather have the same meaning for both of them. In this one, we have this expression 1001 greater than equal to 2000. That’s the count, that’s the having in the count big, because remember, we don’t know which rows are going to have that count that meets our requirement until we’ve done the aggregate. So we have to group all our data, and then we can filter it. But it’s the same thing in here, where we have a predicate on expression 1001 is greater than or equal to 2000. So this filter happens after we’ve collapsed our data down, and we’ve done our count, and we know which grouped rows are going to meet our greater than or equal to 2000 requirement. So there are two ways to write this query, but the way that the query gets physically implemented is the same, right? So again, it’s our job to describe the query, the data that we want, and it’s the engine’s job to go and get that data and come up with a plan to get that data that works according to our logic. So what I want you to understand here is that the results of queries don’t just have to be returned out to some client, right? Like they don’t have to go out to SSMS, they don’t have to go to your application, they don’t have to, you know, end up like, you know, wherever on some web portal screen.

The results of a query can be passed to another scope of that query. You can even put the results of a query into a temporary object and then use something else to talk to that temporary object. But it’s almost sort of like the pipeline concept of things where each from is sort of like its own little pipeline where data comes from here, and you can pipeline it into another from. So you can really build big queries that do exactly what you want. I’m not suggesting from a performance standpoint that big monolithic queries are always a good idea or even ever a good idea. There is a lot of wisdom in writing smaller chunks of queries that do use temporary objects, but we’ll get into that a little bit more detail later. A lot of the performance tuning I do is on big giant monolithic queries that did not feel like maybe worked out well when the database was small and needed the money. And as the database got bigger and money was no object, that query got worse and worse and slower and slower.

So what when you’re writing a query, there are many things that might seem just downright impossible with just writing a select from where maybe some joins and stuff that when you start introducing outer and inner scopes or perhaps using CTE, if not derived tables, they can make writing a query that seems impossible very, very simple and easy. Anyway, thank you for watching.

I hope you enjoyed yourselves. I hope you learned something. And again, this, the link to purchase this content is in the video description. So if you’re interested in buying this course while it’s on sale, you can get to it now. Anyway, thank you for watching. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: Course Introduction Material

Learn T-SQL With Erik: Course Introduction Material


Video Summary

In this video, I introduce my T-SQL course aimed at SQL developers, data scientists, engineers, and analysts looking to enhance their T-SQL skills or those who are just getting started. I walk through the setup process, including the need for SQL Server 2022 Developer Edition and the latest version of Management Studio, as well as how to download and use sample databases like Stack Overflow 2010 or 2013. Additionally, I provide a preview of some introductory material that covers common T-SQL conventions, basic commands, and data storage fundamentals, all designed to help learners get up to speed quickly. This introduction is just the beginning; stay tuned for more in-depth content starting with the basics of the `SELECT FROM WHERE` query structure.

Full Transcript

Erik Darling here with Darling Data, and we’re going to spend some time, you and I, where I’m going to be presenting small portions of the material from my T-SQL course here on YouTube. It is nowhere, going to be nowhere near the full breadth of material that you will find in the paid course, but it will be enough to hopefully lure you in. It will be hopefully a wiggly enough worm to lure you in to purchase it from me so that you can get the the full goodness from it. So of course, no learning experience is complete without a little bit of an introduction to talk about what course you’re taking. You know, like that. The course introduction slide. Beautiful. And of course introducing your instructor. So this is me. This is my very impressive resume. This is my MBA. This is just about a full encapsulation of all the details of my life, at least the important and pertinent ones to what we’re doing here, learning T-SQL together. This course is, of course, aimed at people who are trying to get into or who are practicing, but maybe not so meaningfully practicing their T-SQL skills. So if you’re a SQL developer, whether it is, you know, you’re developing, OLTP applications, or you are developing other types of things, you could be a data scientist, data engineer, data analyst, but it doesn’t matter because this course covers all sorts of query writing stuff. It doesn’t matter if it’s transactional or if it’s like reporting OLAP stuff.

I’m going to hit everything in here. So whatever your job title is, if you work with Microsoft SQL Server with T-SQL, this course is absolutely for you to learn from me. It’s amazing the way that works out, isn’t it? Something for everyone. The goals of the course are to teach you sound fundamentals, right? To make sure that all the little gaps and cracks and whatever you think you know are filled in. To advance your skillset meaningfully, really step you up so that when you’re writing queries or procedures or creating stuff, you do a real good job at it.

You know, you can make very informed choices as to the things, the structure of your database and your queries and everything else. And you will not be to blame when things go terrible. I’m also here to bust some very pervasive SQL myths because there are a lot of them floating around out there. Probably one of the biggest sources of SQL stupidity on the planet is LinkedIn, which is really just a playground for morons to post LLM nonsense in.

Granted, there are plenty of SQL bloggers out there who are incapable of an original thought that an LLM didn’t spit out for them. But we don’t need to pay too much attention to them, do we? Because we can recognize that and we know better than to traffic their websites. So this course, again, just to sort of recap a little bit, is going to be recording the content that will be presented at PASS.

Kendra and I are doing two days of T-SQL courses. So my goal is to have this entire thing cover the material presented there, plus all the stuff that was sort of on the cutting room floor. Right. Like, you know, two days of pre-cons, you’re looking at about six and a half hours of teach time each. This is going to be way more than that. So this is like all that stuff, plus a whole lot of other stuff that probably can’t fit in there.

The course up until it goes live will be available for two hundred and fifty dollars. And then it will go up to five hundred dollars once everything is said and done and complete. Videos will start dropping for this after May 10th.

I realize it says May 10th, but May 10th is Saturday. So it’s really after May 10th. I just ran out of room. So what I want to show you in this video is some of the starting material that like the from the overview material that you will you will get with the with the full course.

Starting off with some instructions about how to get to the sample data. Data and databases and the tools and things that you will need in order to run queries the way that I am showing you. So you will need SQL Server 2022 Developer Edition.

There is no harm in you getting 2022 Developer Edition and practicing there. Considering that SQL Server 2025 will probably be dropping sometime towards the end of this year, probably five, six months from now. It’s not a bad idea to get acquainted with the current newest version of SQL Server.

And of course, the latest version of Management Studio. I am right now using the SSMS 21 preview. That’s how I have this lovely dark mode.

But you don’t necessarily have to do that if you prefer a lighter mode of Management Studio. The demo databases that you can use for either of these are the Stack Overflow 2010 or 2013 database. They are you can download them from, of course, from these links.

The small version expands to about 10 gigs. The larger version expands to about 50 gigs. So depending on the size of your laptop or VM or wherever you’re going to be running these exercises and doing these practice things, that’s where you should do that. You should you should get a database that will fit reasonably into that hardware.

Now, to extract these, you will either want 7-zip or Windows 11 because Windows 11 can work with 7-zip zippy files or whatever. But 7-zip is very good at compacting things down. So that’s why 7-zip gets used here.

When you unzip everything, you will get four data files and one log file. It is not a backup file. So there’s no like restore database command. It is an attach database thing.

So you need to move your data and log files to whatever wherever your SQL Server is installed. And then you will use the attach database dialog to select the first MDF file and it will detect everything else that it needs to put in there. Along with the sort of get you set up instructions, there are a few other files that I thought would be useful to the folks looking at this because, you know, like me knowing T-SQL, I maybe make you or rather anyone who knows T-SQL may make a lot of assumptions about what other people know about T-SQL, but they might not actually know.

You might say some words, then they’re like, I know those are words, but I don’t know what they mean. So there is a file of what I consider to be common T-SQL conventions that I will refer to throughout the course. There will probably be a few things to get added in here once I get to some of the other stuff.

But for now, it covers logical query processing. It covers, you know, temporary tables, table variables, derived tables, just some basic syntax so that when you see query patterns, you understand. Exactly what they are.

And if I say some words that you don’t, you know, you don’t know the meaning of you can, you can refer to this file and you can find exactly what I’m talking about in here. So there is a very good sort of compendium of common terms and expressions and what they mean in here. There’s also some common T-SQL usage stuff that I thought would be useful.

So just sort of teaching like just some basic commands, mathematical operators, comparison operators, stuff like that. Common functions that you might use for things. So all sorts of good stuff in there.

Some newer functions when working with strings. So this stuff over here. And some information about how you can search. Some searching stuff, which is useful.

Some stuff when you’re talking about working with dates, right? All fun things to do in there. Aggregation functions and, of course, window functions as well. There is also this file here on data storage, which is just a rather brief overview of how data is stored in SQL Server.

So there’s kind of a lot of readie text in here for you to go over and absorb, assuming you need it. And then there’s an example database. I don’t use this database in the course of the training.

I just wanted to have a separate sort of set of commands that would show you common SQL commands like create database, use database, creating a schema and granting permissions, creating tables with different things in them, like different constraints and whatnot. And then, of course, how to like add constraints to tables and add indexes to tables. There are some filtered indexes in here.

There’s all sorts of like just basic command things to help you learn fundamentals of creating tables and indexes and whatnot in SQL Server. So there’s some fun stuff in here just to get people acquainted with SQL Server who are not, or rather T-SQL, who are not maybe completely familiar with it. So just, you know, very basic things to help sort of like serve as a reference if you run into anything you don’t understand or you want to learn how to do something that maybe you don’t know the command for off the top of your head.

So this is not meant to be a copy of like the full documentation for everything you can do in a database, just kind of the most common stuff that like you might have to do day to day and the most common terms and things that I’m going to be referring to during the course of me teaching you these things. So that’s the course intro material. Nothing too crazy in there, but I think it’s a pretty good starting place for folks who are completely unaware or rather like getting basic start starting material for T-SQL.

Anyway, that’s that. After this video, we’re going to do some teaser material from the actual like learning portion of it. And I think this is a pretty good starting place for most folks.

Even if you are pretty good with T-SQL, you can probably learn a thing or two from these as well. Anyway, thank you for watching. Hope you enjoyed yourselves.

I hope you learned something. And I will see you in the next video where we will talk about the very starting point of most of your queries, which is the old select from where combo. So we’ll get into that then.

All right. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SQL Server Performance Office Hours Episode 12

SQL Server Performance Office Hours Episode 12


What are your top recommended books on SQL server performance tuning for someone planning to explore this field for the first time?
How are you Eric? I have a stored procedure that reads data from tables with (option recompile). Initially, the execution plan indicated that some tables lacked statistics, displaying a warning sign. After updating the statistics, the warning sign disappeared. However, the procedure always times out on its first run each day, but subsequent calls are successful. Please advise.
I suspect some tables are not getting accurate stats with the default sampling rate. How would you recommend confirming this and finding a better value to set for persisted_sample_percent if I’m right?
If I remove a very large non-clustered index what is the impact on the transaction log?
You seem to default to turning on SORT_IN_TEMPDB. Why?

To ask your questions, head over here.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

PASS Precons and New T-SQL Course Announcement!

PASS Precons and New T-SQL Course Announcement!


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Is OUTPUT Broken In SQL Server?

Is OUTPUT Broken In SQL Server?


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

DevOps Devour Hour – New York City Lunch & Learn – May 23, 2025

DevOps Devour Hour – New York City Lunch & Learn – May 23, 2025


Video Summary

In this video, I’m Erik Darling from Darling Data, and I’m excited to share some insider details about an upcoming event that’s going to be a fantastic opportunity for database enthusiasts. On the 23rd of May, I’ll be joining forces with Kendra Little and Steve Jones at Redgate’s New York City Lunch and Learn. We’ll dive into topics ranging from performance tuning to cloud migrations and zero-downtime deployments, making it an absolute must-attend event for anyone looking to enhance their database skills or just network with like-minded professionals. The best part? There’s even a rumor of a happy hour planned after the sessions, so you can dance the night away while continuing the conversation. If you’re in the area—or just want an excuse to visit New York City—this event is definitely worth your time. Don’t miss out; register now by clicking on the link provided in the video description at the very top!

Full Transcript

Erik Darling here with Darling Data. We have a very special promotional video today for an event that I’m going to be at in May, a little bit later in May. It is a Redgate sponsored event, New York City Lunch and Learn. That makes it convenient enough for me. The nice folks at Redgate have been kind enough to give me a reason to live. Well, maybe not that, but at least they’ve given me a reason to leave the house on a Friday. Okay? And that’s good enough for me. It’s going to be at a space called Industrious. That’s at 730 3rd Ave in New York City. It’s on the second floor, so maybe the view won’t be so spectacular outside, but hopefully the event will be absolutely spectacular inside. Now, it’s going to be myself, Kendra Little, and Steve Jones talking about database stuff. I’m going to be doing a session on performance tuning. Kendra Little is going to be doing a session on migrating from one cloud to another from Azure to Amazon, and Steve Jones is going to be doing a session on zero downtime database deployments. So that’s going to be absolutely magical. And then I’ve also heard rumors of a happy hour that will be taking place after the event from 3 o’clock on. So, you know, get your dancing shoes ready.

Again, that’s Friday, May the 23rd. The link to RSVP will be in the video description at the very tippity top. Usually that’s where I reserve my space for my own benefit, but because this is a wonderful Red Gate event, we will be pushing my stuff down a little bit lower and putting the registration link right at the tippity top. So, if you are in New York City for whatever reason or just maybe in the surrounding area and you’re looking for an excuse to get into the city, maybe catch a baseball game or something, then swing on by here beforehand. Maybe get a tax write-off or, you know, be able to convince your boss that it’s for work and it’ll be a grand old time. Anyway, thank you for watching and hopefully see you there.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

A SQL Server Query Plan Answer

A SQL Server Query Plan Answer


Video Summary

In this video, I dive into a fascinating riddle involving query plans and SQL Server performance. After posing the question in yesterday’s video, I explore the mystery of why removing a column from the SELECT list causes an error when using a hash join hint. I walk you through comparing estimated plans with and without the hint, highlighting the differences between nested loops joins and hash joins. By experimenting with scalar subqueries and correlated columns, we uncover some peculiar behavior related to implied predicates and query optimization. This video is packed with insights for SQL Server enthusiasts, whether you’re a seasoned professional or just starting your journey. If you enjoyed this deep dive into query plan mysteries, consider supporting my channel by becoming a member or simply liking, commenting, and subscribing—every bit helps!

Full Transcript

Erik Darling here with Darling Data. If you watched yesterday’s video, you know that I asked you a riddle. Since I’m recording this way ahead of time, like unbelievably far ahead of time, I have no idea if anyone answered, came up with anything good. If you did, good job. If you didn’t, shame on you. Anyway, let’s talk a little bit about channel stuff and lifestyle. just get it all out of our system. And then I’m gonna tell you what happened with that query plan. So if you love answers or riddles, you can sign up for a membership and you can support all my efforts to bring you interesting SQL Server content. Whatever. There’s a link down in the video description for that. If you don’t have money for whatever reason, or you don’t feel like spending it, I don’t know, whatever Scrooge. You can like, you can comment, you can subscribe. And if you want to ask me questions for my Office Hours episodes, you can click on that link, which will be also down in the video description, right where it belongs. I do consulting. I am a consultant from the Latin word console, which means a tiny lap infant or something. But that’s what I do for money, mostly because YouTube, doesn’t doesn’t pay much of a bill. If you need help with your SQL Server, perhaps you need answers to some riddles. There’s something weird on my forehead. I can never tell. I am available for hire you. You hire me with money, health checks, performance analysis, hands on tuning, dealing with performance emergencies, and of course, keeping your putting your developers in line so that you don’t have performance emergencies anymore. That’s the bulk of my activities. But if you need something else, well, we can talk about it.

And gosh darn it, my rates are reasonable. If you would like some training on SQL Server, I get that too. Boy, do I have it. All the important stuff, all the best stuff. I don’t leave out all the fluff that a lot of other people put in. You get all 24 hours of it for about 150 US dollars. And that is good for life. Por vida, as they say, in various places around the world. I’ve been to some of them, I think. Upcoming events, sequels, Saturday, New York City, ever closer, closer by the day, isn’t it? Taking place on May the 10th of 2025 with a performance tuning pre-con by Andreas Valter on May the 9th. That is the day before, that is Friday. I highly suggest tuning into both of those. The nice folks at Red Gate are taking pass on tour. They are coming to my fine city in August, August 18th to 20th. Someone else’s fine city, Dallas, September 15th to 17th.

And yet another person’s fine city, Amsterdam, August 1st to 3rd. And then Pass Data Community Summit will be taking place in Seattle, November 17th to 21st. So if you live in Seattle, it’ll be your fine city. If not, no, you should show up anyway, maybe. A lot of people just make Seattle their own, from what I hear. But with that out of the way, let’s talk about these strange and dramatic planes.

So the riddle from yesterday, of course, is that when we run this query, as is, it takes about four or five seconds. The performance of it really isn’t the point. That’s not really the riddle here. But the riddle is that it runs and completes successfully, and we get an answer back with a query plan and all that good stuff.

When we try to run that same query with owner user ID removed from the select list and do this, we get an error that the query processor could not produce a query plan because of the hint supplied to the query. So what are we looking for here?

Well, we’ve got a hash join hint on the query plan. All right. Well, that’s, that’s, that’s, that much is obvious. So what, if, if we were to start comparing things here, what we might want to do is get estimated plans for both of these without, without the hash join hint, obviously, because that prevents us from getting a query plan.

Ah, oh, that hurts. Hurts something fierce. So let’s, let’s highlight these.

That took a long time for some reason. That scroll was like ages. So let’s, let’s get estimated plans for these and let’s see what happens. So, um, obviously, this little grabby doohickey will cooperate. Usually it’s Zoomit.

Zoomit is about to not cooperate. Now, before it was the drivey doohickey. The plans are, of course, different. If we sort of frame this up about here, that’s good enough for now. Uh, this, this plan, of course, because of the hash join hint, uses the hash join here.

Uh, and it uses, well, you can see, you can see the, what is another hash join, like, here. But, you know, um, it’s a little hidden from us. I didn’t frame this perfectly.

Uh, and then down here, we have a nested loops join. And, oh, screw it. Let’s just go look. Uh, so here is the second hash join, the first plan, uh, users table. But we have another nested loops join here.

All right. So, there’s, like, obviously the join types are different. But why? And why can’t we use a hash join in this plan when we can in this one? Well, if you were to experiment with these queries, with this query a little bit, you know, we don’t need to experiment with both of them, just this one.

You know, um, like, obviously we could see if the problem is with the comments table, but then we’d have to do a lot more work. So, let’s, let’s, let’s start with the path of least resistance. And let’s quote out this, this, this here scalar subquery.

And we’re not going to use block quotes here because we just need to do this temporarily. Block quotes are forever. The double, double line quotes are for temporary, uh, temporary measures. And now let’s try to get the estimated plan with the hash join.

Well, that, that worked, didn’t it? So, the problem isn’t with this hash join here. The problem is with, uh, the, the, the, the join to the users table. Oops.

That was supposed to be control Z, not capital Z. Uh, join this. The problem is the join to the users table that, uh, gets the display name. But why would that be still? Why?

What, what happens? What is different between these? Well, uh, we have to dig in a little bit here. And we’re going to, I’m going to show you why. And then I’m going to show you some, some, that there is some background knowledge on this, uh, out there on the internet.

If you know, if you, if you know exactly where to look and Microsoft hasn’t ruined it by archiving the, the content yet. But, um, let’s examine the, the, the, the top plan first.

The one that works with the hash joins. Uh, if we look at this, uh, the, the clustered index scan of the post table, we looked at this a little bit in the first, in the first video, but you’ll see that we have a predicate that finds where owner user ID equals 22656.

And then we have an output list down here where owner user ID is emitted, not omitted, but emitted. It is omitted in the second one.

It is emitted in the first one from the scan of the post table, right? And, uh, it is emitted from the scan of the post table because it is in the select list in this one.

If we compare that to the clustered index scan of the post table for the second query that was, we got the estimated plan for without the hash join hint, we will see that owner user ID is not emitted from this one.

It is omitted from this one, right? So owner user ID is no longer emitted. It is omitted.

Exciting stuff. But because of that, SQL Server, uh, can’t, uh, has to use what, uses what’s called implied predicates.

And it uses that in the first plan too. Like, like if you look at the, what happens in the user’s table, we still have the literal value for 22656 here. The problem with this one is more over here. So let’s follow owner user ID throughout the plan.

So owner user ID is emitted here, right? We checked that box. Owner user ID is, um, passed through the hash join here. Owner user ID is grouped here, right?

You can see owner user ID. Uh, and then owner user ID is in the output list of here. But it does not get emitted here.

It gets omitted here. And then because it is not emitted here, that’s why we get the nested loops joined with no join predicate here.

But then here we have a nested loops join where the, that is an apply nested loops, right? But SQL, the, the, the, the value that is getting applied down here is still a literal value. So the, the, the, the online content that I wanted to show you, um, and before I jump into that, uh, it’s, it’s, this is, you know, foundational stuff.

Um, when you have a, uh, or rather if you want a merge join or a hash join, then your join clause has to have at least one equality predicate. If you do not have an equality predicate, SQL Server must use a nested loops join.

For some reason, this one down here, this thing is not our friend when owner user ID is not emitted from the post table. When it is omitted, this thing has a hard, has a real hard time.

So, uh, way back when, uh, I, so you can see the difference here. It is almost, wow. Uh, it is, it is 10 years, almost exactly, uh, to the day when, um, well, I mean, March 23rd, 2019, uh, 10 years to the day when Microsoft, geez, what ha, what happened to my brain?

It is not 2019, it is 2025. So it is like almost 15 years since Craig wrote this post. Ah, my brain’s on fire.

Uh, and this is about implied predicates and query hints. And Craig talks about, uh, another situation where, uh, forcing a hash join, uh, yields an error with one query when it doesn’t with another.

So like changing the predicate from B equals zero on this one to A equals zero. And this one gets us the exact same message that we saw, right? And it’s sort of the same thing happening here, just, you know, in a slightly different manifestation.

So, um, the only thing that seems to make SQL Server hang on to the owner user ID column is to do something goofy to the owner user ID column. Like, um, say P dot owner user ID mod zero plus P dot, uh, P dot owner.

Why can’t I type owner? Wow. And IntelliSense, neither IntelliSense nor Redgate is helping me on this one. So I’m just going to copy and paste that.

But if we do something like this, all of a sudden SQL Server can figure out the query plan again, because we have, we have, we have mangled, we have done enough weird stuff with owner user ID that it now decides to emit it from the post table rather than omit it from the post table.

So if you’re working with a query like this, and you really want to hash join, and you use a hash join hint, and SQL Server’s like, nah, can’t, can’t, can’t do it. Um, part of the problem might be, um, that you have a scalar subquery and your select, select clause, select query clause.

And that scalar subquery, um, references a column that is, uh, well, obviously it’s probably going to reference a correlate, right? That’s what we need to do here.

We correlate. If we correlate on the owner user ID column, but we do not emit, we omit the owner user ID column, then SQL Server will not be able to come up with a hash join plan. Um, that is, that is the best I got on this.

Um, it’s a weird one. It’s a very, very strange one. I admit it. Uh, but that’s why it was kind of a fun riddle. And that’s why we, we thank, uh, any, any higher power that we might believe in, whether it’s a bicycle or a couch or, uh, I don’t know, that’s Maru, the PEZ dispenser, that Craig Friedman existed and blogged as much as he did back when he did.

So, uh, I hope that this answers your riddle to your satisfaction. But if it doesn’t, um, you’re going to have, you’re going to have to go further because I have, I have expended all of my brain, uh, on, on this one.

Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And, uh, I will see you in the next video where, uh, we will talk about so many fun and interesting things.

You will, you will forget that this ever happened. So, anyway, thank you for watching. Goodbye. Bye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.