Learn T-SQL With Erik: SELECT FROM WHERE?

Learn T-SQL With Erik: SELECT FROM WHERE?


Video Summary

In this video, I delve into the fundamentals of writing SQL queries by focusing on how to select from a table using a WHERE clause—a critical skill for any data professional. As part of my Learn T-SQL with Erik course, which is currently in its early stages, I provide an essential introduction that ensures viewers understand the basics before diving deeper. The video covers key concepts such as the difference between writing and executing queries, explaining why certain elements like aliases can’t be referenced in specific parts of a query due to how SQL Server processes them. This teaser gives a taste of what’s to come, setting the stage for more detailed exploration into query optimization, logical processing, and effective database design strategies that will empower you to write better T-SQL code. The link to purchase this video during its pre-sale phase is in the description; by buying now, you can be among the first to access all the comprehensive content as it unfolds.

Full Transcript

Erik Darling here with Darling Data, and today’s video is going to be a small amount of teaser material from my Learn T-SQL with Erik course from the first module where I walk through the basics of selecting from a table with a where clause. Fun stuff, but we all have to begin somewhere, and this is about as close to the beginning as we can. as you can get. And this course does need to make sure that everyone, by the time we get to the end, knows absolutely everything about writing good queries. So that’s where we’re going to start. The link to purchase this video during the pre-sale era is down in the video description, so if you would like to purchase this video, you would like to see the full course content when and as it becomes available. You can purchase now and you can be on the ground floor for when all of the good stuff, the full course material starts getting presented. Again, this is just a small chunk of it. So the first thing to understand, and it’s a fundamental concept for understanding why some queries work and some queries don’t, and why you have to write some queries differently.

from other queries is that the way that you sit down and write a query is a bit different from the way that SQL Server goes and interprets that query when it gets executed. You, of course, start with your select, and there’s a whole bunch of stuff that you can do with the select clause. Then, you know, you can, you know, you can, well, I guess, semi-optionally have a from clause, and there’s a whole bunch of stuff that you can do within your from clause. And then you can have your where clause and your group by, and you’re, and if you have group by, you might even add in a having clause.

And then you have your order by. Now the presentation order by down here is very important because without this presentation order by, we do not have any guaranteed result ordering. This is a problem that I find in many client queries. And this is a question that I see people asking to this day where it’s something like, if every time I run this query, it returns data in the same order. So, do I always need an order by? The answer is, well, yes. If you want that data to always, if you want that query to always return that data in that order, without an order, without presentation order by, might not always do that.

I do talk about why more in depth in some other modules, but for now, just know that that is the case. And one should not rely purely on a small handful of maybe observed behaviors, because those observed behaviors may change as your database changes. And of course, the way that SQL Server starts putting your queries together for execution, it is slightly different.

SQL Server starts with the from, moves on to the where, moves on to the group by, moves on to the having, then down towards the end, it finally gets to select and then order by. And then, well, I did, I did leave this out the first time, but because I hope to God, you never actually have to work with XML or JSON. But if for some reason you do just know that, that happens way at the end. SQL Server is like, I’ll deal with you last.

You mess. All right. But yeah, so SQL Server, the way that it interprets a query is different from the way that your query is written. That’s why there are some elements of your query that are not going to be eligible or accessible in other parts of the query. By that, pretty specifically, I mean that, you know, like if you have some sort of expression in your select list, you can have it in your order by, but you like you can access that alias in your order by, but you can’t access that alias in your where clause or your group by or your having.

Like, I think Oracle recently introduced something where you can now reference aliases in the group by, which I think is very cool because it saves you a lot of like, like perhaps like repasting some complex expressions in your select list in your group by clause. But the main point of writing queries is we tell our database the stuff that we want. And it’s the database’s job to go and give us that stuff, go get it for us.

So the select list tells the database which columns we want. The from clause tells the database which tables those columns are in. The where clause tells the database which rows to include or exclude in the query.

Group by will tell us which rows to produce aggregates with. And the order by tells the engine which way to sort results. In other words, your job is to shape the result set and it’s the database’s job to retrieve it.

So if I were looking to expand the darling data empire and I were saying, hey, database of all these cool people who, you know, show up to events or take my training. All of these wonderful, wonderful, brilliant, beautiful, bright attendees. If I were to say, hey, I need to make some money.

I need some need to put some change in my pocket. I would want to tell the database to give me company contact information for 100 people. Right. You want to send out that big email.

Right. You don’t want to send out a bunch of individual emails. That’s a big, not very, not very wise. I find who primarily use SQL Server who have the highest consulting budgets. And that query would look a little bit like this.

Or I select the top 100. Right. Just limit it to those 100 results. Now SQL Server uses top. Other database engines might use limit.

Some of them might only use offset fetch. But SQL Server has this wonderful proprietary top command. Not sure what Microsoft was coding for there, but we have it anyway. And then I would want to get some company names and some contact names and some email addresses.

And I’d probably actually want to see a number on the consulting budget before I decided what to do here. Because without seeing that, you might sort by the top 100 descending. And you might find someone who has like a 50 cent consulting budget.

And well, that’s just not enough change for the pockets these days, is it? Not in this economy. Not in this economy.

Not in this economy. But then I would limit the results to people who primarily use SQL Server. And my presentation order by. Right, you don’t put the word presentation in there. Just the order by is sufficient.

Then I would order by consulting budget descending. And that would be the data that I want, shape the way I want it, and the order that I want it in. Queries are really just descriptions of the things that you want to see.

It’s sort of like how indexes contain data, right? You create an index and the data that you, the way that you define that index defines what data lives in that index, whether it’s key columns or included columns. And then you get statistics, like you get a histogram on that data that describes what data is, like exists in that index. So databases contain data and we describe the data that we want to see with queries, right? So all the stuff that we do to write that query is just a description, right? It is not an order, right? It is not a, like, it does not tell the database how to do anything. It just tells us what the end result should be. The better job you do of describing the results you want, the fewer surprises you’ll have later when results show up. There’s a lot that goes into this. This is a very loaded statement and that’s what the rest of the course is for, to help you describe your results as accurately as possible. There are going to be certain things that your query is going to be at the mercy of as far as database design goes, though, right? There’s a lot that can go on here that can make or break a query. Data types being one of them, the amount of times that I have seen data incorrectly typed in a table, whether it’s a, you know, I mean, it’s usually something else that is stored in a string column, right? So it’s like a date or like an integer, not like a phone number because phone numbers make sense to store as strings because they might have parentheses or dashes or you might call 1-900-ERIC-RULES. It has letters in it. Someone might have like an extension that they put in, right? So there’s like reasons that to not store phone numbers as integers, but there are things that like that should be stored as the correct data type that I often see are not. Other things that can help your queries when they run, if your, if data is unique, either in a column or in some group of columns, some composite key of columns, you should tell the database this. If your data is closely related to data in another table, enough so that it would, it would, it would best be described with a foreign key relationship, you should do that.

If you have some domain knowledge of what is in your tables, where either the, the values that should, that are allowed to exist in there, then you should put constraints on your tables to enforce that. Likewise, if data should never be null, then a default constraint is a very good idea because that way, in most cases, you can get around nulls ever showing up in there. And of course, your, your database will function best when you have useful indexes to help your queries locate the data they care about, have data in the order that they care about, all sorts of things like that.

Now, the actual material, there’s a bunch of awesome stuff in here. And this teaser video, it’s left to the imagination. I know, such a tease. So when we think about logical query processing, there’s fundamentally no difference between the way the queries, like how the way the queries that I’m going to show you are implemented by the optimizer. And it’s really hard to do spell checks in comments, but I’m going to fix this one. So don’t worry about that. So again, your job is to shape the data. And it’s the engine’s job to get the data. Now, because of the way that logical query processing works, you might notice this nice little red squiggle here. That’s because total display names is an alias for this count big aggregate function, which means because this is in the select list, we can’t reference it in the where clause because to SQL Server, the select list hasn’t happened yet, right? Only the, only like we have the from, we have the where, we have the group by, but select and order by have not happened, have not occurred to SQL Server yet. So this query is going to give us an error. All right. So like there’s, there’s no running this one. Now this is an aggregate, so we don’t want to put a where clause on that. That’s where we use having, but let’s turn on actual execution plans.

Now you can either push this button here at the top of SQL Server management studio, this one right here, or you can hit control and M on your keyboard, assuming that you have a rather standard keyboard. And there are two other ways that we could write this query, depending on how we’re feeling.

It’s not going to make a big difference to performance in this case, how these things, how we do it. It’s, but there are going to be physically implemented by SQL Server in the same way, right? So this query where we basically successfully do what we tried to do above, and we use having to filter out rows produced by the grouped table aggregate, right? Where count big is greater than two greater than or equal to 2000, which is what we were looking for in the where clause here that was invalid, or we can add another layer to the query. So the reason why the logical query processing thing is important is because every time you add in a new from that process kind of starts over, right? SQL Server has to like say, oh, well, there’s a from here. What happens in this from?

Oh, there’s a from out here that we’re froming this from and then like you can really from yourself into a really interesting query, the more layers you add to it. But the point here is that this from is the inner scope of the query because we have these lovely parentheses giving this query a little hug.

And this query has its own little alias out here. And then we can refer to everything inside of that inner scope in this outer scope, right? So the from here gives us a different scope of from to get our data from, right? So if we run these two queries and we look at the execution plans, we’re going to get the same results back and we’re going to get the same query plan. Now performance here is no big deal, but they both do roughly the same thing. They both scan the clustered index. They both use a hash aggregate to collapse the rows, right? By display name. And then we have a filter operator.

And that filter operator is going to be the same for both of them, or rather have the same meaning for both of them. In this one, we have this expression 1001 greater than equal to 2000. That’s the count, that’s the having in the count big, because remember, we don’t know which rows are going to have that count that meets our requirement until we’ve done the aggregate. So we have to group all our data, and then we can filter it. But it’s the same thing in here, where we have a predicate on expression 1001 is greater than or equal to 2000. So this filter happens after we’ve collapsed our data down, and we’ve done our count, and we know which grouped rows are going to meet our greater than or equal to 2000 requirement. So there are two ways to write this query, but the way that the query gets physically implemented is the same, right? So again, it’s our job to describe the query, the data that we want, and it’s the engine’s job to go and get that data and come up with a plan to get that data that works according to our logic. So what I want you to understand here is that the results of queries don’t just have to be returned out to some client, right? Like they don’t have to go out to SSMS, they don’t have to go to your application, they don’t have to, you know, end up like, you know, wherever on some web portal screen.

The results of a query can be passed to another scope of that query. You can even put the results of a query into a temporary object and then use something else to talk to that temporary object. But it’s almost sort of like the pipeline concept of things where each from is sort of like its own little pipeline where data comes from here, and you can pipeline it into another from. So you can really build big queries that do exactly what you want. I’m not suggesting from a performance standpoint that big monolithic queries are always a good idea or even ever a good idea. There is a lot of wisdom in writing smaller chunks of queries that do use temporary objects, but we’ll get into that a little bit more detail later. A lot of the performance tuning I do is on big giant monolithic queries that did not feel like maybe worked out well when the database was small and needed the money. And as the database got bigger and money was no object, that query got worse and worse and slower and slower.

So what when you’re writing a query, there are many things that might seem just downright impossible with just writing a select from where maybe some joins and stuff that when you start introducing outer and inner scopes or perhaps using CTE, if not derived tables, they can make writing a query that seems impossible very, very simple and easy. Anyway, thank you for watching.

I hope you enjoyed yourselves. I hope you learned something. And again, this, the link to purchase this content is in the video description. So if you’re interested in buying this course while it’s on sale, you can get to it now. Anyway, thank you for watching. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: Course Introduction Material

Learn T-SQL With Erik: Course Introduction Material


Video Summary

In this video, I introduce my T-SQL course aimed at SQL developers, data scientists, engineers, and analysts looking to enhance their T-SQL skills or those who are just getting started. I walk through the setup process, including the need for SQL Server 2022 Developer Edition and the latest version of Management Studio, as well as how to download and use sample databases like Stack Overflow 2010 or 2013. Additionally, I provide a preview of some introductory material that covers common T-SQL conventions, basic commands, and data storage fundamentals, all designed to help learners get up to speed quickly. This introduction is just the beginning; stay tuned for more in-depth content starting with the basics of the `SELECT FROM WHERE` query structure.

Full Transcript

Erik Darling here with Darling Data, and we’re going to spend some time, you and I, where I’m going to be presenting small portions of the material from my T-SQL course here on YouTube. It is nowhere, going to be nowhere near the full breadth of material that you will find in the paid course, but it will be enough to hopefully lure you in. It will be hopefully a wiggly enough worm to lure you in to purchase it from me so that you can get the the full goodness from it. So of course, no learning experience is complete without a little bit of an introduction to talk about what course you’re taking. You know, like that. The course introduction slide. Beautiful. And of course introducing your instructor. So this is me. This is my very impressive resume. This is my MBA. This is just about a full encapsulation of all the details of my life, at least the important and pertinent ones to what we’re doing here, learning T-SQL together. This course is, of course, aimed at people who are trying to get into or who are practicing, but maybe not so meaningfully practicing their T-SQL skills. So if you’re a SQL developer, whether it is, you know, you’re developing, OLTP applications, or you are developing other types of things, you could be a data scientist, data engineer, data analyst, but it doesn’t matter because this course covers all sorts of query writing stuff. It doesn’t matter if it’s transactional or if it’s like reporting OLAP stuff.

I’m going to hit everything in here. So whatever your job title is, if you work with Microsoft SQL Server with T-SQL, this course is absolutely for you to learn from me. It’s amazing the way that works out, isn’t it? Something for everyone. The goals of the course are to teach you sound fundamentals, right? To make sure that all the little gaps and cracks and whatever you think you know are filled in. To advance your skillset meaningfully, really step you up so that when you’re writing queries or procedures or creating stuff, you do a real good job at it.

You know, you can make very informed choices as to the things, the structure of your database and your queries and everything else. And you will not be to blame when things go terrible. I’m also here to bust some very pervasive SQL myths because there are a lot of them floating around out there. Probably one of the biggest sources of SQL stupidity on the planet is LinkedIn, which is really just a playground for morons to post LLM nonsense in.

Granted, there are plenty of SQL bloggers out there who are incapable of an original thought that an LLM didn’t spit out for them. But we don’t need to pay too much attention to them, do we? Because we can recognize that and we know better than to traffic their websites. So this course, again, just to sort of recap a little bit, is going to be recording the content that will be presented at PASS.

Kendra and I are doing two days of T-SQL courses. So my goal is to have this entire thing cover the material presented there, plus all the stuff that was sort of on the cutting room floor. Right. Like, you know, two days of pre-cons, you’re looking at about six and a half hours of teach time each. This is going to be way more than that. So this is like all that stuff, plus a whole lot of other stuff that probably can’t fit in there.

The course up until it goes live will be available for two hundred and fifty dollars. And then it will go up to five hundred dollars once everything is said and done and complete. Videos will start dropping for this after May 10th.

I realize it says May 10th, but May 10th is Saturday. So it’s really after May 10th. I just ran out of room. So what I want to show you in this video is some of the starting material that like the from the overview material that you will you will get with the with the full course.

Starting off with some instructions about how to get to the sample data. Data and databases and the tools and things that you will need in order to run queries the way that I am showing you. So you will need SQL Server 2022 Developer Edition.

There is no harm in you getting 2022 Developer Edition and practicing there. Considering that SQL Server 2025 will probably be dropping sometime towards the end of this year, probably five, six months from now. It’s not a bad idea to get acquainted with the current newest version of SQL Server.

And of course, the latest version of Management Studio. I am right now using the SSMS 21 preview. That’s how I have this lovely dark mode.

But you don’t necessarily have to do that if you prefer a lighter mode of Management Studio. The demo databases that you can use for either of these are the Stack Overflow 2010 or 2013 database. They are you can download them from, of course, from these links.

The small version expands to about 10 gigs. The larger version expands to about 50 gigs. So depending on the size of your laptop or VM or wherever you’re going to be running these exercises and doing these practice things, that’s where you should do that. You should you should get a database that will fit reasonably into that hardware.

Now, to extract these, you will either want 7-zip or Windows 11 because Windows 11 can work with 7-zip zippy files or whatever. But 7-zip is very good at compacting things down. So that’s why 7-zip gets used here.

When you unzip everything, you will get four data files and one log file. It is not a backup file. So there’s no like restore database command. It is an attach database thing.

So you need to move your data and log files to whatever wherever your SQL Server is installed. And then you will use the attach database dialog to select the first MDF file and it will detect everything else that it needs to put in there. Along with the sort of get you set up instructions, there are a few other files that I thought would be useful to the folks looking at this because, you know, like me knowing T-SQL, I maybe make you or rather anyone who knows T-SQL may make a lot of assumptions about what other people know about T-SQL, but they might not actually know.

You might say some words, then they’re like, I know those are words, but I don’t know what they mean. So there is a file of what I consider to be common T-SQL conventions that I will refer to throughout the course. There will probably be a few things to get added in here once I get to some of the other stuff.

But for now, it covers logical query processing. It covers, you know, temporary tables, table variables, derived tables, just some basic syntax so that when you see query patterns, you understand. Exactly what they are.

And if I say some words that you don’t, you know, you don’t know the meaning of you can, you can refer to this file and you can find exactly what I’m talking about in here. So there is a very good sort of compendium of common terms and expressions and what they mean in here. There’s also some common T-SQL usage stuff that I thought would be useful.

So just sort of teaching like just some basic commands, mathematical operators, comparison operators, stuff like that. Common functions that you might use for things. So all sorts of good stuff in there.

Some newer functions when working with strings. So this stuff over here. And some information about how you can search. Some searching stuff, which is useful.

Some stuff when you’re talking about working with dates, right? All fun things to do in there. Aggregation functions and, of course, window functions as well. There is also this file here on data storage, which is just a rather brief overview of how data is stored in SQL Server.

So there’s kind of a lot of readie text in here for you to go over and absorb, assuming you need it. And then there’s an example database. I don’t use this database in the course of the training.

I just wanted to have a separate sort of set of commands that would show you common SQL commands like create database, use database, creating a schema and granting permissions, creating tables with different things in them, like different constraints and whatnot. And then, of course, how to like add constraints to tables and add indexes to tables. There are some filtered indexes in here.

There’s all sorts of like just basic command things to help you learn fundamentals of creating tables and indexes and whatnot in SQL Server. So there’s some fun stuff in here just to get people acquainted with SQL Server who are not, or rather T-SQL, who are not maybe completely familiar with it. So just, you know, very basic things to help sort of like serve as a reference if you run into anything you don’t understand or you want to learn how to do something that maybe you don’t know the command for off the top of your head.

So this is not meant to be a copy of like the full documentation for everything you can do in a database, just kind of the most common stuff that like you might have to do day to day and the most common terms and things that I’m going to be referring to during the course of me teaching you these things. So that’s the course intro material. Nothing too crazy in there, but I think it’s a pretty good starting place for folks who are completely unaware or rather like getting basic start starting material for T-SQL.

Anyway, that’s that. After this video, we’re going to do some teaser material from the actual like learning portion of it. And I think this is a pretty good starting place for most folks.

Even if you are pretty good with T-SQL, you can probably learn a thing or two from these as well. Anyway, thank you for watching. Hope you enjoyed yourselves.

I hope you learned something. And I will see you in the next video where we will talk about the very starting point of most of your queries, which is the old select from where combo. So we’ll get into that then.

All right. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SQL Server Performance Office Hours Episode 12

SQL Server Performance Office Hours Episode 12


What are your top recommended books on SQL server performance tuning for someone planning to explore this field for the first time?
How are you Eric? I have a stored procedure that reads data from tables with (option recompile). Initially, the execution plan indicated that some tables lacked statistics, displaying a warning sign. After updating the statistics, the warning sign disappeared. However, the procedure always times out on its first run each day, but subsequent calls are successful. Please advise.
I suspect some tables are not getting accurate stats with the default sampling rate. How would you recommend confirming this and finding a better value to set for persisted_sample_percent if I’m right?
If I remove a very large non-clustered index what is the impact on the transaction log?
You seem to default to turning on SORT_IN_TEMPDB. Why?

To ask your questions, head over here.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

PASS Precons and New T-SQL Course Announcement!

PASS Precons and New T-SQL Course Announcement!


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Is OUTPUT Broken In SQL Server?

Is OUTPUT Broken In SQL Server?


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

DevOps Devour Hour – New York City Lunch & Learn – May 23, 2025

DevOps Devour Hour – New York City Lunch & Learn – May 23, 2025


Video Summary

In this video, I’m Erik Darling from Darling Data, and I’m excited to share some insider details about an upcoming event that’s going to be a fantastic opportunity for database enthusiasts. On the 23rd of May, I’ll be joining forces with Kendra Little and Steve Jones at Redgate’s New York City Lunch and Learn. We’ll dive into topics ranging from performance tuning to cloud migrations and zero-downtime deployments, making it an absolute must-attend event for anyone looking to enhance their database skills or just network with like-minded professionals. The best part? There’s even a rumor of a happy hour planned after the sessions, so you can dance the night away while continuing the conversation. If you’re in the area—or just want an excuse to visit New York City—this event is definitely worth your time. Don’t miss out; register now by clicking on the link provided in the video description at the very top!

Full Transcript

Erik Darling here with Darling Data. We have a very special promotional video today for an event that I’m going to be at in May, a little bit later in May. It is a Redgate sponsored event, New York City Lunch and Learn. That makes it convenient enough for me. The nice folks at Redgate have been kind enough to give me a reason to live. Well, maybe not that, but at least they’ve given me a reason to leave the house on a Friday. Okay? And that’s good enough for me. It’s going to be at a space called Industrious. That’s at 730 3rd Ave in New York City. It’s on the second floor, so maybe the view won’t be so spectacular outside, but hopefully the event will be absolutely spectacular inside. Now, it’s going to be myself, Kendra Little, and Steve Jones talking about database stuff. I’m going to be doing a session on performance tuning. Kendra Little is going to be doing a session on migrating from one cloud to another from Azure to Amazon, and Steve Jones is going to be doing a session on zero downtime database deployments. So that’s going to be absolutely magical. And then I’ve also heard rumors of a happy hour that will be taking place after the event from 3 o’clock on. So, you know, get your dancing shoes ready.

Again, that’s Friday, May the 23rd. The link to RSVP will be in the video description at the very tippity top. Usually that’s where I reserve my space for my own benefit, but because this is a wonderful Red Gate event, we will be pushing my stuff down a little bit lower and putting the registration link right at the tippity top. So, if you are in New York City for whatever reason or just maybe in the surrounding area and you’re looking for an excuse to get into the city, maybe catch a baseball game or something, then swing on by here beforehand. Maybe get a tax write-off or, you know, be able to convince your boss that it’s for work and it’ll be a grand old time. Anyway, thank you for watching and hopefully see you there.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

A SQL Server Query Plan Answer

A SQL Server Query Plan Answer


Video Summary

In this video, I dive into a fascinating riddle involving query plans and SQL Server performance. After posing the question in yesterday’s video, I explore the mystery of why removing a column from the SELECT list causes an error when using a hash join hint. I walk you through comparing estimated plans with and without the hint, highlighting the differences between nested loops joins and hash joins. By experimenting with scalar subqueries and correlated columns, we uncover some peculiar behavior related to implied predicates and query optimization. This video is packed with insights for SQL Server enthusiasts, whether you’re a seasoned professional or just starting your journey. If you enjoyed this deep dive into query plan mysteries, consider supporting my channel by becoming a member or simply liking, commenting, and subscribing—every bit helps!

Full Transcript

Erik Darling here with Darling Data. If you watched yesterday’s video, you know that I asked you a riddle. Since I’m recording this way ahead of time, like unbelievably far ahead of time, I have no idea if anyone answered, came up with anything good. If you did, good job. If you didn’t, shame on you. Anyway, let’s talk a little bit about channel stuff and lifestyle. just get it all out of our system. And then I’m gonna tell you what happened with that query plan. So if you love answers or riddles, you can sign up for a membership and you can support all my efforts to bring you interesting SQL Server content. Whatever. There’s a link down in the video description for that. If you don’t have money for whatever reason, or you don’t feel like spending it, I don’t know, whatever Scrooge. You can like, you can comment, you can subscribe. And if you want to ask me questions for my Office Hours episodes, you can click on that link, which will be also down in the video description, right where it belongs. I do consulting. I am a consultant from the Latin word console, which means a tiny lap infant or something. But that’s what I do for money, mostly because YouTube, doesn’t doesn’t pay much of a bill. If you need help with your SQL Server, perhaps you need answers to some riddles. There’s something weird on my forehead. I can never tell. I am available for hire you. You hire me with money, health checks, performance analysis, hands on tuning, dealing with performance emergencies, and of course, keeping your putting your developers in line so that you don’t have performance emergencies anymore. That’s the bulk of my activities. But if you need something else, well, we can talk about it.

And gosh darn it, my rates are reasonable. If you would like some training on SQL Server, I get that too. Boy, do I have it. All the important stuff, all the best stuff. I don’t leave out all the fluff that a lot of other people put in. You get all 24 hours of it for about 150 US dollars. And that is good for life. Por vida, as they say, in various places around the world. I’ve been to some of them, I think. Upcoming events, sequels, Saturday, New York City, ever closer, closer by the day, isn’t it? Taking place on May the 10th of 2025 with a performance tuning pre-con by Andreas Valter on May the 9th. That is the day before, that is Friday. I highly suggest tuning into both of those. The nice folks at Red Gate are taking pass on tour. They are coming to my fine city in August, August 18th to 20th. Someone else’s fine city, Dallas, September 15th to 17th.

And yet another person’s fine city, Amsterdam, August 1st to 3rd. And then Pass Data Community Summit will be taking place in Seattle, November 17th to 21st. So if you live in Seattle, it’ll be your fine city. If not, no, you should show up anyway, maybe. A lot of people just make Seattle their own, from what I hear. But with that out of the way, let’s talk about these strange and dramatic planes.

So the riddle from yesterday, of course, is that when we run this query, as is, it takes about four or five seconds. The performance of it really isn’t the point. That’s not really the riddle here. But the riddle is that it runs and completes successfully, and we get an answer back with a query plan and all that good stuff.

When we try to run that same query with owner user ID removed from the select list and do this, we get an error that the query processor could not produce a query plan because of the hint supplied to the query. So what are we looking for here?

Well, we’ve got a hash join hint on the query plan. All right. Well, that’s, that’s, that’s, that much is obvious. So what, if, if we were to start comparing things here, what we might want to do is get estimated plans for both of these without, without the hash join hint, obviously, because that prevents us from getting a query plan.

Ah, oh, that hurts. Hurts something fierce. So let’s, let’s highlight these.

That took a long time for some reason. That scroll was like ages. So let’s, let’s get estimated plans for these and let’s see what happens. So, um, obviously, this little grabby doohickey will cooperate. Usually it’s Zoomit.

Zoomit is about to not cooperate. Now, before it was the drivey doohickey. The plans are, of course, different. If we sort of frame this up about here, that’s good enough for now. Uh, this, this plan, of course, because of the hash join hint, uses the hash join here.

Uh, and it uses, well, you can see, you can see the, what is another hash join, like, here. But, you know, um, it’s a little hidden from us. I didn’t frame this perfectly.

Uh, and then down here, we have a nested loops join. And, oh, screw it. Let’s just go look. Uh, so here is the second hash join, the first plan, uh, users table. But we have another nested loops join here.

All right. So, there’s, like, obviously the join types are different. But why? And why can’t we use a hash join in this plan when we can in this one? Well, if you were to experiment with these queries, with this query a little bit, you know, we don’t need to experiment with both of them, just this one.

You know, um, like, obviously we could see if the problem is with the comments table, but then we’d have to do a lot more work. So, let’s, let’s, let’s start with the path of least resistance. And let’s quote out this, this, this here scalar subquery.

And we’re not going to use block quotes here because we just need to do this temporarily. Block quotes are forever. The double, double line quotes are for temporary, uh, temporary measures. And now let’s try to get the estimated plan with the hash join.

Well, that, that worked, didn’t it? So, the problem isn’t with this hash join here. The problem is with, uh, the, the, the, the join to the users table. Oops.

That was supposed to be control Z, not capital Z. Uh, join this. The problem is the join to the users table that, uh, gets the display name. But why would that be still? Why?

What, what happens? What is different between these? Well, uh, we have to dig in a little bit here. And we’re going to, I’m going to show you why. And then I’m going to show you some, some, that there is some background knowledge on this, uh, out there on the internet.

If you know, if you, if you know exactly where to look and Microsoft hasn’t ruined it by archiving the, the content yet. But, um, let’s examine the, the, the, the top plan first.

The one that works with the hash joins. Uh, if we look at this, uh, the, the clustered index scan of the post table, we looked at this a little bit in the first, in the first video, but you’ll see that we have a predicate that finds where owner user ID equals 22656.

And then we have an output list down here where owner user ID is emitted, not omitted, but emitted. It is omitted in the second one.

It is emitted in the first one from the scan of the post table, right? And, uh, it is emitted from the scan of the post table because it is in the select list in this one.

If we compare that to the clustered index scan of the post table for the second query that was, we got the estimated plan for without the hash join hint, we will see that owner user ID is not emitted from this one.

It is omitted from this one, right? So owner user ID is no longer emitted. It is omitted.

Exciting stuff. But because of that, SQL Server, uh, can’t, uh, has to use what, uses what’s called implied predicates.

And it uses that in the first plan too. Like, like if you look at the, what happens in the user’s table, we still have the literal value for 22656 here. The problem with this one is more over here. So let’s follow owner user ID throughout the plan.

So owner user ID is emitted here, right? We checked that box. Owner user ID is, um, passed through the hash join here. Owner user ID is grouped here, right?

You can see owner user ID. Uh, and then owner user ID is in the output list of here. But it does not get emitted here.

It gets omitted here. And then because it is not emitted here, that’s why we get the nested loops joined with no join predicate here.

But then here we have a nested loops join where the, that is an apply nested loops, right? But SQL, the, the, the, the value that is getting applied down here is still a literal value. So the, the, the, the online content that I wanted to show you, um, and before I jump into that, uh, it’s, it’s, this is, you know, foundational stuff.

Um, when you have a, uh, or rather if you want a merge join or a hash join, then your join clause has to have at least one equality predicate. If you do not have an equality predicate, SQL Server must use a nested loops join.

For some reason, this one down here, this thing is not our friend when owner user ID is not emitted from the post table. When it is omitted, this thing has a hard, has a real hard time.

So, uh, way back when, uh, I, so you can see the difference here. It is almost, wow. Uh, it is, it is 10 years, almost exactly, uh, to the day when, um, well, I mean, March 23rd, 2019, uh, 10 years to the day when Microsoft, geez, what ha, what happened to my brain?

It is not 2019, it is 2025. So it is like almost 15 years since Craig wrote this post. Ah, my brain’s on fire.

Uh, and this is about implied predicates and query hints. And Craig talks about, uh, another situation where, uh, forcing a hash join, uh, yields an error with one query when it doesn’t with another.

So like changing the predicate from B equals zero on this one to A equals zero. And this one gets us the exact same message that we saw, right? And it’s sort of the same thing happening here, just, you know, in a slightly different manifestation.

So, um, the only thing that seems to make SQL Server hang on to the owner user ID column is to do something goofy to the owner user ID column. Like, um, say P dot owner user ID mod zero plus P dot, uh, P dot owner.

Why can’t I type owner? Wow. And IntelliSense, neither IntelliSense nor Redgate is helping me on this one. So I’m just going to copy and paste that.

But if we do something like this, all of a sudden SQL Server can figure out the query plan again, because we have, we have, we have mangled, we have done enough weird stuff with owner user ID that it now decides to emit it from the post table rather than omit it from the post table.

So if you’re working with a query like this, and you really want to hash join, and you use a hash join hint, and SQL Server’s like, nah, can’t, can’t, can’t do it. Um, part of the problem might be, um, that you have a scalar subquery and your select, select clause, select query clause.

And that scalar subquery, um, references a column that is, uh, well, obviously it’s probably going to reference a correlate, right? That’s what we need to do here.

We correlate. If we correlate on the owner user ID column, but we do not emit, we omit the owner user ID column, then SQL Server will not be able to come up with a hash join plan. Um, that is, that is the best I got on this.

Um, it’s a weird one. It’s a very, very strange one. I admit it. Uh, but that’s why it was kind of a fun riddle. And that’s why we, we thank, uh, any, any higher power that we might believe in, whether it’s a bicycle or a couch or, uh, I don’t know, that’s Maru, the PEZ dispenser, that Craig Friedman existed and blogged as much as he did back when he did.

So, uh, I hope that this answers your riddle to your satisfaction. But if it doesn’t, um, you’re going to have, you’re going to have to go further because I have, I have expended all of my brain, uh, on, on this one.

Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And, uh, I will see you in the next video where, uh, we will talk about so many fun and interesting things.

You will, you will forget that this ever happened. So, anyway, thank you for watching. Goodbye. Bye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SQL Server Performance Office Hours Episode 11

SQL Server Performance Office Hours Episode 11


We have an application that uses two RDS nodes [Write-Only + Read Replica]. The read replica does not autoscale in any way. Do you see any advantage in this architecture? To me it just looks like wasted resources, since the UPDATE/INSERT/DELETE will have to be executed twice now.
Can you give me a quick summary of the changes made to Query Store in SQL Server 2022?
Hi Erik, Could you record a video where you explain when ‘LEFT JOIN – WHERE IS NULL’ is preferrable from ‘NOT EXISTS’, if there are such cases ?
Our index templates for devs to use default indexes as Desc order. Are there any performance issues you’ve run into with this for single column indexes? I know an index can be read both ways, so curious what else it might impact.
We have 4 SSDs in a RAID10 setup for our database server. We recently replaced the SSDs and the data centre are asking if we want the cache to be enabled or if it was turned off for a reason. Is caching recommended for SSDs in a RAID10 setup? A read only cache?

To ask your questions, head over here.

Video Summary

In this video, I’m Erik Darling from Darling Data, and we’re diving into another exciting Office Hours episode where the community gets to ask me their burning SQL Server performance tuning questions. Whether you have queries about optimizing your database setup or just want some general advice on life and everything in between, these episodes are packed with valuable insights. I cover a range of topics from the practical aspects of SQL Server performance tuning—like when auto-scaling isn’t necessary for RDS nodes—to more nuanced areas such as the changes made to Query Store in SQL Server 2022. I also delve into some technical queries, explaining scenarios where `LEFT JOIN WHERE IS NULL` might be preferable over `NOT EXISTS`, and discussing the implications of indexed templates designed with default sorting columns. Whether you’re a seasoned DBA or just starting out, there’s something for everyone in this episode.

Full Transcript

Erik Darling here with Darling Data, and I am just pleased as punch because we are about to do another episode of Office Hours! Yay! These are the episodes where I answer five user submitted questions that you, the user, submits. Typically about SQL Server performance tuning, but you are free to ask me personal questions about life, love, weightlifting, I don’t know, eating, drinking, sleep hygiene, general hygiene, etiquette, manners, whatever, whatever you need to know about because I’m, I’m a man of the world, baby! I know all sorts of stuff. If you would like to answer, ask questions that I answer on Office Hours, if you start answering questions, we’re in trouble. You have this wonderful, beautiful link right here in the most kindest of blues. That link is provided to you in the description. below the topic of video description. Below the topic of video description, there is also a link where you can sign up for a membership to say thanks for doing all this stuff that you do here on YouTube for SQL Server people. Join 60 or so other people who have said yes to the dress. If you like this content, but you don’t particularly like dresses, you can also like, comment, and subscribe. Other ways to help my channel become the top SQL Server channel on YouTube. All wonderful things that you can do. If you need help with SQL Server, if you just, if you’re staring at this thing and it’s just not getting any faster by you staring at it, you can hire me. I am a consultant and I do this stuff for a lot of people all the time. Health checks, performance analysis, hands-on tuning of your servers, queries, and indexes, dealing with SQL Server performance emergencies, and of course, training your developers so that you have fewer SQL Server performance emergencies in general, which is usually a great way to improve sleep hygiene. We’re talking about that. If you would like to get some training from me, you can do that as well. Go to that link, plug in that discount code, and you can get 75% off whatever you buy. If you buy the everything bundle, you can get that for about $150, and that lasts for the rest of your life.

So, live a long time. Happily, healthily, all that stuff. Upcoming events. This slide, all of a sudden, got very, very busy. There is a lot going on in the SQL Server data community at the moment. We have SQL Saturday, New York City, May 10th of 2025. We have Redgate on tour in New York City, August 18th through 20th, in Dallas from September 15th to 17th, and in Amsterdam from October 1st to 3rd. And, of course, we have the venerable Pass Data. Well, I suppose I should say Pass Data Community Summit, but I don’t want to spell community wrong and mess anything up.

Taking place in Seattle, November 17th to 21st. I should probably double check that, make sure. Let’s see. Our internet’s a little slow here. Let’s see. Pass Data Summit, 2025. Pass Data Community Summit. Oh, that’s a wrap for Community Summit. Hey, there we go. Yep, 17th to 21st. I was right. Cool. All right. Let’s go answer some questions now.

What’s that? I’m dead. No. Yeah, it’s so cold. All right. Let’s answer these questions here. We have an application that uses two RDS nodes, one write-only and one read replica. Well, that makes sense. Well, that depends on what you call a waste of resources.

Well, that depends on what you call a waste of resources. I don’t think you’re offloading reads if you’re keen on that. So I don’t think it’s a waste depending on how you’re using it. I don’t really see what auto scaling has to do with it unless you have a primary on like one set of hardware and a read replica on some diminutive set of hardware.

But that’s between you and whatever buttons you clicked in Amazon. But no, I don’t think it’s a waste. There are all sorts of good uses for replicas that don’t involve auto scaling. Bit of an odd bird there. All right. Can you give me a quick summary of the changes made to Query Store and SQL Server 2022?

Gosh, we are going into reading the documentation mode. All right. So let’s see here. Query Store is on by default. That’s a nice one. Query Store is on by default. There are all sorts of additional configuration options that you can set to decide which queries end up in Query Store.

It got like, what is it, optimized plan forcing or whatever that feature is called. That one’s kind of neat. You get Query Store hints, right? You can add hints. You can manually add hints to queries in Query Store. It got tied into, I think, a couple other features.

I forget if the parameter sensitive plan one needs queries to… It feels like it doesn’t. Maybe it does. I don’t know. But things like cardinality estimation feedback, memory grant feedback, those rely on Query Store. We’re supposed to get like the thing where it’s available on second, like AG second, availability group secondaries.

But that’s still not in… That’s like not safe for production, so says Microsoft, three years later. So, I don’t know. Got me on that one.

But yeah, that’s sort of the important stuff that I can think of off the top of my head. Let’s see. Hi, Eric. Could you record a video where you explain when left join where is null is preferable from not exist, if there are such cases? Well, I would if I ran across such a case.

For the most part, the cases I run into, not exist just works a whole heck of a lot better. So, if I happen to come across a case where the left join, where something is null pattern works out better, I will surely record a video for it because that will be a magical outlier to what I generally witness out in the wild. All right.

Let’s see here. This is probably a weird one. Okay. Are indexed templates for devs to use defaults? Okay.

So, you have indexed templates for your developers that presumably someone created where you default to sorting columns in descending order. Are there any performance issues you’ve run into with this for single column indexes? I know an index can be read both ways, so curious what it might impact.

Well, yeah, sure. I mean, lots of stuff. But it really depends on what your queries are doing.

Queries can be read both ways. However, backward scans are not eligible for parallelism. So, that’s one thing.

If you are using a lot of windowing functions where you need to sort things in descending order, this might be very helpful. If you’re not, it might not be. If a lot of your queries are asking for columns in descending order, this could be useful.

Like in the order by, that could be useful. But, geez, for someone who has made a template for developers, you sure seem unaware of the repercussions of your choices in designing these templates. Or whomever designed these templates could maybe answer the question as to why it was designed that way.

And shed some light on exactly their decision-making process for that choice. So, yeah, there’s all sorts of stuff that it can affect. But does it affect?

I don’t know. You and I would have to look at your queries together. It sounds like a fun consulting engagement. All right. We have four SSDs in a RAID 10 setup for our database server. We recently replaced the SSDs in the data center.

We are asking if we want the cache to be enabled or if it was turned off for a reason. Is caching recommended for SSDs in a RAID 10 setup, a read-only cache? So, here’s my take on it.

Is I don’t want… I generally don’t want any caching on my disks. Generally, I want my caching done with memory.

I know this isn’t going to directly answer your question. But really, your reads should be as non-reliant on disk in any disk settings as possible. You should strive to keep the set of data that you most often care about in the buffer pool because going to disk is always going to be slow and painful, regardless of any caching settings you have on there.

But the only thing that cache settings do is inflate crystal disk mark benchmarks. So, you know, sure, it might be nice to have some moderate cache of, like, hot data for reads on there, but your goal should not be to read anything from disk anyway.

You didn’t ask about right caches, but that’s a whole prickly world of strange things anyway. So, yeah, I don’t have a big strong opinion on this because my opinion gears more towards, hey, let’s not go to disk.

Let’s have it in memory so we don’t have to deal with it as much. You might be in a special situation where maybe you’re on standard edition and your buffer pool is limited to 128 gigs.

Or maybe you’re on enterprise edition, but you have, you know, 70 terabytes of data and you can’t possibly get enough memory to deal with that. That’s when you should be looking into stuff like compression, page compression, maybe even columnstore, maybe even using a handy script to clean up your indexes, like sp index cleanup, which you can get at my GitHub repo.

So there’s all sorts of things that I would want to get through before I start tinkering with disk cache settings because once you get, like, I feel like if you get to the point where you’re tinkering with disk stuff, you have sort of failed your workload in a different way.

Like there are many things within SQL Server that you, and, you know, with memory that you could be doing that would provide you much, much more benefit than worrying about this. This is not a finish line setting at all by any stretch of the imagination.

So work on some of the other more fundamental stuff first. All right. So that’s five questions, right? One, two, three, four, five.

So says the row numbers. The row numbers never lie. So we’re going to put this one in the bucket. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something and I will see you in whatever the next video is. Whatever just explodes out of my brain and onto the screen.

All right. Anyway. All right. Thank you for watching. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

A SQL Server Query Plan Riddle

A SQL Server Query Plan Riddle


Video Summary

In this video, I present a riddle involving two SQL Server queries in SQL Server Management Studio (SSMS) to challenge your analytical skills and deepen your understanding of query optimization. The first query includes the `owner user ID` column from the `post` table with a hash join hint, while the second query omits this column, leading to an optimizer error despite no apparent reason for it. If you enjoy riddles or are interested in SQL Server performance tuning, I invite you to sign up for a membership and help me solve this intriguing puzzle. Your support will ensure more riddle content in future videos! For those who prefer straightforward solutions, feel free to leave a comment expressing your desire for no more riddles—though that might just make the riddles even more tempting.

Full Transcript

Erik Darling here with Darling Data, and in today’s video I have a riddle for you. So if you enjoy riddles, stick around. If you don’t enjoy riddles, I’m giving plenty of warning to avoid having to be riddled by anything. But today I’m going to present a riddle, and tomorrow I’m going to do my best to answer the riddle to your satisfaction. Hopefully you will not react as terribly as the troll under the bridge, and you will not react as terribly as the troll under the bridge. I’m going to eat one of my toes. It hurts to stand. If you would like, if you like riddles, boy, you can show me how much you like riddles by signing up for a membership. So the more people who sign up for a membership after this video, the more riddle content you will get. I will riddle my butt off. If you hate riddles, you can also leave a comment that says, hey, I would sign up for a membership if you never tell another one. The link to do that is down in the video description. If you are just sort of nonplussed by riddles, I hope that’s the right word for that. You can like, you can comment, you can subscribe, or if, I don’t know, whatever you feel like doing. If you want to ask me a question during office hours, I was going to go somewhere with that, and then I decided against it.

If you would like to ask me a question for my office hours episode, that is the link. It will not be blue in the video description, so don’t get freaked out if the colors aren’t the same. If you have your own query plan riddles, if you just have so many riddles that you can’t deal with them all, and you would like some help with that, I am available for hire. You can literally pay me money to solve your SQL Server performance problems.

Health checks, performance analysis, hands-on tuning, dealing with performance emergencies, and training your developers so that they don’t get their toes eaten by the troll under the query plan riddle bridge, whatever. I do all that, and as always, my rates, they’re reasonable. If you would like some also reasonably priced stuff from me, I have training, about 24 hours of it, a little bit more than that, depending on if you watch it at double speed.

If you watch it at regular speed, it’s about 25, 26 hours. But, you can get all that for about 150 US bucks with the discount code right there. Also, it will not be blue in the video description, but the link will be there for you.

Upcoming events, boy, oh boy, oh boy. Spending a lot of time in New York City, aren’t I? SQL Saturday, New York City, coming up on May the 10th, fast approaching, just a couple few weeks away. Get your seats while you can.

There is a performance tuning pre-con on May the 9th by Andreas Valter. And we are all very excited about all of this. We are putting on our finest New York City attire to attend this event. So, please dress appropriately.

And the fine folks at Redgate are bringing Pass on Tour. We will be mini-passing in New York City, August 18th to 20th. The most beautiful time of year to be in New York City.

Then we’ll be in Dallas, September 15th to 17th. Again, beautiful time of year to be in Dallas. And then Amsterdam. Wow, I mean, just beautiful times of year all around. October 1st to 3rd.

And all that’s going to be great. And you should buy tickets and come and hang out with me. Imagine seeing Erik Darling in Amsterdam. Be amazing, right? Who knows?

Who knows what hijinks we could get in together. And then, of course, the big event, Pass Data Community Summit, taking place in Seattle, November 17th to 21st. We will be having a grand old time in beautiful Seattle on the edge of fall.

If you look out across the soundy area, there’s just wonderful foliage everywhere. Highly recommend it. But with that out of the way, let’s riddle ourselves a little bit.

Let’s go over to SQL Server Management Studio. And I’m going to show you two queries. All right? Two queries. We got two of them.

Otherwise, there would be no riddle. And the first query, well, actually, let’s just say there is one difference. Shut up, tooltips. There is one difference between these two queries.

And the first query, the column owner user ID from the post table, that’s P. That’s P up there, is present in the select list. And we have a hash join hint down here.

All right? Obviously, the hash join hint is integral to the riddle. And in the second query, the only difference is that the owner user ID column has been removed. We have quoted that out.

So if we run this first query. Wait, wait, wait. Give this a few seconds. This thing doesn’t have any very special indexes to help it. It takes about four seconds.

But it runs and returns results that are correct. And we get a query plan from it. You can see all this stuff. SSMS is hiding our missing index request a little bit. That’s not so important to the riddle, though.

But we run and we get a valid plan and everything is okay. Now, if I try to run this second query with owner user ID removed, what happens? You get an error.

Not an Eric. We get an error. Though I am frequently confused with such things. The error that we get is that the query processor cannot produce a query plan because of the hints defined in this query. Resubmit the query without specifying any hints and without using set force plan.

Well, why is that? Why would owner user ID being emitted from the select list make this an invalid query? Now, if we don’t highlight the hash join hint, if we just highlight this part of the query and we get an estimated plan, this is what it looks like.

We don’t have hash joins at all. We have a nested loops join. We have a nested loops join where SQL Server is saying we don’t have a join predicate, which is a little bit silly because if we highlight, or rather, if we bring up this tooltip and we see that we have a predicate here for where p.owner user ID equals 22656.

And if we highlight this, we will see that we have a predicate where user ID in the comments table equals 22656. So technically, the predicate has been resolved and we don’t need a predicate at the nested loops join. But we also have now a nested loops join going to the users table.

So there are no hash joins in this query. Your job is to figure out why. Your job is to figure out which part of the query makes this query optimizer error occur and try to come up with a reasonable explanation for it. Should you choose to accept this mission, of course.

If you don’t, I understand. You have better things to do. Waiting for the answer. So anyway, this is where I’m going to leave things.

These two queries and, of course, a link to the demo database will be available in a GitHub GIST or GIST. I don’t know if it’s a GitHub GIST, GitHub GIST, GitHub GIST, I don’t know, something like that. It’s hard to tell how they wanted that pronounced.

I’m not going to get into the GIF versus GIF debate on this one. It’s weird to say a GitHub GIST. Anyway, I promise I’m dead sober for these.

That’s the worst. Anyway, thank you for watching. I hope you enjoyed yourselves. I hope that you enjoyed this riddle and I hope that you will try to answer this riddle. If not, then stick around for tomorrow’s video in which I will provide as in-depth an answer to this as possible.

Anyway, thank you for watching. Where’s the button? There’s the button.

Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Does The FORMAT Function Still Suck For Performance in SQL Server?

Does The FORMAT Function Still Suck For Performance in SQL Server?


Video Summary

In this video, I delve into the performance implications of using SQL Server’s FORMAT function, particularly when dealing with large datasets. I compare its execution against a simpler alternative—using CONVERT—and highlight significant differences in query plans and runtime. By running two queries side-by-side—one utilizing FORMAT and the other employing CONVERT—I demonstrate that while both yield similar results for smaller row counts, the FORMAT function can significantly slow down operations on larger datasets, making it less ideal for high-throughput environments. I also discuss scenarios where FORMAT offers advantages, such as custom date formatting requirements, but emphasize the importance of considering performance when working with extensive data volumes.

Full Transcript

Erik Darling here with Darling Data. And today’s short video, so short in fact that I’m not going to do the full intro slides because they would actually be longer than what I have to say about what’s in the video. We’re going to ask and answer a rather simple question. Does the format function still suck? And by still suck, I don’t mean like, does it do something bad? No, it does. Does some actually kind of neat things that you can’t do very easily otherwise? But it can still be pretty bad for performance, especially if you need to put a lot of rows through it, which is unfortunate because, you know, I use it in a lot of my procedures to very conveniently format numbers and percentages and other stuff, but I do it over a relatively small number of rows, tens or hundreds or, I mean, maybe at the high end, like a few thousand. And it just doesn’t really cause much of an issue there. So there is, there is a real need to like, like specify that format really does like the penalty for it really is when you start to get up to bigger and bigger row counts. That’s not to say that it’s free for smaller row counts. It would certainly be slower than other alternatives, but it is, you’re just not going to see like big difference.

I think if you’re like doing like a select top five or something, right? Like you need to format something, use the format function five rows is not going to cause a meaningful difference. You know, if you’re using this in an OLTP environment where absolute throughput is crucial, I might, I might give it a, you know, a second consideration to not to get it out of there because you could be, you could be, you know, adding just enough overhead to every single thing to like mash on the system a little harder, but in general, it wouldn’t be the first thing that I go for. So without going through everything else, let’s go over to Management Studio. And I’m going to show you as soon as that highlighting comes the hell down. So, two queries, two queries. They both use, come on, wake up, zoom it. Almost. Are you there? There you are. There’s a pretty lady. I’m going to use my patented method of forcing SQL Server to fully process a result set.

By fully process a result set. I mean, do all the work in here without actually producing any results. By not producing any results, I mean by filtering to where row number equals zero. SQL Server isn’t smart enough to know that no row number can start with zero. So it does all this work and then throws it all away, filters it all out. So I’ve got this query up here that does this. And you can see that I’m using the format function not to do anything so crazy. I am just formatting this date as year, month, day, right? So in other words, basically like the equivalent of saying convert date, get date, which coincidentally is what I’m doing down here. Convert date, get date.

So these would produce equivalent results, right? Converting creation date to a date. I guess if you want to get really picky, I would have to say like convert and barcar 10 or something around this, but it’s just not going to make a big difference for what I’m going to show you next. The second part of this would be running these, but you know, like I said, aiming to keep this video rather short. So I ran these ahead of time. So you didn’t have to sit there and wait on all sorts of spinny things. And you’ll see that the top query plan, which runs through about 24 and a half million rows in total, that takes about nine seconds.

The bottom query plan, which does the exact same thing, takes about three seconds. So let’s stop here and let’s look at where the differences in this query plan really are. I have to remember which way to turn so that this happens correctly.

You’ll notice in the top query, my hand is conveniently cut off. That scan of the clustered index takes about 800 milliseconds. This one here, a little bit longer, right?

But that’s not because of the function. The function isn’t like in the where clause. It’s just in the select list. If you look up there, we have a sort that takes 2.1 seconds. And that sort’s pretty close, right? 2.104 to 2.181.

Those are little CPU timing differences, maybe. Then we have a window aggregate, a lovely batch mode window aggregate. Oh, dear. Well, you sure do love me a window aggregate.

25 milliseconds, 19 milliseconds. You know, it’s tiny little timing differences. But then we get to our friend the compute scalar. This is where things heat up a little bit.

The compute scalar for the format function. Takes 6.1 seconds. 6.1 seconds. The compute scalar for the convert function.

Takes 23 milliseconds. Then we have a little filter over here. This is where we filter out to row number equals zero. That’s why, this is why the row count goes suddenly from 24.5 million rows to zero rows in both of these.

All right. You can see that pretty steep drop off there. It’s a pretty big cliff. Boop.

Boosh. But all the time is spent in the compute scalar. And if you go to the properties of the compute scalar. So if you’re looking at an execution plan and you’re using the format function. And you notice that there’s a compute scalar that takes us a strangely long amount of time.

You might want to just verify that. So one easy way to do that is to go to the defined values. You’ll see the list of expressions that the compute scalar is responsible for computing.

And if we zoom in up here, we will see the function name for this one is format. And we’ll see all the stuff that format is doing in there. And that’s where the 6 seconds goes.

The, sorry, the compute scalar down here. Same deal, except this one is going to show us convert. And we’re going to see the work that the convert does in here. Where did the convert go?

There, there, there she’s hiding. So if you’re using the convert function over a large number of rows, think about like a big ETL processor, data import thing.

I would strongly urge you to use something different. Use convert with a style that gets you the formatting you want rather than using convert. Convert is very convenient and very easy to use.

And it’s actually flexible to, it’s a little bit easier to do some stuff with format than it is to do with convert. But convert has a lot less penalty and overhead to it from a performance perspective. Just as an example, I have one client who needs to present dates with like a four digit year, two digit month, two digit day, but then only with hours and minutes, like no seconds or milliseconds.

And format does that really easily. That’s a very easy specification in format. I haven’t really seen a way to do that with convert.

You can get convert pretty close and then use like, like, use like a substring or like left or right or whatever to cut off the minutes and seconds and milliseconds if you need to from using convert. But there’s no like, just automatic like style to do that. So just be careful out there.

If you are using format with like big data processing, loading, like modifications, stuff like that. You might want to take a second look at your query, your actual execution plans and see if you have any big time spent in compute scale hours because you might be able to make significant improvements by, by using convert instead. Nor did I said convert and not cast.

We do not use cast in the Darling data household. We use convert in the Darling data household. Cast is icky, except try cast because Microsofty reasons. But we’ll talk about that in another video.

Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I will see you in the next video, which will be about something equally as useful. I promise.

All right. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.