Learn T-SQL With Erik: Ordered Deletes and Table Expressions



Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: Identity Annoyances



Video Summary

In this video, I delve into some common annoyances related to the `IDENTITY` property when working with temporary tables in T-SQL. Specifically, I discuss how using an identity column can hinder parallel inserts and provide alternative methods for generating auto-incrementing IDs that maintain query performance. I also highlight the benefits of using a `ROW_NUMBER()` function instead of relying on the `IDENTITY` attribute, especially for larger insert operations into temporary tables. If you’re working with heavy data movement queries or need to optimize your T-SQL scripts, this video offers valuable insights and practical solutions. Whether you’re just starting out with SQL Server or looking to refine your advanced query tuning skills, there’s something here for everyone.

Full Transcript

Erik Darling, your friend from Darling Data. I don’t know, at this point I either need to change my name or the company name and spice up these intros a little bit. Something new to say. This is going to be another video of material from my new T-SQL course.

This video is going to be, I don’t know, hopefully a rather short one to just go over some annoyances with the identity property when you are trying to insert data into a table. And not for like single row inserts, but let’s say, you know, you’re creating a temp table that you want to work off of and, you know, you need some incrementing number in there.

This is stuff that I run into a lot when performance tuning queries. All 23 hours of the beginner content is live and published. The course is at the pre-sale price of $250, and it will double in price after the summer when the advanced material comes out. So I would urge you to buy now and save yourself $250. Unless you’re just psyched on giving me extra money, in which case, cool, and I appreciate that. But if you want to save yourself some money, get in there sooner rather than later.

So let’s talk about these identity annoyances. We don’t have a ton to cover here, just some example queries that I want to show you.

So one thing that I lean on a lot when I am tuning queries is temp tables. They are a wonderful mechanism for making queries go faster. You can materialize a result set. You can create indexes across a combined set of columns. If you have to join two tables together, you can put those rows into a temp table.

That allows you to create an index across columns that might come from different tables, which ordinarily you couldn’t do, and which would otherwise make your performance tuning or general indexing job a little bit tougher. So there are very, very good reasons to put data into a temp table.

Something that I see a lot of people do is add an identity property to a temp table. It’s never entirely clear to me why, but I see this quite a bit. And, you know, like later in queries, it does end up getting used a little bit for stuff, which is, you know, of course, curious to me.

But there are better ways of giving yourself that same outcome without hampering the performance of your queries. So just for example, let’s say that we create a temp table with the identity property, and then we try to insert into that temp table. I’m using the TABLOCK hint here because, when you’re inserting into a temp table and the goal is to get a fully parallel insert, using the TABLOCK hint with the insert is generally what gets you there.

It doesn’t always work, and sometimes SELECT INTO works better. But in our case, if this is our big attempt here, one thing that will prevent the insert from happening in the parallel zone is going to be the identity property.
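The demo script isn’t reproduced in the transcript, but the shape of the query is roughly this. The table and column names are a sketch, assuming the Stack Overflow sample database, since the Posts table is mentioned:

```sql
/* Hypothetical sketch of the demo: a temp table with an identity
   column, loaded from an aggregate over dbo.Posts. The identity
   property is what keeps the insert out of the parallel zone. */
CREATE TABLE #PostCounts
(
    id integer IDENTITY(1, 1),
    OwnerUserId integer NOT NULL,
    PostCount bigint NOT NULL
);

INSERT INTO
    #PostCounts WITH (TABLOCK) /* TABLOCK: required to even attempt a parallel insert */
(
    OwnerUserId,
    PostCount
)
SELECT
    p.OwnerUserId,
    PostCount = COUNT_BIG(*)
FROM dbo.Posts AS p
GROUP BY p.OwnerUserId;
```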

If we look at this query plan, what we would hope for is that this gather streams operator would be way over here. Right. And that the table insert would happen within the parallel zone, like over here.

But we don’t get that. This query does scan the Posts table in parallel, and does aggregate the data in parallel, but then we exit the parallel zone early and repartition streams.

Like I said, the identity property is something that will prevent you from being able to do a fully parallel insert. And there’s another sort of silly way of doing this.

That’s when you use SELECT INTO like this. If you do something like this, you can also say `ID = IDENTITY(INT, 1, 1)`.

And you can create an identity on the fly. The thing is, this is one of the few things that will stifle the insert from being parallel. We’re going to see the exact same estimated plan here, where the gather streams ends early and the table insert is outside of the parallel zone.

So this is not what we want to see, again. This Compute Scalar right here is going to be where that identity starts getting assigned, and that is what requires the parallel zone to end where it does.
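That variant looks something like this, sketched against the same hypothetical tables as before:

```sql
/* The IDENTITY() function is only legal in SELECT ... INTO, and it
   blocks a fully parallel insert just like the table property does. */
SELECT
    id = IDENTITY(integer, 1, 1),
    p.OwnerUserId,
    PostCount = COUNT_BIG(*)
INTO #PostCounts
FROM dbo.Posts AS p
GROUP BY p.OwnerUserId;
```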

You can do this yourself by using ROW_NUMBER() instead. Remember, ROW_NUMBER() starts at one and just counts up until you run out of rows. So this is often a much faster way of giving yourself what is essentially an auto-incrementing ID column, but with better performance along the way.

Note in this query plan, when we get the estimated execution plan, this is what we wanted to see in the first place. We have everything happening, everything that we care about happening within the parallel zone. So there is the clustered index scan.

There’s the aggregate. In this case, we need to sort some data in order to generate the row number. But, you know, you can index your way around that. I just haven’t gone that far with this simple demo. So then we use our window aggregate in order to produce the row number.
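The ROW_NUMBER() version of the hypothetical demo looks roughly like this:

```sql
/* Generating the incrementing id with ROW_NUMBER() instead of an
   identity column keeps the table insert inside the parallel zone. */
CREATE TABLE #PostCounts
(
    id bigint NOT NULL,
    OwnerUserId integer NOT NULL,
    PostCount bigint NOT NULL
);

INSERT INTO
    #PostCounts WITH (TABLOCK)
(
    id,
    OwnerUserId,
    PostCount
)
SELECT
    id = ROW_NUMBER() OVER (ORDER BY p.OwnerUserId), /* the sort in the plan comes from this */
    p.OwnerUserId,
    PostCount = COUNT_BIG(*)
FROM dbo.Posts AS p
GROUP BY p.OwnerUserId;
```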

And then notice here: this is what we cared about in this video, the thing that wasn’t happening in the last couple of queries. The table insert is before the gather streams, so all of the rows that end up in this table would have been spread out across multiple threads.

And this can really, really help larger inserts into temp tables. For small row count inserts, it’s probably not going to make a very meaningful difference. And depending on what your MAXDOP is, sometimes it really helps to bump MAXDOP up to make sure this goes quickly.

But there are some factors involved with figuring out if this is worth pursuing. When it is, this can make a very, very big difference in how fast rows end up going into your temp table as part of the insert. So it’s just something to consider when you’re tuning queries, if the speed of your insert into a temp table is not what you want.

You might try the TABLOCK hint. Of course, if you’re going to use the TABLOCK hint, you should really just start with an insert into a heap temp table, and then add any indexes you want later.

That does have some ramifications around plan caching, reuse, recompiles, and stuff. But if it’s the type of thing where it’s a process loading a lot of rows in, and it’s not some crazy OLTP query that needs to run every three milliseconds or something, then this kind of larger-effort tuning is worth it.

So it’s something to think about when you’re tuning your queries, especially your big, heavy-lifting data movers. If you want to get the fully parallel insert much more easily, it can be a very neat sort of workaround to just use ROW_NUMBER() instead of an identity attribute.

Get yourself that auto incrementing thing. Create whatever indexes you need later, perhaps, and just use that. Anyway, thank you for watching.

I hope you enjoyed yourselves. I hope you learned something, and I will see you in the next video, where... what are we talking about? We might be talking about something sequence-y. I forget.

I don’t know. I don’t want to go look and spoil it for both of us at this point. Anyway, thanks for watching.


Learn T-SQL With Erik: SELECT INTO



Video Summary

In this video, I delve into some lesser-known aspects of T-SQL’s `SELECT INTO` statement, sharing practical tips and insights that can be particularly useful when working with table backups or creating temporary tables. I explore how to use `SELECT INTO` for schema-only copies, which is handy for quickly duplicating a table structure without the indexes and other metadata. Additionally, I discuss some of the quirks and limitations associated with using `SELECT INTO`, such as handling nullability issues and preserving identity columns. By walking through these examples, I aim to help you better understand how to leverage this powerful T-SQL feature in your database management tasks. Whether you’re a beginner looking to expand your T-SQL knowledge or an experienced DBA seeking new tricks, there’s plenty here to discover and apply.

Full Transcript

Erik Darling here with Darling Data. And we are going to talk a little bit more about T-SQL today. We’re going to do some stuff about Select Into. This is, of course, some teaser material from my T-SQL course, Learn T-SQL with Erik. That’s me. All 23 hours of the beginner content is out there in the world, floating around for you to watch and start to learn from. And this is, of course, still at the presale price of 250 US bucks, which will be going up to 500 bucks when the course is fully published after the summer. And just to reiterate, this is good for life, so you will not have to renew or resubscribe this purchase at any point.

Anyway, let’s talk about Select Into a little bit. Now, there are some fun things that you can do with Select Into if you want to create a backup of your table. For example, you can say SELECT TOP (0), and this will just make a schema-only copy of your table. Of course, schema-only has some limitations: it’s not going to bring over indexes and stuff like that. So a schema-only copy of your table might not be everything you need.
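As a sketch, with table names that are just for illustration:

```sql
/* SELECT TOP (0) ... INTO creates an empty table with the same
   column definitions, but no indexes or other metadata. */
SELECT TOP (0)
    u.*
INTO dbo.Users_Backup
FROM dbo.Users AS u;
```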

It might be easier to either script out the whole thing, or just script out additional items after you’ve made the schema-only copy. But for most people, just the schema-only copy is enough. I actually use this technique a bit when I am writing my diagnostic queries, because it’s fairly easy to, say, select into a regular table from a DMV that may not have a create table statement for it. And then I can script out the created table from it and validate some stuff to make sure everything came out all right.

Another thing that you can do in a similar vein is a select into with some where clause that can’t ever possibly be true. In this case, I’m using WHERE 0 = 1 to imitate the TOP (0) thing. You can also do that with a temp table, of course.
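Sketched out, with the same hypothetical table:

```sql
/* An always-false predicate gets the same schema-only result,
   this time into a temp table. */
SELECT
    u.*
INTO #Users
FROM dbo.Users AS u
WHERE 0 = 1;
```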

Like I said, when you do this, you have to add any indexes after the fact. Sometimes that’s okay; sometimes you’ll want to create the table yourself instead. There are some interesting downsides that can occur when you do select into and your select is not just selecting the columns from a table into something else.

An example of that is something like this. There are, of course, other instances where this can get weird and interesting. For example, if you’re combining columns in some way, if you are concatenating strings, or if you have some other expression in your select list, you do have to be careful that however that expression is evaluated on the select into results in the correct data type in the final table that you’re putting that data in.

This is worth checking, because there can be some strange inferences that SQL Server makes when you do select into. For some select into statements, it is worthwhile to use CONVERT to make sure that you carefully control the data type of what ends up in whatever temporary or staging table you are creating via select into. I think one interesting thing about select into is around nullability.

So, if we run this and we say select into this table, it doesn’t return any rows, which is fine. But if we say select all this stuff into this table where account ID is not null, and then we try to create a clustered primary key on the account ID column, we can’t, even though we’re saying where account ID is not null, and we’re grouping by stuff.

The group by at least gets us the unique part of it, but the column still inherits the nullability trait. There is a way around that, though, and that is to use ISNULL. This does not work with COALESCE.

COALESCE does not give you the same behavior. But if you say ISNULL(AccountId, 0), then you can get rid of the nullability of this thing, and then you would be allowed to create the primary key on it. Now, this does have some funny limitations.

Well, not really limitations, but something to be aware of: you have to be very sure that you don’t have multiple nulls in here, right? If you have one null, then it might be okay, unless you also have an account ID of zero, because, remember, you’re replacing nulls with a value. So you might need to use a better canary value depending on what you’re doing, like negative two billion something, right?

That makes sure it won’t conflict with any naturally occurring values in here that would prevent it from being unique, which matters here because we’re trying to create a primary key on it. So that’s something to think about and be aware of if you are going to use this.
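Putting those pieces together, the pattern being described is roughly this, using the Stack Overflow Users table as a stand-in:

```sql
/* Filtering and grouping don't stop the column from inheriting
   NULLability, but wrapping it in ISNULL does. COALESCE would
   leave the column nullable. */
SELECT
    AccountId = ISNULL(u.AccountId, 0) /* 0 is the canary value standing in for NULL */
INTO #a
FROM dbo.Users AS u
WHERE u.AccountId IS NOT NULL
GROUP BY u.AccountId;

/* This succeeds only because the column came out NOT NULL. */
ALTER TABLE #a
ADD CONSTRAINT pk_a
    PRIMARY KEY CLUSTERED (AccountId);
```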

Another thing that I think is neat with select into: I said that a lot of the table metadata stuff is not carried over when you do select into, but one thing that is carried over is the identity attribute. It does get reset, of course, but the identity attribute will at least be present on whatever column had it.

Let’s just say we’re going to take the Users table and use the ID column. In the Users table, the ID column is the clustered primary key, and it also has the identity attribute, or identity property, on it.

So, if we just make sure this table doesn’t exist first and then we select the ID column into a new table and we use SP help to show the table definition, you’ll see that the ID column does indeed retain the identity property here. Right. So, this is still an identity column in the new table.

It is seeded back to one, and it does increment by one. So if you had different settings, it doesn’t pull over the current seed value and stuff from your other table. But there’s a way to get rid of the identity property entirely.

And that is if, in your select into, you just say something like SELECT ID = ID + 0. That turns the column into an expression, and SQL Server no longer preserves the identity attribute for the table. If we look at the results of sp_help after running this thing where ID is ID plus zero, we’ll have slightly different results down here, where it says no identity column is defined.
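A sketch of both behaviors side by side (destination table names are hypothetical):

```sql
/* The identity property rides along with a plain column reference... */
SELECT
    u.Id
INTO dbo.Users_IdCopy
FROM dbo.Users AS u;

EXEC sys.sp_help @objname = N'dbo.Users_IdCopy'; /* reports Id as an identity column */

/* ...but any expression, even a no-op one, strips it off. */
SELECT
    Id = u.Id + 0
INTO dbo.Users_IdCopy2
FROM dbo.Users AS u;

EXEC sys.sp_help @objname = N'dbo.Users_IdCopy2'; /* "No identity column defined." */
```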

So, there are some neat things that you can do with select into. I chose this portion of the material because just talking about inserts with values and selects and all the other stuff is neat, but this stuff is a little bit neater to me. So, just some fun T-SQL tricks and quirks around select into for you to think about, and be aware of, perhaps, when you are writing queries that use this.

I hope you enjoyed yourselves. I hope you learned something. And I will see you over in the next video.

We’re going to talk a little bit more about the identity attribute, or the identity property, however you prefer to call it, and some of the effects that it can have on performance when we are doing inserts or select into. Anyway, thank you for watching.


Learn T-SQL With Erik: Expression SARGability



Video Summary

In this video, I delve into the concept of sargability and its limitations in table expressions like Common Table Expressions (CTEs) and derived tables. I explain that these constructs do not materialize their results, which is a common misconception among T-SQL developers. By examining execution plans and demonstrating with examples, I illustrate how SQL Server processes these queries without materializing the intermediate result sets, leading to potential performance issues. The video also covers the importance of using computed columns for indexing expressions within table expressions, as this allows for better query optimization and improved performance.

Full Transcript

Hey! It’s still me, Erik Darling, with Darling Data. You are still learning T-SQL with Erik. That’s me. This is, of course, the teaser material for my T-SQL course. It is currently $250, and it will go up to $500 after the summer when the advanced content is done and recorded. All 23 hours of the beginner content is available, and you can start watching that now. And in this video, we are going to talk about sargability, to a somewhat limited degree, because there are many things to talk about with it, but of course there’s only so much time we have when we’re doing a teaser.

So, let’s get on with that. Where is SQL Server? I’m still not used to this new icon down here yet. That’s still very strange to me. I don’t quite know if it’s like some harnessing, or a saddle, or infinity, or an hourglass, or a mobius strip. What are you going for with this logo? I don’t get it. It’s just a weird blob.

Anyway, the thing that I want to focus on in this video is that there is some delusion that remains amongst T-SQL queriers out there in the world: that if you put some expression into a table expression, you materialize that expression. You absolutely do not. We’re going to use a CTE here as a convenient vehicle for this example, but the same thing is true of derived tables and pretty much anything else that you query all in one shot. There is no materialization of anything that you have to do all in one go.

So, let’s turn on execution plans up here. And we’re going to look at a query that selects age_i. This is the CTE version, of course; you can tell because it starts with this foolish WITH. And remember, the Age column in the Users table is all nulls, so it replaces all the nulls with zero. And then we select a count out here where this column in the CTE equals zero. That is our where clause down here. Now, if we did this and we expected some materialization of the result, we might expect to see something in here like a spool, or something else that noted that our results were put into a temporary structure in the query.

But we do not have that. All we have is a clustered index scan and a stream aggregate, which takes care of the count that we did. And if we hover over the clustered index scan, you will see that this predicate is applied: where ISNULL, yada, yada, yada. So this did not get materialized here. And it’s the same thing if we were to use a derived table, right?

So, if we say select count from, and then in here we nest our expression, and then we stick a where clause on it to say where this equals zero, we will see an identical query plan here. There is no materialization of anything. There is a stream aggregate and there is a clustered index scan. And if we hover over this, we will see the same predicate applied here.
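The two queries being compared look roughly like this. The age_i alias comes from the transcript; the rest is a sketch against the Stack Overflow Users table:

```sql
/* CTE version: nothing is materialized; the ISNULL ends up as a
   residual predicate on the clustered index scan. */
WITH u AS
(
    SELECT
        age_i = ISNULL(u.Age, 0)
    FROM dbo.Users AS u
)
SELECT
    c = COUNT_BIG(*)
FROM u
WHERE u.age_i = 0;

/* Derived table version: identical plan, same predicate. */
SELECT
    c = COUNT_BIG(*)
FROM
(
    SELECT
        age_i = ISNULL(u.Age, 0)
    FROM dbo.Users AS u
) AS u
WHERE u.age_i = 0;
```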

Now, this isn’t to say that you can’t fix this. It is to say, though, that table expressions cannot be indexed. You can put an index on the underlying table, but if you wrap your column in a function, you’re going to ruin how well that index can be used. We don’t have an index here, which is whatever, because there’s no point in creating one, right?

We wouldn’t be able to seek to anything in it anyway. If we put an index on the Age column and we wrap that column in ISNULL, or COALESCE, or LEFT, RIGHT, REPLACE, UPPER, LOWER, LTRIM, RTRIM, LEN, DATALENGTH, any one of those things, SQL Server can no longer seek to rows or values in that index. So, don’t do that.

If we were really going to go all in on materializing result sets within derived tables or CTEs, then there would have to be an additional layer of being able to define indexes for that result set, right? That might be kind of fun. Or not. I mean, I think that would just be an eager index spool, wouldn’t it?

We hate those things. But as far as Microsoft goes, we don’t have CTE or derived table materialization. There’s not a hint for it.

At least as far as I know, there’s not a trace flag for it, either. So we can’t do that. Like I said, we can put indexes on the underlying tables and the optimizer can pick those up and use them. But there’s no way to define an index on a table expression like this.

And likewise, there’s no way to materialize a table expression like this, to store the result of it somewhere. So what you have to do, if you want to fix something like this, is add a computed column that gives you that expression, which then allows you to index that expression, right? We can do this for free. Well, not quite free, but it is pretty easy for just a single column.

And now, if we run either of these, we will be able to seek to those values because we created a computed column which mimics this expression and then we indexed that expression. But without that, we can’t do any of these things. And SQL Server does not lift a finger to help us one bit.
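The fix being described, sketched out (the index name is hypothetical):

```sql
/* A computed column matching the expression; it doesn't need to be
   persisted to be indexed, since ISNULL is deterministic. */
ALTER TABLE dbo.Users
ADD age_i AS ISNULL(Age, 0);

CREATE INDEX ix_users_age_i
ON dbo.Users (age_i);

/* Both of the earlier queries can now seek on this index. */
```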

So, when you’re considering query performance and things like that: a lot of the stuff that people will bury away in CTEs to make their queries more readable, where they think, oh, I have this magical result set that I can just refer to over and over again, is just absolute nonsense. Don’t get stuck in that train of thought, because you will end up writing some of the worst performing queries in your entire life.

Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something.

And I will see you in the next video, which is about... I forget. I think we might be getting into modification queries next, which is exciting. It’s very exciting. Don’t discount the fun and excitement of modification queries.

After all, none of us would have up-to-date resumes if it were not for update, delete, truncate, drop table, all that other good stuff, right? Yeah. How else do you keep those things fresh?

Anyway, thank you for watching.


SQL Server Performance Office Hours Episode 24




To ask your questions, head over here.

Hi Erik! Can you demonstrate or share topic about HA Setup with Two Database Node and one AG Listener?
You made a comment recently on LI about switching from int to bigint not being as easy as it sounds with compression tricks. What problems have you hit?
If you worked at Microsoft, what would you do with SQL Server to make it competitive with Postgres?
Do you ever offer discounts on consulting or just training?
Do you plan on offering Postgres training or content?

Video Summary

In this video, I dive into some interesting questions from our community and share my insights on various topics. First, I address the challenge of setting up a High Availability (HA) environment with two database nodes and one AG listener, explaining why it’s not as straightforward as it might seem. Then, we delve into the complexities of switching data types from `int` to `bigint`, particularly when dealing with compression techniques. I highlight how most real-world scenarios are more complicated than simple table setups, emphasizing the need for careful planning and potential dependency issues. Lastly, I discuss my hypothetical role at Microsoft if I were to help shape SQL Server’s future, focusing on modernizing T-SQL and enhancing query store features to better support monitoring and performance tuning. It’s a lively session packed with practical advice and candid thoughts!

Full Transcript

Erik Darling here with Darling Data. And of course, we have five questions to answer, so we are going to office hour. Or as the ancient Egyptians said, office horus. Alright, still working on that one. Anyway, if you like this channel and this content, you can sign up for a membership. Link in the video description. Link in bio, as they say. If you want to ask questions, privately, that I will answer publicly on these office hours episodes, you can go to this link, which is also in the video description, and you can ask me questions. If you need some help beyond what asking a question on this whole YouTube thing can possibly get you out of, maybe you’re just in a lot of trouble. I am still a SQL Server consultant. I do all these things at a very reasonable rate. And I am available for hiring. So, get it well done. It’s hot. If you would like to get my performance tuning training, all 24 hours of it are available to you for about 150 USD at that link with that discount code. If you go down to the link down in the video description, you will be very happy to know that it is all wound up in there for you. And I also have my new T-SQL course available, Learn T-SQL with Erik. All 23 hours of the beginner content is fully available.

to you. It is down at the pre-sale price of 250 bucks, which will go up to 500 bucks after the summer when the advanced content is recorded. Speaking of summer, boy, oh boy, we’re wearing a lot of shorts out there. I am going on tour with Redgate on all the Pass On Tour events, as well as Pass Data Community Summit. Pass On Tour is, of course, hitting New York City, Dallas, and Utrecht. And Pass Data Summit is, of course, hitting Seattle, November 17th to 21st. So, get in on those if you are in the nearby area. But let’s go answer some questions now and let’s get through all this stuff. All right. First question. Oh boy. Hi, Eric. Can you demonstrate or share topic about HA setup with two database node and one AG listener? Hell no.

I don’t ever talk about that stuff. I don’t care for AGs. I don’t care for boring setup tasks like this. I don’t care to do walkthroughs. There are other people who do. God bless them. They like this stuff. I am all query performance. That is not something I want to get into. All right. Next up. You made a comment recently on L.I., I believe that’s LinkedIn for the folks at home, about switching from int to big int not being as easy as it sounds with compression tricks.

What problems have you hit? Well, if you saw me make that comment, then you may have seen the video link that I put in there about my thoughts on it. The biggest problem is that, you know, every blog post that tells you about this one weird trick, it’s like a table with like just a clustered and not a clustered primary key, but a clustered index on it. That’s the only index. And it’s just like, oh, magic with row compression or page compression. I can just switch from int to big int without doing anything.

The problem is most of your tables aren’t going to have just one index on them. And even if you rebuild all your indexes with row or page compression, you’re going to have all these weird object dependency issues that you have to sort out. Not only do you have to drop a bunch of indexes that that column is a part of, but any that remain, you have to add the compression to, and then maybe you can do it all very easily online. Most people don’t have that simple of a setup, so it’s near useless.

I really wish Microsoft would just make this a metadata-only change, where new pages assigned to the table would be designated as bigint. And you could either rebuild the whole thing later to align everything to bigint, or just let new pages come in as bigint. But, you know, instead, we got Fabric. Stupid ass Fabric.

Here’s a neat question. If you worked at Microsoft, what would you do with SQL Server to make it competitive with Postgres? Well, you know, like the reason why Postgres is competitive is because it’s free.

Not a lot of people are feature shopping Postgres and being like, wow, this spec sheet just beats the pants off SQL Server, let’s go with this champion racehorse. Most people don’t do that.

Most people are like: does it tick enough boxes that I can use it as a relational database? Yeah? Okay, great, cool.

It’s got tables. It can select data from them. It can insert, update, and delete. All right, great. What else do I need? And, you know, SQL Server is expensive, right?

Standard Edition is $2,000 a core, and Microsoft still hamstrings you with crappy hardware limitations. Enterprise Edition is seven grand a core. God almighty.

Look at Microsoft's pathetic offerings for platform-as-a-service SQL Server. What are these VMs? Even with that new knob for adjusting memory on its own, you can still only pick one of the available SKUs.

You can't handcraft one. You can't get a server with eight cores and 560 gigs of memory. Speaking of which, 560 gigs is still the top end. You can't get a terabyte of memory for SQL Server.

It's like Microsoft doesn't even know its own product. What are you doing? But if I worked there: obviously the big things people complain about with SQL Server are the half-assedness of a lot of features. A lot of features that have been around a long time got pushed to 75 or 80% done and then never touched again, aside from maybe bug fixes.

T-SQL hardly sees any improvement. We got all this vector crap, but T-SQL itself barely got any meaningful improvements in SQL Server 2025.

There's just so much stuff where other databases are eating Microsoft's breakfast, lunch, and dinner from a development point of view. That's really where they ought to focus, or rather, where I would focus: the modernization of T-SQL. And if I really wanted to get in there, I would make Query Store much more of a legit monitoring repository than it is today.

Right now it collects query performance data, but it doesn't collect things like blocking, or deadlocks, or wait stats. To put it nicely, there's a lot of stuff that you, as a consultant or a DBA, might monitor and look at in SQL Server that is not centralized in any way that makes it easy.

And I don't understand why Microsoft doesn't try to make it easier for the average person who says, I don't know what I'm doing, please just give me an answer. Instead, that person has to become an expert in five different things and learn XML querying to get any good data out of SQL Server.

So there's all that stuff. But as far as competing with Postgres goes: it's hard to compete with free, and especially hard when free is out-featuring you on a whole bunch of stuff. But hey, we got Fabric.

God almighty. These people never learn. All right. Do you ever offer discounts on consulting or training? I am a wheeling and dealing man.

If you email me directly and want to wheel and deal on pricing for things, I will wheel and deal all day. I'm happy to do that.

There are limits to how much wheeling and dealing I will do, but I will always wheel and deal. All right, we have another Postgres question here.

Jeez Louise. Do you plan on offering Postgres training or content? So here's the thing: I could certainly play with Postgres enough to learn stuff about it and produce training content on it. However, part of the comfort I have producing training content on SQL Server is experience.

I'm obviously not running Postgres in production anywhere. I've never worked with Postgres live in production. There are certainly things you can be technically correct about just reading the documentation, and maybe put together in a friendlier way, or even offer as: if you use SQL Server and want to learn Postgres, here's A and B. Stuff like that.

But especially from a performance tuning perspective, the type of stuff that makes for good content is often the stuff you figure out working with real production workloads and seeing real, meaningful differences. That's also where you learn a lot of the limitations and restrictions, the stuff that's not easy to find just running a query on a server.

So I'll never say never, but it's something where I would need to be exposed to it in a realistic way before I felt confident that I could train others on how to do things best with it from a performance point of view. All right.

Well, that's five questions, I think. One, two, three, four, five. Yep, we did it. Good job, us. All right. Thank you for watching. I hope you enjoyed yourselves and learned something, and I will see you over in the next video, where, I don't know, whatever it is, we'll figure it out when we get there, I suppose. All right.

Thank you.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: NULL Trickery




Learn T-SQL With Erik: DISTINCT FROM?!



Video Summary

In this video, I dive into the exciting new `DISTINCT FROM` syntax introduced in SQL Server 2022, which simplifies how we handle nulls and equality checks in our queries. While it might not be as revolutionary as some of the other features that have been added over the years, it certainly offers a more elegant way to write certain types of queries. I walk through an example where `DISTINCT FROM` is used to replace clunky syntax for checking null values, demonstrating how it can improve readability and potentially performance by making us eligible for hash joins. Additionally, I explore its potential use in handling optional predicates, highlighting both the benefits and limitations of this feature.

Full Transcript

Erik Darling here, Darling Data. Woo! Look at those Darling Data colors, those beautiful teals. I wish I could find Adidas t-shirts with that same teal on the logo. Then it'd really be a company color, man. Anyway, in this video we're going to talk about something sort of new. It came in with SQL Server 2022: the DISTINCT FROM syntax. There's IS DISTINCT FROM, and there's IS NOT DISTINCT FROM, and we're going to talk about those here. Even though SQL Server 2025 did not add anything nearly as cool as IS DISTINCT FROM, and even though DISTINCT FROM is only kind of cool, for reasons we'll get to in the video. Anyway, all this course material is available at the presale price of $250, and it goes up to $500 at the end of the summer. So hop on that potato now before it gets old and soggy, because you don't want to eat that. Anyway, let's talk about DISTINCT FROM. If you watch my other videos, this query might look sort of familiar. I talked about something like this in a video the other day, where we had to write the query like this

in order for it to be eligible for a hash join. So we write this query and run it: SQL Server joins the table Posts to the table Comments on OwnerUserId being equal to UserId, or both being null. Which is kind of nasty, clunky syntax. What DISTINCT FROM allows you to do, and why it's sort of cool, is this instead.

You can say: select a count from Posts joined to Comments on OwnerUserId IS NOT DISTINCT FROM UserId. The DISTINCT FROM syntax handles nulls without you having to write a whole bunch of extra OR predicate nonsense. So that's kind of nice.
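As a rough sketch of the two queries being described (the demo isn't reproduced on the page, so I'm assuming Stack Overflow-style table and column names):

```sql
-- The old, clunky null-safe join:
SELECT COUNT_BIG(*) AS records
FROM dbo.Posts AS p
JOIN dbo.Comments AS c
  ON  p.OwnerUserId = c.UserId
  OR (p.OwnerUserId IS NULL AND c.UserId IS NULL);

-- SQL Server 2022+: the same logic, without the OR nonsense:
SELECT COUNT_BIG(*) AS records
FROM dbo.Posts AS p
JOIN dbo.Comments AS c
  ON p.OwnerUserId IS NOT DISTINCT FROM c.UserId;
```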

And if we look at the estimated execution plan, we'll see that we are indeed eligible for a hash join. If we run this, we eventually get correct results back, and it's the same number we got last time.

So that's neat and dandy. Prior to this, if you didn't feel like writing all that wacky syntax to check for this, you could write the query with a join on EXISTS.

The number of people I've met in my life who are aware that you can use an EXISTS predicate in a join clause is very small. The number who are aware that you can write an EXISTS clause as a SELECT of one column with no FROM clause, intersected with a SELECT of another column with no FROM clause, is even smaller.
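Written out, the trick looks like this (Stack Overflow-style names assumed; what makes it null-safe is that INTERSECT treats two NULLs as matching):

```sql
-- A null-safe join via EXISTS over two FROM-less SELECTs.
-- INTERSECT matches NULL with NULL, so rows where both sides
-- are NULL join up, and the query stays hash-join eligible.
SELECT COUNT_BIG(*) AS records
FROM dbo.Posts AS p
JOIN dbo.Comments AS c
  ON EXISTS
     (
         SELECT p.OwnerUserId
         INTERSECT
         SELECT c.UserId
     );
```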

In fact, the, the, the Venn diagram of people who know this, that this is a thing at all. And the people who need to buy, uh, my course content, uh, is, is actually a complete overlapping match, right? There, there are like you, you, you are in dire need of, of learning.

Uh, maybe not this specifically, but just T SQL in general. So if we, if we look at the query plan for this, we will see that we are still indeed eligible for yield hash join. This is just fine, right?

This is just an alternate way of expressing the same thing. But it's another one of those cases where handling nulls gets very tricky in databases in general. SQL Server is not a weird outlier in this regard.

If you look through anything about Postgres, Oracle, DB2, Snowflake, DuckDB, any of the new cool-kid databases, what's the other one, Redshift or something: they all have the same weird null stuff going on that you have to be aware of.

But there are ways to write queries that are a little more, oh God, am I going to say it, a little more elegant when it comes to handling nulls. You can explicitly say join on this, or this is null and this is null.

Or you can say join on this IS NOT DISTINCT FROM this. Or you can say join on EXISTS, blah blah blah. Now, when I was initially talking about DISTINCT FROM, I said it's only sort of cool, right?

It's nice that you can do that with it. You can shorten up some queries and still get good performance and all the other stuff.

But one of the common things you'll see when people start talking about IS DISTINCT FROM, or IS NOT DISTINCT FROM, is the ability to handle optional predicates, or optional parameters, with it. So right now, this is 22656.

And if we run this, we get this number back. Let's actually grab the execution plan for this: we scan this table and look at some stuff.

What you can do is put a null here, and say: well, if this is null, then we're going to do this. You put in IS NOT DISTINCT FROM, and you can get some rows back.

What's neat is that you can get a seek from some of these plans. And if we put 22656 back in here, we'll be eligible to get going and seek to some stuff.

But the reason I think this is only sort of cool is that when you see something like this, your mind immediately starts spinning and turning: oh my God, I can replace all the optional parameter value searches I have with IS DISTINCT FROM or IS NOT DISTINCT FROM, and my life gets a lot easier.

The problem is that this only works as an equality predicate. As soon as you start trying to get AND and OR logic correct with it, you stop getting the results you think you'd get back. Stapling more stuff onto this gets real tricky, to the point where I'm not even sure it means the same thing no matter how hard you try.

So think about all those optional parameter predicate queries you have where some predicates are equalities, but you're also searching on a date range, or a LIKE, or something else that's not a direct equality. This stuff just doesn't handle those cases. If you truly just have a query that looks like this, where sometimes UserId is null and someone wants everything returned, and sometimes UserId has a value and they only want results for that value:

cool, I guess you can use it there instead of one of the billion other things you can do that don't require SQL Server 2022. But I find the number of those cases to be very, very small. So when I said it's kind of cool: yeah, it is kind of cool.

It’s worth knowing about because it might save you some time and trouble for some other queries. But for this specific thing, it just doesn’t scratch the itch that I have. Anyway, that is just a little bit about Distinct From.

Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I will see you in the next video where we will talk a little bit more about T-SQL. All right.

Thank you.


Learn T-SQL With Erik: UNION and UNION ALL



Video Summary

In this video, I dive into the world of T-SQL by exploring `UNION` and `UNION ALL`, two powerful set operators that often confuse SQL Server developers. I walk you through practical examples using tables filled with datums (or numbers), demonstrating how `UNION` returns unique values from both result sets while `UNION ALL` preserves all rows, including duplicates. We also delve into the query plans generated by these commands, showing how SQL Server processes and deduplicates data to optimize performance. Whether you’re new to T-SQL or looking to deepen your understanding of set operators, this video offers valuable insights that will enhance your skills. If you want to explore more, consider purchasing my presale course for just $250—significantly cheaper than the full price after summer!

Full Transcript

All right, it's time to learn T-SQL with Erik. That's me. This is the teaser content for my Learn T-SQL course. Again, this is all in the presale price of 250 bucks. It goes up to 500 bucks when the advanced material is written, after the summer. So if you buy it now, you'll spend less money than if you buy it later, which is a very good deal, right? Just because. Anyway, in this video we're going to talk about UNION and UNION ALL, because everyone sort of gets this weird and wrong, and there's nothing worse than being weird or wrong. Never mind being weird and wrong altogether. So let's play with some tables, shall we? Let's create a couple of tables with some datums in them. Datums, whatever you want to call them. Ah, that's a good sign: connection not recovered, possibly recovered. What happened? I don't know. I probably restarted my VM. Don't mind me.

Anyway, let's turn on query plans and look at what datums are in these tables. In the first table, we have the numbers one through five, but there are duplicates: two ones, two twos, two threes, two fours, and two fives, plus a rogue ZoomIt being weird on me. And in the second table, we have nearly the same data, except we've added two sixes to the end. Oh dear me. What are we going to do with these spare, slack sixes running around?

Well, the fun thing about UNION is that it returns unique values from both tables, fully deduplicated down to just unique rows. So if we run this query and look at the results, we just get the numbers one through six. The numbers one through five were in both tables, but got deduplicated down to one, two, three, four, five. And the second table's two sixes got narrowed down to a single six. There doesn't have to be a match between the two tables to UNION or UNION ALL, right?

Matching the rows doesn't matter. It's all about returning unique rows from both result sets. Now, if we look at the query plan real quick, we'll see that we scanned each of those tables. They're just tiny little temp tables, so there's not really much worth putting an index on.
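The demo tables aren't shown on the page, so here's a reconstruction of the setup and both queries as described (the temp table names are my own):

```sql
-- Two temp tables: doubled 1-5 in the first, doubled 1-6 in the second.
CREATE TABLE #t1 (n int NOT NULL);
CREATE TABLE #t2 (n int NOT NULL);

INSERT #t1 (n) VALUES (1),(1),(2),(2),(3),(3),(4),(4),(5),(5);
INSERT #t2 (n) VALUES (1),(1),(2),(2),(3),(3),(4),(4),(5),(5),(6),(6);

-- UNION fully deduplicates across both inputs: one row each for 1-6.
SELECT n FROM #t1
UNION
SELECT n FROM #t2;

-- UNION ALL keeps every row: all 22 rows, duplicates intact.
SELECT n FROM #t1
UNION ALL
SELECT n FROM #t2;
```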

Then we concatenate both result sets together. Because we have 10 rows here and 12 rows here, we end up with 22 rows. But then a Distinct Sort kicks in and brings us down to six rows. This is where the deduplication of everything from both inputs happens.

We can change those queries a little bit, and this is logically the same query: if we say SELECT DISTINCT, UNION, then another SELECT DISTINCT, the UNION would already do the work of making both result sets distinct. But we're going to see a slightly different query plan here.

If we run this and we look at the results, we’ll see that we got the same thing back, right? We got the numbers 1, 2, 3, 4, 5, 6, just like last time. But the way SQL Server went and did this is a little bit different.

Notice here that we have two Distinct Sorts before this merge join union, which is where SQL Server broke things down to just six rows. So we start with 10 rows here and 12 rows here.

We make each of the result sets unique for itself, right? We get down to five here and we get down to six here. But then this just returns the six unique from between those two, right?

So remember, the first one is going to be 1, 2, 3, 4, 5 after we distinctify it. The second one is going to be 1, 2, 3, 4, 5, 6 after we distinctify it. So when it goes into the merge union, it’s 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6.

And then the merge union takes care of the rest. How you see, how this query plan shows up is going to depend on what your data going into things looks like. If you are selecting a primary key or if you are like another unique column or you put like a literal value in there, you may not see the early sort distinct or other aggregation that would remove duplicates or like do any grouping on the first result set.

So like this really depends on what the data looks like before when like when you apply that when you say select distinct from it, it really depends on what that data looks like and what guarantees SQL Server has about that data. So you might not see this, but if you do, it might be kind of a reasonable performance benefit to think of this because rather than making the final operator do all the distinctification, right? Making like using union to make each thing unique, you might get like we’ve talked, like I’ve talked about in a million videos.

Sometimes early aggregation pays off because you break up the work of doing aggregates that SQL Server would do later over a much larger result set. So sometimes this can pay off from a performance point of view, even if logically both of these like either doing union or select distinct union select distinct is logically the same as select union select, right? We don’t need the distincts because the union implies that everything will be unique.

Sometimes a DISTINCT, or a GROUP BY, or something else before the UNION is useful from a performance point of view. And of course, UNION ALL does not do anything to remove duplicates. If we run this query, we get everything back:

everything from both tables, four ones, four twos, four threes, on down to two sixes. That's everything. Which is of course different from SELECT DISTINCT, UNION ALL, SELECT DISTINCT, which brings us down to two of each of one through five and a single six, because we remove duplicates within each result set, but the UNION ALL preserves the duplicates between them.

So we end up with one, one, two, two, three, three, four, four, five, five, six. And we can validate that in the query plan, where we go from 10 rows here and 12 rows here down to five rows here and six rows here, and then a Concatenation operator brings the two results together and gives us 11 total rows: one, one, two, two, three, three, four, four, five, five, six.
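Spelled out with the same kind of demo data (reconstructed, with my own table names):

```sql
CREATE TABLE #a (n int NOT NULL);
CREATE TABLE #b (n int NOT NULL);

INSERT #a (n) VALUES (1),(1),(2),(2),(3),(3),(4),(4),(5),(5);
INSERT #b (n) VALUES (1),(1),(2),(2),(3),(3),(4),(4),(5),(5),(6),(6);

-- DISTINCT runs within each input, then UNION ALL preserves the
-- duplicates between the inputs: 1,1,2,2,3,3,4,4,5,5,6 -- 11 rows.
SELECT DISTINCT n FROM #a
UNION ALL
SELECT DISTINCT n FROM #b
ORDER BY n;
```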

Anyway, not a whole lot to talk about with union and union all, at least in the teaser material. So if you want to learn more, you should buy the full course, which would give you every opportunity to learn more. Anyway, thank you for watching.

I hope you enjoyed yourselves. I hope you learned something, and I will see you in the next video, where I think we're going to continue talking about these set operators. We've got INTERSECT, EXCEPT, IS NOT DISTINCT FROM. Some fun stuff to talk about.

So I will see you over in those videos next. All right. Thank you for watching.


Learn T-SQL With Erik: Pivot and Unpivot



Video Summary

In this video, I dive into some essential T-SQL techniques by discussing pivot and unpivot operations, key components of data transformation in SQL Server. Starting off, I explain the basics of pivoting data, showing how to transform vertically-oriented results into a horizontal format using aggregate functions like COUNT. I highlight the importance of using derived tables to maintain accurate results when working with pivots, emphasizing that pivots can be sensitive to implicit grouping and can produce incorrect outcomes if not properly managed. Additionally, I cover unpivot operations, demonstrating their forgiving nature compared to pivots and how they handle null values differently. By walking through practical examples, I aim to provide a solid foundation for anyone looking to manipulate data effectively in T-SQL. This content is part of my new course, “Learn T-SQL with Erik,” where you can find comprehensive beginner-level material available at a pre-sale price of $250, with the option to purchase advanced modules later this summer.

Full Transcript

Erik Darling here with Darling Data, and we're going to do some more teaser-y content from my new course, Learn T-SQL with Erik, in which I teach you all of the terrible things I know about T-SQL, which is unfortunately a lot. Don't worry, there's a workaround. All 23 hours of the beginner content is out and available, and you can start watching it now. The course is available at the presale price of 250 bucks, which goes up to 500 buckaroos when the advanced material drops after the summer. I do have some things I must attend to first, and then I'll get that all wound up for you. Anyway, let's talk in this video about pivot and unpivot, because we might as well talk about something, right? Now, most of the queries you write will return results that look something like this, where the results are vertically presented: we have a column called Type, and a column called TotalType, which I've aliased as the count of things, and in the Type column we have the text value of what each type of post is named, with a count for each. But there may be times in your life when you need to flip this data on its side, and that's when things like pivot are very helpful. So, and I'll explain why I have this query in a little derived table in a moment, we're going to start with this query, which is the foundation of the query up here, just without any grouping or aggregations. We're going to alias it as PT, I guess because it's post types or whatever.

Then we're going to pivot, aliased as PV down here. This is our pivot. What we're going to do is count post type IDs for all of the post types we saw in that first query: Question, Answer, Wiki, TagWikiExcerpt, TagWiki, ModeratorNomination. I don't know what half these things are, to be honest with you. But those are all the things we saw in here, all this text, and we put that down in our pivot. So for all of the types in our Type column, we count the IDs so we can figure out how many of each have occurred. And when we run this query, our results look a little different: we're no longer working vertically, we're now working horizontally.
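Since the on-screen query isn't reproduced in the post, here's my best reconstruction (Stack Overflow demo schema assumed; the IN list comes from the types named above):

```sql
SELECT pv.*
FROM
(
    -- The derived table matters: only these two columns take part
    -- in PIVOT's implicit grouping.
    SELECT pt.Type, p.Id
    FROM dbo.PostTypes AS pt
    JOIN dbo.Posts AS p
      ON p.PostTypeId = pt.Id
) AS PT
PIVOT
(
    COUNT(Id)
    FOR Type IN
    ([Question], [Answer], [Wiki], [TagWikiExcerpt],
     [TagWiki], [ModeratorNomination])
) AS pv;
```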

The way some teacher explained it to me, I forget what grade I was in: the V in vertical makes an arrow pointing down, or up if you flip it. So vertical is up and down, and horizontal goes across. That's how I remember horizontal and vertical, in case you need to know where I'm at.

But now we have these horizontal results, where each column is labeled with the count for each of the things: Question, Answer, Wiki, TagWiki, all this other stuff I don't really care about. Now, the reason I put this part of the query into a derived table is that pivot is very, very sensitive. It has this sort of implicit grouping thing going on. If we just wrote the query as SELECT PV.* FROM PostTypes joined to Posts on yada yada, and then pivoted, we would immediately start to see very different results.

And this query would immediately take a very long time. I'm going to cancel the execution there, because these results are clearly incorrect. But I will scroll over just to show you that the pivoted columns do still come out of these results. Because we are now grouping by every column in the Posts table, we are getting woefully incorrect results for all of these.

We see some of these are questions and some are answers, but we are not getting the grouping that we want, and you do have to be very careful of this. So let's write the query in a slightly different way, with a derived table that selects this stuff,

but now looks at the score of each of these things. So we put Score in here, and we're implicitly grouping by Id and Type now. And we write our pivot down this way and say to sum the score,

because that's going to be the aggregate we pivot on. Then we say WHERE Type IN all that other stuff. The results are going to look a little weird at first.

If we run this, it looks just bizarre: we're doing this stepladder thing down here, and we have all these nulls in the results. The reason we have all these nulls is that we selected too many columns in our derived table. If we comment out this Id column and say don't even bother getting that,

because we don't want it anywhere, and then rerun, now we get the sum of scores without all the nulls in there. So when you're writing pivot queries, you do have to be very, very careful about what you're doing.

It's very rare that the table you're pivoting from is limited to exactly the columns you want to group by, and nestling things in a little derived table like this can make your pivot queries return correct results. Which is fantastic: what we want, correct results. Sometimes. Most of the time, anyway.

I mean, we've got NOLOCKs everywhere, but hey, we didn't return any nulls. So be very, very careful, even when you're using the derived table, to only put in the columns you really need grouped by. There is, of course, some alternate syntax you can use if you want to do something very similar.

We could just select a SUM and put a case expression in it. Oh, go away, you silly tooltip. Go. And then just put a case expression in:

we'll say, hey, for questions, if PostTypeId equals one, then add a one. So just sum that up. If it's not a one, then add a zero, so we don't add anything we don't need to.

But this gives us the same thing, too. It's equivalent to the count query we ran before, and we could of course rework it to do the thing with the score above.
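The crosstab alternative being described would look roughly like this (PostTypeId 1 = question and 2 = answer in the Stack Overflow schema; extending to the other types is mechanical):

```sql
SELECT
    Questions = SUM(CASE WHEN p.PostTypeId = 1 THEN 1 ELSE 0 END),
    Answers   = SUM(CASE WHEN p.PostTypeId = 2 THEN 1 ELSE 0 END)
FROM dbo.Posts AS p;
```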

But just to give you an idea that there is a different way of writing these queries. There’s not a very much of a performance difference between the two because they both just result in one sort of scan of the post table and then a grouping to do all this stuff. If we come back up to the original pivot query that we ran, we’ll see just about the same thing where it’s just one sort of scan.

Well, this one obviously has to hit the PostTypes table too, but it’s still just one scan of the Posts table. So with pivot versus the crosstab approach, there might be some performance difference depending on what you need to do with the queries. So, of course, test both ways if performance is a big concern for you.

But if you just need to write the query, once you at least remember the PIVOT syntax, it can be a little less mentally taxing than writing out all those summed case expressions, unless you generate them with Excel or dynamic SQL or something. The next thing to talk about is UNPIVOT.

Now, unpivot is a lot more forgiving when it comes to the implicit grouping thing; we don’t have to worry about that so much. The one thing that we do care about with unpivot, though, is that unpivot will explicitly get rid of nulls. So if we look at what this values clause is returning, there’s one null in each row, basically.

Null, null, null. And if we run this query with the unpivot syntax and look at the results, we’ll see that we do not retain those null values. SQL Server explicitly filters nulls out of these results.
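A hedged sketch of the UNPIVOT behavior being described, with assumed table and column names (all three unpivoted columns must share a type, `int` here):

```sql
/* UNPIVOT rotates columns into rows -- and silently drops any
   row where the unpivoted value is null. Names are hypothetical. */
SELECT
    u.Id,
    u.col,
    u.col_value
FROM
(
    SELECT
        p.Id,
        p.Score,
        p.ViewCount,
        p.AnswerCount
    FROM dbo.Posts AS p
) AS src
UNPIVOT
(
    col_value FOR col IN (Score, ViewCount, AnswerCount)
) AS u;
/* Any source row where Score, ViewCount, or AnswerCount is null
   simply disappears from the output for that column. */
```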

It says: no nulls in here. So if you want nulls, you have to write the query a little bit differently. We’re going to use our old friend CROSS APPLY with a values clause, and we are going to put the columns that we want to unpivot in there like this, along with an alias for them.

Or rather, with a name for them. Then we are going to alias the values as col and col_value, the way that we have in the values clause. And when we run this, you’ll see that this one retains the nulls in the results.

You can, of course, filter those out if you add a WHERE clause that says WHERE u.col_value IS NOT NULL. That’s this thing in here. So we can, of course, get rid of nulls doing this.
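A sketch of the CROSS APPLY plus VALUES pattern, with hypothetical table, column, and alias names:

```sql
/* Unlike UNPIVOT, cross-applying a VALUES clause keeps null
   values in the results. Names here are assumptions. */
SELECT
    p.Id,
    u.col,
    u.col_value
FROM dbo.Posts AS p
CROSS APPLY
(
    VALUES
        ('Score',       p.Score),
        ('ViewCount',   p.ViewCount),
        ('AnswerCount', p.AnswerCount)
) AS u (col, col_value);
/* To mimic UNPIVOT's null removal, add:
   WHERE u.col_value IS NOT NULL; */
```

The nice part is that null handling becomes an explicit choice in the query rather than baked-in behavior.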

But the basic way of cross-applying a values clause to unpivot things will keep nulls in. So anyway, that’s just about enough pivoting and unpivoting. Again, this is teaser material.

This is a shortened version of the full pivot and unpivot module. I hope you enjoyed yourselves, I hope you learned something, and I will see you over in the next video, where we’re going to talk about more fun T-SQL learning activities. All right.

Thank you.


SQL Server Performance Office Hours Episode 23




To ask your questions, head over here.

Can you dumb down what SORT_IN_TEMPDB does? I’ve read the documentation many times, but I don’t get it. Index building uses tempdb even when I turn it off.
My devs use ORMs very heavily. The ORMs let them write WHERE IN @Foo and convert it to IN (@Foo1, @Foo2, …). Is it worth bullying them into using table-valued parameters instead, or am I just walking a path that ends in temp tables?
In Query Store I have one query that has two plans. It’s just one Nested Loops join, but one is really fast and one is really slow. What should I look for? They seem identical.
I still don’t get RCSI vs SI. Do I need both? What’s really different?
I want to buy your training but my boss thinks I can learn everything from ChatGPT. What should I tell him?

Video Summary

In this video, I dive into some user-submitted questions on SQL Server topics ranging from index building and TempDB usage to query optimization and isolation levels. Whether you’re dealing with ORM-generated queries or trying to understand the nuances of read committed snapshot isolation (RCSI) versus snapshot isolation (SI), there’s something for everyone. I also address the ongoing debate about whether to invest in my training or rely on AI tools like ChatGPT, highlighting the importance of human validation and technical depth that AI currently struggles to provide. If you have any questions you’d like answered live, feel free to submit them through the provided Google spreadsheet. Whether you’re looking for performance tuning tips or just want to chat about databases in general, I’m here to help!

Full Transcript

Erik Darling here with Darling Data, and it is brutally blistering hot outside here in New York. It is like a hundred and something degrees, so if I seem brutalized by external sources, you ain’t wrong. Anyway, we are going to do office hours right now. This is where I answer five, count them, five (that’s all the fingers God gave me) user-submitted questions from my little Google spreadsheet. If you want to ask a question for one of these office hours episodes, you can go to this link, read some stuff, and ask a question. If you feel like supporting the absolutely insane amount of incredibly high quality SQL Server content that I produce and provide, you can sign up for a membership. All of these things that we’re talking about here are also down in the video description, so if you don’t feel like transcribing from your screen or just guessing how to do something, you can always look to the video description for helpful clickable links. If you need help with your SQL Server and you’re like, boy, this Erik Darling fellow sure seems like a cool cat. Maybe a cat on a hot tin roof.

I’m not a cat on a hot tin roof today, but I, of course, am a SQL Server consultant. That is where I make the majority of my money. I do all this stuff and do it better than anyone else. So we got that. If you would like to buy my performance tuning content, you can do that too: all 24 hours of it for about 150 US buckaroos with that discount code there. Again, the full link is helpfully assembled for you down in the old video description. Another thing I have available that you can spend money on in some fun way is my new T-SQL course, Learn T-SQL With Erik. That’s me. All 23 hours of the beginner content is currently published. The advanced stuff is coming after the summer, when it cools down and my brain stops frying. This heat has gotten right out of hand, as a wise woman once said. If you are coming to PASS and attending the T-SQL pre-cons that Kendra Little and I are putting on, you will, of course, get access to all of this companion content with your admission to the pre-cons.

Just a reminder, this is the pre-sale price of 250 bucks, which will be going up to 500 bucks when everything is done. So get it while it’s hot, or get it while it’s cheap, as they say. Man, do I wish I was out of the house right now. Maybe not in any of these places, though; I think everywhere is hot. I am going to be on the Red Gate Roadshow Tour this summer. Those are the PASS on tour dates: New York City, Dallas, and Utrecht throughout the summer and, I guess, a little bit into the fall months.

All leading up to, of course, the big old Pass Data Community Summit taking place in Seattle this November. So I highly suggest you, if you are in the areas of any of these things, that you show up and we hang out and talk about stuff you’re into. It doesn’t even have to be databases. You can just talk to me about whatever.

Quite frankly, I’d rather talk about anything other than databases. Anyway, let’s answer some questions. And let’s zoom on in here.

And here’s our first lucky person. Oh, you lucky, lucky people. Can you dumb down what SORT_IN_TEMPDB does? I’ve read the documentation many times, but I don’t get it.

Index building uses tempdb even when I turn it off. So rather than try to dumb down what SORT_IN_TEMPDB does, because it’s hard to imagine anything dumber than the documentation, let’s focus on why your index building might still use tempdb.

Probably the most obvious one is that you’re using ONLINE = ON. That would be the first thing that requires tempdb. The second one: if you ever look at the query plan for the index that you’re building, you will most likely see a Sort operator in there.

And if that sort spills, it spills to tempdb. So SORT_IN_TEMPDB is not necessarily talking about the physical sort in the query plan, which you can’t help: sort spills go to tempdb regardless of what you choose for the SORT_IN_TEMPDB option.

There might also be some tempdb usage if you have LOB data involved. But aside from that, I think the online index build is probably the most reasonable explanation for why you are still seeing tempdb usage despite SORT_IN_TEMPDB being turned off in your index build options.
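Putting the two options side by side as a sketch (table and index names are made up; ONLINE = ON requires an edition that supports online index builds):

```sql
/* Index build options that affect tempdb usage. */
CREATE INDEX ix_posts_score
    ON dbo.Posts (Score)
WITH
(
    SORT_IN_TEMPDB = OFF, /* intermediate sort runs stay in the user
                             database -- unless the sort spills, which
                             always goes to tempdb */
    ONLINE = ON           /* the online build itself still uses tempdb
                             for its bookkeeping, regardless of the
                             SORT_IN_TEMPDB setting */
);
```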

All right. My devs use ORMs very heavily. My rates are reasonable.

I’m just going to say that up front. I’m here; this is all me. So, the ORMs let them write WHERE IN @Foo, and convert it to IN (@Foo1, @Foo2, …).

Is it worth bullying them into using table-valued parameters instead, or am I just walking a path that ends in temp tables? So it really depends on what your problem is.

Is there a problem with the performance of the IN clauses? Is the problem that they generate varying IN clauses that all generate different query plans? Are you fighting a plan cache issue here?

If you’re fighting a plan cache issue, then the table-valued parameter thing will be useful. But if you’re fighting a query performance issue, I sort of doubt it; the table-valued parameter would probably not get you anything all that much better.

The lack of column-level histograms on table variables, which back table-valued parameters, would probably lead to some oddities there. I suppose the nice thing about table-valued parameters is that the table cardinality does get sniffed a bit like a parameter would, which can be good or bad depending on just how many rows are going in there.

But it could be a case, too, where your data distributions are not so rough that using a table variable would hurt you. But, you know, again, without knowing what exact issue you’re facing, it’s a little hard to tell you if it’s worth bullying them into using them.

Of course, temp tables are the solution to many, many paths in SQL Server. So it would not surprise me in either case if you ended up using temp tables instead anyway. So maybe just get them into temp tables.
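For reference, a minimal table-valued parameter sketch; the type, procedure, and column names are all hypothetical:

```sql
/* One cached plan regardless of how many ids the app sends,
   but note: no column-level histogram on the TVP's contents. */
CREATE TYPE dbo.IdList AS TABLE (Id int NOT NULL PRIMARY KEY);
GO

CREATE OR ALTER PROCEDURE dbo.GetPostsByIds
    @Ids dbo.IdList READONLY /* TVP parameters must be READONLY */
AS
BEGIN
    SELECT p.*
    FROM dbo.Posts AS p
    JOIN @Ids AS i
      ON i.Id = p.Id;
END;
GO

/* Caller side: */
DECLARE @Ids dbo.IdList;
INSERT @Ids (Id) VALUES (1), (2), (3);
EXEC dbo.GetPostsByIds @Ids = @Ids;
```

Dumping the TVP’s contents into a temp table inside the procedure gets you real statistics, which is one reason the temp table path keeps winning.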

I don’t know; this might be a thing there. Let’s see what we’ve got here. Oh, I love this question. In Query Store, I have one query that has two plans.

It’s just one nested loops join, but one is really fast and one is really slow. What should I look for? They seem identical.

Ah, so what you should look at is the nested loops join operator. Just take your mouse, right-click on the nested loops join, and hit Properties. Over in the Properties tab that opens up in Management Studio, you’ll probably see one or two differences in there.

You might see that one nested loops operator says Optimized: True and one might say False. That might be one thing you would see. But the important thing, the big thing that I find in these cases, is that one of them will be using some form of prefetch, either ordered or unordered, and the other will not.

The prefetching optimization happens when SQL Server estimates that 25 or more rows might go into the loop join. And if you get a case where fewer than 25 rows are estimated, the easiest way to think about it is that it’s maybe a parameter sensitivity thing.

It could also be a statistics thing on the table. Or it could be a table variable. Well, I guess not.

Well, maybe table variable depending on if you’re on 2019 and all that table variable deferred compilation stuff. But, yeah, most likely one is going to have a prefetch optimization applied to it and the other is not. So that would be what I would look for there.

Depending on what you find or depending on what is driving the cardinality estimation there, if it’s a parameter sensitivity thing, you might try optimizing for a value that always gets you 25 or more rows so that you get the prefetch optimization.
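A sketch of that OPTIMIZE FOR idea inside a hypothetical procedure; the user id value is made up for illustration:

```sql
/* OPTIMIZE FOR pins the compile-time estimate to a value known
   to return 25+ rows, so the nested loops plan qualifies for
   the prefetch optimization regardless of the runtime value. */
CREATE OR ALTER PROCEDURE dbo.GetPostsByOwner
    @OwnerUserId int
AS
BEGIN
    SELECT p.*
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = @OwnerUserId
    OPTION (OPTIMIZE FOR (@OwnerUserId = 22656));
END;
```

The usual OPTIMIZE FOR trade-off applies: every execution gets the plan compiled for the pinned value, for better or worse.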

Because most likely the prefetch plan is going to be the faster one. If it is a temporary object, you could always try using some slightly out-of-warranty stats update commands to change the row count on your temp table or something.
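The out-of-warranty trick being alluded to is, I believe, the undocumented ROWCOUNT/PAGECOUNT options on UPDATE STATISTICS; a sketch with made-up names and numbers:

```sql
/* Undocumented and unsupported: fake the row and page counts on
   a temp table to nudge cardinality estimates upward. */
CREATE TABLE #ids (Id int NOT NULL);
INSERT #ids (Id) VALUES (1);

UPDATE STATISTICS #ids
WITH ROWCOUNT = 1000, PAGECOUNT = 10;
```

Because it is unsupported, this belongs in demos and troubleshooting, not production code.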

But, again, my rates are reasonable. All right. I still don’t get RCSI versus SI. For the folks at home, that’s read committed snapshot isolation versus snapshot isolation.

Do I need both? What’s really different? Do you need both? No. But you might find a use for both. If you’re asking this question, though, you probably don’t need both; what you probably want is to just turn on read committed snapshot isolation and start removing all those pesky NOLOCK hints.

The real difference is in statement versus transaction level consistency. The easiest way to explain what that means (I’m using hand gestures in the video) is: let’s say this is a begin transaction, and this up here is your commit.

And let’s say that there are two select queries within that begin transaction and commit. If you are using read committed snapshot isolation, this is the biggest difference to people who query it; it’s not the only difference.

Snapshot can obviously do stuff with updates, and there are conflicts and things that can happen. But the most common thing that people who write queries are going to be concerned about is select query behavior. So we have this begin transaction up here and this commit transaction down here.

And we have two select queries. Under read committed snapshot isolation, this first select query will return any versioned rows that it needs to as of the time when that select starts. The second select query will return any versioned rows that are required when that select query starts.

That’s a little bit different under snapshot isolation, where, again, same deal: begin transaction, commit, and two selects. Under snapshot isolation, the first select will run and start returning any required versioned rows as of that point.

And then the second select will actually use the timing of when that first select accessed versioned rows to figure out when it should return versioned rows from. So it’s statement level consistency, which is RCSI, where each statement is consistent; and transaction level consistency, which is snapshot isolation, consistent with when the first query within the transaction accessed versioned rows.
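A sketch of that difference as a two-session experiment; the database and table names are assumptions, and both options have to be enabled on the database first:

```sql
/* Setup (once):
   ALTER DATABASE StackOverflow SET READ_COMMITTED_SNAPSHOT ON;
   ALTER DATABASE StackOverflow SET ALLOW_SNAPSHOT_ISOLATION ON;

   Run this in session A while session B commits an insert into
   dbo.Posts between the two selects. */

/* Under RCSI (versioned read committed): statement-level reads. */
BEGIN TRANSACTION;
    SELECT COUNT(*) FROM dbo.Posts; /* rows as of this statement */
    /* ...session B commits an insert here... */
    SELECT COUNT(*) FROM dbo.Posts; /* sees the new row */
COMMIT;

/* Under snapshot isolation: transaction-level reads. */
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
    SELECT COUNT(*) FROM dbo.Posts; /* snapshot established at first
                                       data access */
    /* ...session B commits an insert here... */
    SELECT COUNT(*) FROM dbo.Posts; /* same count as the first select */
COMMIT;
```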

So that’s the that’s the big difference for most people who write queries. All right. The fifth and final question, a humdinger.

Wow. Oh, wow. I want to buy your training, but my boss thinks I can learn everything from ChatGPT. What should I tell him? Well, I think the easiest way for me to describe it is that I am a person who produces training.

Right. And when LLMs came around, and up until today, I have tried to use them at various points to bounce some ideas around, because it’s just me sitting here. I am a solo individual person.

And I don’t have anyone to talk to about stuff that I want to do, really, except myself. So I’ll talk to the robots and say, hey, I want to make some new training, I want to bounce some ideas around.

And I can’t even get ChatGPT, or any of the other LLMs, to do a good job of outlining good training concepts. They just come up with the worst crap: the nonsense gobbledygook that you see people say over and over again, most prolifically on LinkedIn. That’s all they come up with.

They don’t come up with anything good or in-depth, anything beyond surface scratching. They’ll say ten things, and three of them will be wrong, but those are just mixed in with all the other stuff that’s either mostly right, kind of right, or only situationally right.

So even for me, I go through it and think: oh, well, I can’t use this. I can’t use this.

This is wrong, skip this. As a person who creates training content, who knows what good looks like, the stuff they come up with when I try to get them to listen to me about things is not good. For a person who needs the training, you would get quite a miseducation in a lot of ways if you purely used an LLM, unless you have the time to not only pester the LLM about stuff, but then go and validate and verify everything that it’s saying.

So often the value of the training is that it’s mine: I write it, I validate the hell out of it with all the stuff that I do, and I also get it tech reviewed.

I have Paul White tech review my training to make sure that stuff is extra correct. So I’ve already unpacked, validated, and vouched for everything I can in the stuff that I produce. A normal person doing that would spend way, way, way longer getting a lot of stuff wrong and having to fix it.

So if your boss thinks that ChatGPT knows everything, there’s a rude awakening coming when that happens. One thing that I tell people is: if you’re that confident in it, then you should have a conversation with an LLM about something that you know deeply, that you understand and have a great grasp on, and see how often it gets things wrong.

You go, oh, no. Oh, that’s not it either. So that’s pretty much what it comes down to for me.

So, no, I’m sure that you can get some stuff out of it. But as far as the depth, the validation, and the technical correctness of it go, that still requires a human touch. Anyway, thank you for watching.

I hope you enjoyed yourselves, I hope you learned something, and I will see you in the next video and, of course, the next office hours. All right. Thank you for watching.
