Column Store, Delta Store, and More

Column Store, Delta Store, and More


Video Summary

In this video, I delve into some fascinating aspects of columnstore indexes and the Delta store in SQL Server, showcasing how data is inserted and managed within these structures. I walk through an interesting demo where I create a table, insert rows multiple times to observe changes in the Delta store and columnstore index, and discuss the tuple mover—a background process that compresses row groups but can sometimes disrupt my carefully planned demonstrations. Despite this little hiccup, the video offers valuable insights into how SQL Server handles data insertion and compression, providing a deeper understanding of these crucial components for database performance tuning.

Full Transcript

Erik Darling here with Darling Data, celebrating another beautiful Friday in the life of a SQL Server consultant. Lord knows why. In this video, we’re going to talk about some stuff with the columnstore, the Delta store, and a little bit more. This video doesn’t have one sort of overarching point to it. It’s sort of a few different spectacles, all in one lovely demo. So, we’re going to initiate ourselves, we’re going to avail ourselves of some interesting stuff. But before we do, I need to harass you about some other things, some financial priorities. If you need help with SQL Server, I am, like I mentioned, a SQL Server consultant. I specialize in SQL Server performance tuning, and you can hire me to do that for you. I will make your SQL Server faster in exchange for money. There’s a link down in the video description that offers you one of many ways to hire me to do that. There are also links down in the video description where you can buy my training, which is always reasonably priced. You can also choose to support this channel with money, as low as $4 a month, you know, pre-tax in my pocket, in order to keep me doing this sort of thing.

You can also do some other stuff, like ask me office hours questions. I do enjoy an office hours question, or five, every Monday. And of course, if you enjoy this content, you should probably like, subscribe, and tell the giant, massive friends who hang eagerly on your every word about this channel. So, I will be out and about in the world, oot and a boot for the Canadians. Data to Nashville, March 6th and 7th. And data Saturday, Chicago, March 13th and 14th. I will have advanced T-SQL pre-cons at both of those.

So, please do, please do show up. Because if you don’t, I will cry and drink and it will just be a real meltdown nightmare disaster on stage. And you don’t, you know, I mean, maybe you do want that. Depending on, depending on how much you’ll, you may enjoy that actually. I don’t know. But anyway, let’s go, let’s go talk about this columnstore stuff. So, what I’m going to do is, I’m going to do, well, of course, I’m going to drop a table if it exists. And I’m going to create a table. And then I’m going to load some initial data into a columnstore table.

And then, just for fun, what I’m going to do is, I’m going to reload those same rows into the columnstore four times. Right? So, we’ll say go three here. And every time we run this, so go is fun because it’ll start an execution loop when you put a number after it. And so, every time we insert rows into this table, we’re inserting a larger number of rows. Right? So, it’s like another million and then two million and then four million go into this table.

Now, something kind of interesting happens on this insert. And if we did this insert differently, and I’ll describe what differently means in a moment. But when we do this insert, SQL Server chooses to take data from a columnstore, spool it into a row mode eager spool, and then insert it into the columnstore. Right? So, this is, again, row mode. So, SQL Server unbatches all those rows to insert them into the table.

If we put that data into a temp table or even a temp table with a clustered columnstore index on it, SQL Server would not unbatch it. But because SQL Server row modes all that stuff, we end up with a bunch of sort of Delta store data. Right? We can see we have this clustered index.

So, the reason why this is funny is because if you go to the documentation for the view over here that we’re looking at, Sys.dmdb index physical stats, it’ll say, where is it? Down here at the bottom, maybe. Oh, no. It was up higher. This thing is lying to me.

Limitations. Does not return data for clustered columnstore indexes. Right? And, well, I mean, that’s true. There’s no data about the columnstore index in here, but there is data about the Delta store, which is a clustered index that sort of brain leaches onto the clustered columnstore and shows us uncompressed stuff.

Now, SQL Server does have a background process called the tuple mover. That’s when we ran that that time. There were fewer rows in there. So, SQL Server does have a background process called the tuple mover, which will, in the background, go find open row groups and attempt to, sorry, close row groups and attempt to compress them down.

So, when we look at the table now, we see we have some of these compressed. We have one of these that’s open. We have three of these that are still closed. Now, that’s all a side effect of, again, the row mode eager spool between the table and the insert.

So, now, if we run an alter index and we say compress all row groups, well, this will run for a little bit. And then, if we look back at this, now we’ll have these two tombstones in here. So, SQL Server is, like, sort of, like, compressed a bunch of other stuff, but left these two tombstones around.

Right? So, if we go and look at, like, what’s in here now, we go look at the physical stats for this. Now, we just have this one thing remaining. Maybe the tuple mover kicked in and did something. I don’t know. Maybe the tuple mover is ruining my demo.

Yeah, it is. See? Now, those two tombstone row groups are gone. Normally, I would have to run reorganize again a second time to force the tombstone sort of cleanup process to run. But this time, the tuple mover beat me to it.

I just timed this demo terribly because every other time I had lots of time to talk about this stuff before the tuple mover did anything. So, now this reorganize won’t do anything. But I do think that there is one interesting element to this in that if we look back at the physical stats for the columnstore thing, we still get one row back from here.

And there are a couple columns in that view that do reference columnstore data. Apparently, this is non-clustered columnstore stuff because, like, this just says not valid for the clustered columnstore one. So, I guess non-clustered columnstore, it does show more.

But what this is showing us now is an empty delta store. All right? Because we have no pages and we have no records in there. All right?

So, this is the empty delta store. SQL Server doesn’t remove the delta store entirely. It just leaves this empty thing hanging about just in case we need to put more data in there. If I ran, like, a bunch of inserts again to this table, then I would end up with probably rows in the delta store as long as SQL Server continued to choose that eager row mode eager table spool in between the insert and the select. So, as long as we still have this thing unbatching rows, then we end up with a delta store.

Even though we are inserting a number of rows that qualify for compression. So, that’s fun. Anyway, I found this all terribly interesting and also terribly annoying.

And despite the tuple mover kind of ruining one of the highlights here on needing to do the double reorg, I’m still okay with it because, you know, at least we got to see the tuple mover work in action, right? We got to see that it actually does something.

It doesn’t just sit there idly waiting for us to do things. It wakes up about every five minutes or so and says, Oh, look at all these row groups. They’re closed.

I got to compress them. And it does that. You can turn that off, but we’re not going to talk about that. Anyway, it’s Friday. I’m tired. I don’t want to do this anymore.

So, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I will see you back Monday with our usual office hours thing that we do on Monday. Because that’s what we do on Monday.

All right. Have a great weekend.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 25% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

Merry Christmas from Santa and Definitely Not Erik

Merry Christmas from Santa and Definitely Not Erik



Would you believe Erik Santa forgot to turn off this coupon?

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 25% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

More LOBs, More Locks (in SQL Server)

More LOBs, More Locks (in SQL Server)


Video Summary

In this video, I delve into the intriguing world of large object (LOB) data and its impact on locking in SQL Server. Erik Darling from Darling Data shares his insights by demonstrating how incorporating LOB columns like text or ntext can significantly increase locking problems, particularly through examples involving exclusive and shared locks. By exploring a specific update query that targets a single row with a LOB column, I illustrate the complexities of lock management and highlight the importance of proper indexing to mitigate these issues. This discussion is not just theoretical; it’s practical advice for anyone dealing with large data types in their SQL Server databases.

Full Transcript

Erik Darling here with Darling Data, feeling the spirit of the season upon me. And what I want to give you is the gift of lob data. And tell you a little bit about how it can increase locking problems in SQL Server. Shocking, I know. Down in the video description, you will find all sorts of helpful resources to give me money. If you want to hire me as a consultant to make your SQL Server faster, you can do that. If you want to purchase my training material, let’s say you want to learn all the things that I know, and you want to go forth into the world and tune SQL Server on your own, you can do that. For a very reasonable price. You can also support this YouTube channel, Ask Me Office Hours Questions, and click on the requisite buttons to like and subscribe and share with friends all of this wonderful, magnificent, free content. content about SQL Server. Isn’t that just grand? The spirit of the seasons. The spirit of the season is upon us, my friends. Once the spirit of the season leaves us, and the new year begins, and the earth, at least in this hemisphere, begins to thaw, I will be leaving the house once again, like the spring flower that I am. I will be at Datatune Nashville, March 6th and 7th.

And I will be at Data Saturday Chicago. So much data. So many datas. So little time. March 13th and 14th. I will have advanced T-SQL pre-cons at both of those. And I would suggest that you come to those if you want to learn some crazy T-SQL stuff. Because that’s what I do. That’s what I do. That’s what I’m into. That’s my fetish. Crazy stuff. Anyway, let’s talk about lobs and locking. Now, over here, we have an update query. And this update query will seek to update a single row in the post table, where the ID, which is the clustered primary key of the post table, equals 1169.

All right. And we’re going to use my table valued function, what’s up locks, which is available at my GitHub repo. It’s at code.erikdarling.com. I actually had to do a big rewrite of this before this because nothing was wrong with it normally. But for this demo, performance was not good. But we can save that for another day. So let’s go update that single row. And let’s look at how involving lob data in a query makes for some interesting locks.

Now, if we look down in the results of what’s up locks, we will see that request session ID 57, that is this SPID, right? That is us here. It has some intent exclusive locks. But more importantly, it has this exclusive lock at the key level. And it has just one of them, right? So there’s just one total row lock here, right?

So what we’re going to do now is come over to this window and look at when some different queries get blocked up here. So if I select the top 100 rows from the post table, where ID is greater than 900, this will actually start returning rows. All right. But this will get stuck at a certain point. Like if we scroll way down in the results, we have returned up to row, we have returned 90 rows and we have stopped at 1149.

But now we are blocked. And if we come look at the locks that my query is attempting to take, we will see that this one shared lock is being blocked, right? This is the one that’s waiting, right? So we do not, we are not able to get this lock. So this query was able to read up to a certain point until it hit this one lock and now it is stuck, right? Because we have not committed that other transaction.

So this one gets stuck there, right? Let’s say that now we want to order by score, right? So now we are ordering by the score column. Things will get blocked in a far different way, right? This one does not even start returning results. And part of the reason why this one does not start returning results and part of the reason why I had to tune WhatsApp locks pretty heavily this morning and still add a bunch of query hints to it so that the optimizer stopped doing, gosh, dumb things all over the place.

Sorts, stream aggregates, merge joins. It was a bad scene, man. This thing was taking like 40 seconds to run. I got it down to about five seconds, which is like, like glory, hallelujah. It’s a Christmas miracle. How dare you? But look what happens now. This is the important part, right?

It’s like we have this, this thing down here that’s waiting, right? But look at all of the lob locks that have been taken, right? All these shared locks. We now have 417,000 Hobbit lock counts, heap or B tree locks.

And that, that results in, well, I mean, 4,000, 4, 417,000. Wait, is that 417,721? Oh, oh my. That’s a seven digit number. 4.1 million page locks. And this is because SQL Server is now has to follow all of the sort of the lob pages for the body column. All right. And we can, we can prove that’s true because if we, if we change the query a bit, right?

If we cancel this thing, cause that’s going nowhere and we say, select the top 100 and we, we leave the body column up. We’re just going to select the store, the score column. Now, if we run this, this gets blocked like in the, in the same way, but we don’t take that for those 4.1 million locks in here, right?

There’s no 4.1 million number in that one. So, so just selecting that lob data will contribute to additional like locking stuff in shared locks needing to happen. Now that would also happen if we just had body in the where clause. If we said where body is like, you know, just a single character, just find me any body, any body will do.

This thing will go back to taking like five seconds again. And we end up, well, this time we end up with, let’s see, 1720804. So still a seven digit number of 1.7 million heap or B tree lock counts. So just involving lob columns anywhere in your queries can lead to increase locking requirements and increased locks and other unpleasantness, probably blocking and some other stuff.

We could of course, uh, ameliorate. I think that’s, I think that’s the appropriate word for what would happen here. We could solve the problem. I, I ought to just speak plainly sometimes. Uh, if we had an index, uh, like this, that we are, we could return the data.

We would have far fewer problems going through things. Um, so like we would, like we would have the, like be able to sort the data by score and then find whatever IDs we care about. Everything would be fine. We wouldn’t even necessarily need to have ID as a second key column here because it is the clustered primary key.

And it would end up there anyway, if we just had a single key column on score. But then someone, someone might come along and fiddle with our index and add other columns in here and screw us all up. So sometimes it’s best to be explicit about, uh, the, the order of columns when we have even a clustered primary key involved.

And we want that to be where it is for a reason. Then we could maybe add other columns over here. Like, I don’t know, accepted answer ID without messing up the order of the initial keys that we, we want in there.

Anyway, strings were a mistake. Big strings were an even bigger mistake. And as much as possible, you should avoid max data types in the database.

There are a bevy and variety of reasons for, uh, for avoiding them. Uh, and I’ve talked about many of them, but this is, this is yet another one. Anyway, thank you for watching.

I hope you enjoyed yourselves. I hope you learned something and I will see you maybe tomorrow’s video. I don’t know. I might, I might, I might take some time off for Christmas, New Year’s and stuff. We’ll see how it goes.

No promises. But if, if there, if there is a video tomorrow, I will see you in it. All right. So, thank you for watching. Thank you for watching. We’ll see you next time.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 25% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

Query Store Hint and Plan Guide Annoyances in SQL Server

Query Store Hint and Plan Guide Annoyances in SQL Server


Video Summary

In this video, I delve into some of the frustrations and annoyances associated with query store hints and plan guides in SQL Server. I explore how these tools can sometimes hinder rather than help, particularly when trying to override certain behaviors or improve performance. For instance, I demonstrate the quirks of using a `NO_WAIT` hint in a transactional context and highlight why Query Store’s inability to support table hints is such a significant limitation. Additionally, I discuss the cumbersome nature of plan guides, especially their requirement for maintaining semantic affecting hints that might be detrimental to query performance. These issues underscore the need for more robust and flexible tools within SQL Server to better meet the diverse needs of database administrators and developers.

Full Transcript

Erik Darling here with Darling Data. And today we’re going to talk about some of my annoyances with query store hints and plan guides. If you have a problem with that, you can leave. Before you leave, before I talk about that, there are some links down in the video description. If you want to hire me as a consultant to make your SQL servers faster, in exchange for money, you can do that thing. You can also buy my training if you want to try to do that on your own, learning from me, because I’m good at it. You can also support this YouTube channel because, I don’t know, I mean, I guess the camera’s already paid off, but, you know, these ring lights, electricity ain’t getting any cheaper, right? Something like that. If you want to ask me office hours questions, I do that every Monday. I answer five of them, as long as there are five to answer. If there are five to answer. If there are not, I sit in a corner and cry and wonder why no one likes me. But, anyway, I’ll be leaving the house starting in March of 2026, specifically going to Nashville and Chicago, one weekend after the other. Data Tune Nashville and Data Saturday, Chicago. Good times ahead. I will be teaching advanced T-SQL pre-cons at both of those.

So, buy those tickets now, because they don’t last forever, kids. Didn’t always have it this good, did we? Couldn’t just always leave the house and go get some SQL Server training outside. I don’t know, right? Anyway, let’s talk about why these things are annoying. So, that’s not what we needed. We needed Management Studio, not Hyper-V. Hyper-V is an unserious technology for consultants like me. Anyway, let’s truncate this table called bad idea, and let’s stick five rows in the bad idea. And so, what I want to show you here is, I mean, it’s just kind of a funny hint, right? This no wait hint. Under normal circumstances, no wait won’t do anything, right?

But, like, if we, like, say, start a transaction, and we try to insert a row into this table, we don’t close the transaction. Then no wait does kind of a funny thing. It actually just does, like, what a lock, like, the lock timeout thing does just immediately, right? I mean, it still gives us, like, the first five rows, but then on the sixth row, it bails, right? It’s like, ah, there was a lock. I’m gone, right?

So, it doesn’t return results. We could select, sorry, we can’t select ID6, because ID6 is the row that we have locked for the insert, but we could select ID5. Why we have this hint, well, I mean, I don’t really know or care all that much, but we’re here to talk about plan guide annoyances. Now, let’s say that we wanted to get rid of this hint. Let’s say that this hint was ruining our lives.

Let’s say that this hint were completely destroying everything that we believed in as far as SQL Server goes. We’re like, why does this query just, like, just die the second it encounters a lock? Is that, should we be doing that? Is that correct?

Like, maybe we should use lock timeout, make it wait a little bit longer. But, so, Query Store has the ability to add hints to queries via the option mechanism. I don’t know why ZoomIt is being difficult, having some trouble starting up. Apparently, it’s cold.

But, if we were to try to add a table hint, the way that we can add a table hint with plan guides, well, Query Store gives us a rather infuriating error. Query Store says, no, you cannot set query hints, or rather, setting query hints table hint in Query Store is not supported.

Query Store says, no, you cannot set query hints. Well, Microsoft didn’t think very hard about this one, did they? I mean, they thought hard enough to make an error message, but what a giant gap in capability.

Query Store says, no, you cannot set query hints. Someday, competent people who care will work on SQL Server again, I’m sure of it. We don’t know when, don’t know how, but someday it’ll happen for us.

Now, with plan guides, you can supply table hints, but plan guides also have a rather annoying thing with them. Where, if we tried to create a plan guide like this, and the cool thing about like table hints is you can use them to ignore a lot of stuff. Like, if there was like an index hint on this thing, and you were like, wow, that index hint is stupid, you could use a table hint like this, and just point the table hint to the alias for the table, and that index hint would effectively be ignored for the query.

But we can’t do that with like walking hints, right? If we try to run this, SQL Server will say, well, let’s put this in a way that is a little bit more easy to digest when I zoom in. Let’s see.

Cannot execute query. Semantic affecting hint no weight appears in the with clause of the object bi, but not in the corresponding table hint clause. Change the option table hints clause so the semantic affecting hints match the with clause.

Now, I might very well be using a plan guide specifically because I want to override that semantic. Maybe I hate that semantic. Maybe that semantic is hurting things.

Maybe I want to get rid of it. I wish that this thing had like the PowerShell like dash force parameter for it. Because it’s like, no, I specifically want this to not happen.

I’m trying to make this not happen. Why are you making me do this? So that’s my annoyance there.

With query store, it’s just like, no, no table hints whatsoever. With plan guides, it’s like, no, but you need to keep this thing because we say so. And you’re like, no, I want to get rid of this thing.

This thing is actively hurting me. Now, like, you could do that with a query like this. And you could, like, if there was an index hint, you could say, hey, like, let’s ignore that index hint.

That is completely successful. And, of course, we could get this to work with the table hint down here. But it would completely undo what we’re trying to do, which is not have the no wait hint on here.

And this will happen with, like, lots of dumb semantic affecting hints because SQL Server is like, oh, no, the semantics. But you’re like, I want to get rid of the semantics. But you’re making me keep these semantics that I don’t want.

So it would be really nice if with, you know, query store, we had the ability to supply table hints because what kind of half-baked situation is it that we don’t have that? And with plan guides, it would be nice if you could say, no, plan guide, I really do want to override that semantic affecting hint because I don’t like it. I think it’s bad and I don’t want it there.

So that’s that. Neither one of these things exactly, neither one of these plan forcing features exactly covered itself in glory here. Query store, of course, with the completely half-baked query hints and plan guides with not allowing me to get rid of the semantic affecting hint that I intended to get rid of with a plan guide.

So thanks for both of those. Nice Christmas presents. All right.

Happy Merry. Thanks for watching. I hope you enjoyed yourselves. I hope you learned something. And I will see you in tomorrow’s video where I think we’re going to talk about how Wob data makes for some weird locking. And we’ll have a lot of fun with that, won’t we?

All right. Thank you for watching. Thank you. Thank you.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 25% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

SQL Server Performance Office Hours Episode 46

SQL Server Performance Office Hours Episode 46


  • The missing index DMVs keep recommending the same index with 15+ included columns that would basically duplicate our clustered index width. The equality columns make sense, but I’m suspicious about blindly adding all those includes. How do you evaluate whether a wide covering index is actually better than a narrower index with key lookups, especially for an OLTP workload?
  • I thought table variables were immune to parameter sniffing issues because they’re not parameterized, but I’m seeing wildly different performance for the same stored procedure depending on the data volume in a table variable we populate. The cardinality estimate is always 1 row regardless. Is there a way to get better estimates for table variables, or should I just rewrite everything to use temp tables?
  • We’re getting 1205 deadlocks on SELECT queries that don’t modify data. The deadlock graphs show intra-query parallelism exchanges as the resources. These are reporting queries with MAXDOP 8 hitting a columnstore index. Why would read-only parallel queries deadlock with themselves, and what’s the best way to prevent this without just setting MAXDOP 1?
  • I’m getting sort warnings in execution plans even though the memory grant is 10x larger than what the sort actually needs according to the plan properties. The ‘GrantedMemory’ is 500MB, the sort only uses 50MB, but I still see a sort warning. I’ve ruled out parallelism issues – this happens even with MAXDOP 1. What causes sort warnings when there’s obviously enough memory?
  • I created a filtered index on WHERE StatusId = 2 for a common query pattern, and the index is way smaller than the full table index. But the optimizer still chooses the full table index with a residual predicate instead of using my filtered index. I’ve updated statistics, rebuilt the index, even added query hints – nothing works. What are the gotchas that prevent filtered indexes from being selected, even when they seem perfect for the query?

 

To ask your questions, head over here.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 25% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

New Course: Get AI-Ready With Erik!

New Course: Get AI-Ready With Erik!


Video Summary

In this video, I introduce a new course titled “Get AI Ready with Erik,” designed to help you navigate the upcoming AI features in SQL Server 2025 and beyond. Given that Microsoft is positioning SQL Server as AI-ready, it’s crucial for professionals like us to be prepared to work with these new capabilities. The course focuses on practical applications rather than deep theoretical knowledge, teaching you how to leverage vector search effectively without getting bogged down by the complexities of underlying mechanisms. We cover everything from foundational concepts to advanced topics such as hybrid search and multi-vector search, ensuring that you can make informed decisions about when and how to implement these features in your production environment.

Full Transcript

It’s Friday and we’re introducing a new course. It’s called Get AI Ready with Erik because Microsoft says SQL Server 2025 is AI ready and you need to be ready to work with AI in SQL Server. Whether you care about it or not, whether you think it’s good or bad or not, at some point you’re going to have to deal with these new features and I want you to do it in the least stupid way possible. The whole foundation of the course is based on this premise. You don’t need to know how it works. You don’t need to know all the crazy math. You don’t need to know all the weird theory. You don’t need to know all the, you know, gears and whistles and bells and stuff that gives people six fingers on the, on the backside. You just need to know how to work with these features in smart ways. That’s what I want to teach you. The whole thing started because I do SQL Server performance tuning and I saw this whole thing, the whole suite of new things hit SQL Server and I said, gosh, the way things are going, I’m going to be a little bit more. I’m probably going to have to do some performance tuning work on this. And it turned out on my path to learning how to do performance tuning work on this. I had to learn a whole lot about how all the stuff worked inside. So that’s my distillation here, how all this stuff works. Now you can work with it too. Vector search is fairly easy to grasp as a concept. It stores these embeddings, which are a series of floats, which might, I guess, store it as a JSON. And then when you change other text into a similar, embedding another array of numbers, vector search can tell you how close, how similar those strings are without a bunch of nasty wildcard searches and stuff, right? No longer just like saying where something is like, you know, percent word, percent word, percent word, or like, you know, just long strings of or clauses doing this stuff. It allows you to sort of search by meaning and intent, not just like, you know, normal keyword searches. So you can find similar things pretty fast, even on pretty big tables.

It’s not perfect though. Details often get lost. And sometimes you still need to do some keyword searching and sometimes you still need quality filters on things to make sure that, you know, not only is the content that you’re providing in your search is relevant, but also that it’s not just like garbage, right? So, you know, that’s no fun. Microsoft is, you know, Microsoft is, you know, of course gets a late pass on this one. That’s, you know, behind the curve a bit on this. Most other, you know, of their competing databases have had this stuff around for a while, but, you know, moving data to another database, as I’ve learned from seeing so many people talk about how they’ll be in Postgres in three to six months and they never seem to get there. Your data is already in SQL Server, moving it off to either a specialized database or a whole other data platform just to take care of vector search requirements is probably a non-starter for a lot of people.

You’ve got everything already in SQL Server. Adding a vector column or a table with a vector column in it beats migrating for most people in the world. Microsoft was late, but, you know, they actually showed up. You have some generally available features available right now and you have a lot more features still in preview.

Will they ever be real? I don’t know. Will Microsoft ever finish anything again in SQL Server? I have no clue, but I can tell you what’s there and what’s not. So this is what the course covers. It’s in eight parts over around 28 videos. It’s about 10, 12 hours of total content.

We cover foundations, practical applications, hybrid search, vector indexes, native integration, managing embeddings, production patterns for dealing with all this new vector stuff. And then some more advanced topics like rag and multi-vector search. It’s a pretty solid course, I think.

And one thing that I want to say here is that, you know, where a lot of this stuff is in preview and like maybe isn’t out yet in SQL Server 2025 is RTM. Like we don’t even have one cumulative update for it as of this recording. I’m going to be adding in stuff as cumulative updates add, you know, substantive changes and new stuff to SQL Server.

So this isn’t like a course that’s just going to rot on the vine. As Microsoft works on this, the course will get updated alongside it. So there will be new content to cover new things as the features move along.

What I hope you get out of this is understanding when vector search makes sense. Because when your boss comes along and says something like, hey, we want to do this and here’s what we want to do it for, you can speak intelligently about it. You have some background.

You have some knowledge. You have information. You can feel empowered to speak authoritatively on the subject. Right. We want to know what’s production ready versus what’s still in preview today, how to monitor and troubleshoot performance issues and sort of like capacity planning and all the other stuff that goes along with, you know, DBA and developer tasks. And we want to know how we can implement this in production and make working with the current set of limitations as pain free as possible.

So right now, if you go to training.erikdarling.com, there should be a link down in the video description directly to this course as well. You can use the coupon code AIREADY to knock $100 off the cost of the course. That coupon code is available for the first 100 people who sign up.

So if you want to be one of the lucky hundred to save $100, I suggest you get moving because things are already flying off the shelves and, you know, I would hate to see you miss out on this wonderful opportunity. Anyway, thank you for watching. I hope you enjoy the course and I will see you, well, when I see you.

All right. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 25% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

In Defense Of Merge: Non-Deterministic Updates

In Defense Of Merge: Non-Deterministic Updates


Video Summary

In this video, I delve into the often maligned `MERGE` statement in SQL Server, defending it against its critics and highlighting its unique benefits. You might be surprised to learn that `MERGE` can actually help catch non-deterministic updates—something a standard `UPDATE` query won’t do. By demonstrating how `MERGE` issues a warning when you attempt a non-deterministic update, I aim to show why this statement deserves more respect in your database toolkit. The video also covers practical solutions for making these updates deterministic using window functions like `ROW_NUMBER()`, ensuring that the data integrity and consistency you expect from SQL Server are maintained. Whether you’re looking to improve query reliability or just want to understand a lesser-used feature better, there’s something here for everyone who deals with complex data operations in SQL Server.

Full Transcript

It may be very little surprise to you. I am Erik Darling with Darling Data. But what may surprise you, may rock you to your very core, throughout your soul, is today I am going to defend merge statement in SQL Server. I know, I know, no one will ever defend it. Right? People will hate on it and other people will say, it’s not that bad. No one ever defend it. I know, no one ever sticks up for merge. I’m going to stick up for merge today. Because, uh, I think there’s a really interesting thing that merge will do, if you write a non-deterministic update, that a normal update will not do. Maybe it should, because shouldn’t everything in a database be deterministic? Doesn’t, does ACID mean nothing to you people? What’s wrong with you? Down in the video description, which is my favorite part of the video, there’s all sorts of helpful links. Like, uh, ones where you can hire me for consulting, buy my training, become a supporting member of the channel for as little as $4 a month. You too can, uh, I don’t even know what, what that would do. But it’s not about what $4 a month does, like, like on its own. It’s what $4 a month does from all of you in the aggregate. Right? That’s, that’s where things get special and interesting. So, gang up on me. Uh, you can also ask me office hours questions, because I like answering questions.

Especially good questions. If you have a good question, that’s even better. I’ll give you a high five on that one. And of course, as usual, please do, uh, like, subscribe, uh, tell, tell your mom about my YouTube channel. She’ll probably dig it. Uh, I’m a, I’m a, I’m a likable fella. Uh, you know, sometimes, uh, um, you should drink when I do these. Anyway, uh, I’ll be at a couple of events coming up in March, which is, you know, a few months. Two months off at this point. I guess like four months off at this point. But hey, um, that gives you, buy a ticket now. So, you know, it’s like verge of selling out probably at this point. So might as well get in there before, like all the butts are in the seats and one of, none of those butts are yours. Right? So, uh, data tune in Nashville, March 6th and 7th. Data Saturday, Chicago, March 13th and 14th. I’ll be doing pre-cons at those advanced T-SQL. You know, the stuff that will blow your mind. Um, it won’t melt your brain. I hate when people say melt your brain. I think that that’s what the kids call cringe, but you are, you can’t, you can learn a lot from a dummy. So you should, should probably do that. Anyway, we are feeling festive today, ain’t we?

Okay. So let’s, let’s go defend merge. So, uh, I’m going to make a table here, a temp table. Temp table is not the problem. The temp table is just fine. Uh, but the problem becomes when we try to update this temp table, right? So we’re just going to put one row in there because all we need to prove this theory is one single row. Right? Uh, and we’re going to use John Skeet cause John Skeet’s just real easy to pick on stack overflow. John Skeet causes, I mean, for his, I’m going to say he causes a lot of query problems, right? Everything John Skeet gets involved, everything blows up. So the thing here, oh, you know what? I’m in the wrong database. We gotta move that over. There we go. Now it makes more sense. Life is, life is grand now. Right? So let’s, uh, let’s insert that one row again. Oh, professional. And, uh, so the, the point here is, uh, if we turn on query plans and we look at the plan for this query, uh, we will see that this query produces a parallel execution plan. Right? And by parallel execution plan, I mean, it, it looks like this, right? Where there’s, there’s parallelism. So great. Right? Uh, everything went parallel and everyone was happy. So that’s, that’s, that’s one of the issues that comes up here. And the, what that issue contributes to is that if we try to update this repeatedly, right?

If we run this update multiple times, SQL Server will update our table with multiple, with different rows on almost every execution. Right? So if we run this pair of queries together, and wait a second for it to run, it does some stuff. Uh, we get post ID. Oh, let’s do our special zoom here. We get post ID 156426. And if we run this again, the amount of time this takes is better work. Uh, there we go. 157291.

And if we run at a third time, I bet that you’re going to be surprised. We get 148352. This is a non-deterministic update. And by non-deterministic, I mean, uh, we don’t know what row is going to end up. As the update values, right? Now, how I’m here to defend merge and why I’m here to defend merge is because merge, unlike the update from center, unlike the update syntax, merge will, will, will, will warn you about these things.

Merge will say, Hey pal, uh, something’s a muck here, right? So we’re going to merge into our temp table, and we are going to use our select query as the source. And we’re going to say when matched, uh, that didn’t go well. We’re going to say when matched and exists, uh, a difference between the sets of columns in here, then we will update the table with, with those, those columns. And if we run this, let me wait a second or another second. There we go. Look, it happened. Oh, oh, merge.

Thank you. Thank you for caring about acid merge, but you could have been a little bit, maybe pithier with this error that we got here. The merge statement attempted to update or delete the same row more than once. This happens when a target row matches more than one source row. A merge statement cannot update, delete the same row of the target table multiple times. Refine the on clause to ensure a target row matches at most one source row or use the group by clause to group the source rows. Oh, merge. Thank you. I mean, you could have just rewritten the whole query for us by the time you spit that book out. But here we are, here we are. And I mean, sure we can, you know, maybe it’s a sign that we could aggregate some stuff or we should have done things differently, but at least it’s nice that merge tells us like, Hey dummy, something’s not gonna, something’s gonna look weird if you run, if you keep doing this, like you you’re screwing up. Right? So the way that we can fix this and we can, we can use merge here for this is what we can do is we can use row number, the row number function to make sure that we limit this to one row. Right? And, and something that is particularly important here is that we do not rely on what is essentially a non-unique column, right? Because we could have, we could have multiple last activity dates, right? Like the same. We, this is not guaranteed to be unique, that we have a unique tiebreaker involved in our window function. So now if we run this merge and I’m going to, I guess, pause here to point out what’s new is this thing here where we are saying we’re only where row number equals one, right? We can only put one possible thing in here. Merge executes successfully, right? We, we still get a parallel plan. All this mergey stuff still happens, but we no longer get that error. And if we run this, we will get back a completely different row than any of the rows we saw before. So when we made this deterministic, not only did we get the same value updated in this temp table, but we actually got what is probably the correct value for our temp table, which is fantastic. And sure, there are other ways you could fix this query. Every time I point out how to fix something, someone says, you could also, you could also, I know there are other ways you could rewrite this to fix it. You could even rewrite the update query with the row number to fix it.

I, I, I, I understand that. The whole point here is that merge will warn you when you write a non-deterministic update or delete, apparently. According to the error message, deletes also have to be deterministic. So if you write a non-deterministic update or delete, merge will warn you. Normal update and delete syntax will not. Granted, it’s probably less of a big deal with deletes because you’re like, well, it’s already gone. Who cares if you try to do it again, right? But that’s what I, that’s my defense of merge today. You can, you can hate it all you want, but at least it does that for you. Okay. Well, thank you for watching. I hope you enjoyed yourselves. I hope you learned something and I will see you tomorrow for something else. All right. Goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 25% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

Love Your Data Types: Dates

Love Your Data Types: Dates


Video Summary

In this video, I delve into the often-overlooked world of date and datetime data types in SQL Server. Erik Darling from Darling Data shares his insights on why these data types are frequently mishandled and how to properly manage them for accurate results. We explore the nuances of date formatting across different languages and regions, emphasizing the importance of using unambiguous date formats to avoid sorting issues and other anomalies. I also discuss Microsoft’s recommendation to use `datetime2` over `datetime`, highlighting its advantages in precision and portability, which are crucial for globally deployed applications. The video covers practical tips on using the `convert` function with appropriate styles to ensure your dates are handled correctly, making it a must-watch for anyone working with date data in SQL Server.

Full Transcript

Erik Darling here with Darling Data. Feeling extra optimistic about this video. Why is my phone being? Anyway, in today’s video, we’re going to talk about continuing to love our data types. And we are going to do that today with dates. So we have much to go over here. So get a coffee or something. Down in the video description, all sorts of ways that you can help me pay for stuff. You can hire me for consulting, buy my training, become a supporting member of the channel. And of course, if you just want to harass me with questions, you can do that. There’s a link. Ask me office hours questions. Without you, we have no office hours. I’m not asking myself questions. That would be weird. I don’t want to think about that world. And of course, if you enjoy this content, please do like, subscribe, tell a friend, all that stuff. We are well on our way to having nearly 7,852 subscribers, which is miraculous in this day and age.

Out in the world, data to Nashville, data Saturday, Chicago, that’ll be out in March. I’ll have some new stuff to add to this slide shortly. There have been some other stirrings and acceptances. I don’t know. I don’t know how I’m going to do that. I might need, I might need two slides to talk about all the stuff that I’m going to be doing. Maybe even three. We’ll see. If I can only fit two per slide, man, we’re in trouble. It’s going to be, it’s going to be a crazy year for darling data getting out in the world. Tell you that much. But for now, let us cherish the spirit of the season and talk about dates.

Now, dates are without question the data type or dates and date times and all that stuff are without question the data type that I see people behave the laziest with. The absolute laziest. And SQL Server is partially to blame for that because SQL Server goes out of its way to make this easy on you. There are all sorts of crazy things SQL Server will do to accommodate whatever nonsense you type into your strings and expect it to infer the date timiness of.

So, first, let’s just start by using this function, sys.dm exec describe first result set. And we have some candidate dates in here. We think these are very datey. We think SQL Server should be able to date these appropriately. But alas, all of these columns come back as being various takes on varchar. We have one that is an eight. That is this one right here.

And then we have a bunch of tens. Just because you type a date or a date time into a string does not mean that SQL Server automatically says, Oh, look at this wonderful date or date time that this caring user typed in here. I will infer it as such. It does not do that. All right. So, sometimes we might have to tell it or nudge it or give SQL Server a little bit of extra information in order for it to understand what we’re after.

Using the convert function, even without a style here, but I am a big proponent of using styles. We just can’t use one here for reasons that we’ll discuss. If we do this, then all of a sudden SQL Server figures out, yes, these are all dates.

Thank you for pointing that out, right? We have successfully converted our strings to the correct data type with the magic of the convert function. Wonderful stuff. Wonderful.

It’s fantastic the things you can do when you type a little bit more. The reason why we can’t use a style on these, though, is because, well, you know, I’m an American and, you know, my country tis of thee and all that. But we are completely alone in our treatment of the date of date formatting.

Let’s do our magic SSMS 22 zoom here. We are the only one. And while the USA is number one, it is quite lonely on top, at least in this DMV.

You’ll see that U.S. English is the only one where the date formatted is treated as month, day, year. There are 29, I think, if I’m remembering correctly, sorry, 24 rows, 24 rows where the date is treated as day, month, year in the format. Some friends of ours, like the British and the Danish and even I have one Czechoslovakian friend, so we’ll.

Or Czech Republican friend, what do you call them these days? I don’t know. Those maps keep changing.

It happened to Gorbachev. But there’s a whole bunch of languages in here that do not accommodate the month, day, year format. There’s many things in here that, well, some of these quite suspect, quite suspect typing in there, but that’s not the point.

Anyway, there are even countries that do things without MDY or DMY. There are some that do YMD, like our friends the Croatian-Lovakians and the Lithuanian-Lovakians, right? The Swedish, who are not Swiss, right?

So all of these different countries and languages, I shouldn’t say countries, these are languages, right? I mean, some of them are, you know, specific to a country, but some of them are definitely multi-country languages. But these ones are treated even different.

And like something I didn’t point out in the others, but there is even some disagreement. Even once we have sorted out the date format that we want to use, some people disagree about which day should be first in the week, Monday or Sunday. So we don’t, so there is disagreement about how this gets treated, even when we have agreement on date format.

So you have to be very careful out there when you’re building SQL Server applications and queries to make sure that when you pass in dates and date times and all those other things, that you are quite careful to write unambiguous dates that cannot be confused. Because there are many ambiguous dates.

Now, we’re going to go back sort of to our original problem, which is that SQL Server does not automatically infer dates just because a string looks like a date, right? So we have three dates here. We have December 1st, 2025.

And then, well, I guess there is ambiguity in either of these and all of these, right? Because this could be December 1st of 2025. This could also be January 12th of 2025.

And likewise, these could be January 2nd or February 1st of 2025, 2026. We don’t know, right? But the point is that without SQL Server knowing these are dates, SQL Server sorts these as strings, which is completely understandable.

I would not want to figure this out either, right? So 12 gets sorted down here, even though technically that year is earlier. We do not automatically infer any datiness from our strings.

Which gets even weirder. I mean, not weird like bad weird, but just like where you might find weirdness in your applications if maybe someone from outside of America uses your application. You might find some oddities if you are not passing in dates, again, in a consistent, unambiguous format.

So let’s look at this. So we’re going to set the language to English, which, you know, yeah, U.S. English, right? Proper English.

But over in our results pane, right? We get back these three rows in order, one, two, three, which as far as the U.S. dating system goes, this is correct, right? We have December 1st of 2025.

We have January 1st of 2026. And we have February 2nd, sorry, February 1st of 2026. So according to U.S. dating policy, this is correct.

But according to our friends in Britain, this is different. And I assume that British also encompasses Canada, Australia, New Zealand. And I don’t quite know where else.

But let’s just go with those ones. If we set the language to British, right, we look over in the messages tab, we will see we are very British now. We are hip, hip, cheerio British.

And we look in our results. The row ordering has changed, right? Before it was one, two, three. Now it is one, three, two. Right? We still, well, I mean, this changes. I mean, this is no longer December 1st of 2025.

This is now January 12th of 2025. But these two rows here that were ambiguous to us, right? These flipped, right?

Because according to British dating policy, these are different from American dating policies. And so these dates, they are not what we think they are. Actually, none of these dates are what we think they are.

We have been lied to. We’ve been run amok, bamboozled, hornswoggled. I forget how the rest of that thing goes. But all of those things, right? Swindled.

Swindled. So really, what you should always do is put the year first. Four-digit year first. That will at least buy you some forgiveness from me. As if you always put the year first.

Because you reduce ambiguity when you put the year first. Where things may still be ambiguous. And let’s just, let’s quote this out so it doesn’t get in the way. Sorry, English.

We’ll come back to you, I promise. Where things that remain ambiguous is with sort order. Because if we set the language to British, we do not have a reliable sort order for our rows, for our duplicates. Right?

Because, you know, these dates basically repeat. Right? Like row one is 2025, 1201. Row four is 2025, 1201. There’s some dashes in here for some of these.

So you can even be forgiven some dashiness if you, if you, long as you put the four-digit year first. But if we want, if we want unambiguous sorting, then we also need to have a deterministic way to sort this data when we encounter duplicates. So remaining with U.S.-British, now we get the data back in 142536 instead of 415263.

But these dates have all been applied correctly to us. Oh dear, I’m getting some green screen artifacts. We’ll have to stay close to the camera so we don’t mess anything up.

But if we bring, if we set this to U.S. English now, again, the number one English, we will get back the same thing without the ambiguity. SQL Server is no longer switching things around on us. Right?

So things came back the way that we expected them to. Where things get, so we’ve just been talking about dates. And what’s crazy, wild to the extreme, is that Microsoft considers date to be a more modern data type than date time. We’re going to talk about that a little bit more in a minute.

But date time is technically a legacy data type. And Microsoft even says you shouldn’t use it anymore. Right? Don’t use date time. Use date time 2 instead with whatever precision you need.

It goes from 0 to 7. Do all sorts of things with it. But date time is no longer good to use. And that’s especially painful for me to talk about because the very, wow, this green screen is going crazy on me here. What’s happening?

There we go. Maybe I should, maybe I’ll stay here. The reason why that’s so painful and crazy to me is because the Stack Overflow database that I use, all of the date columns are date times. And I hate it because I look at them and I have to write procedures that use date time.

Because who has the patience to convert all that stuff over? Do you have any idea how many demos I’d have to change? It would be a nightmare.

Right? And then if I did that and I asked anyone else to look at something and their copy still had date time, who knows? Right? So it would just make no sense. But anyway, we’re stuck with date time in Stack Overflow land. We’re also stuck up going up to, we’re also stuck with the world ending in 2014.

So we got an assortment of problems. But anyway, where things get even weirder is when you get dashes involved with date times. Right?

So what we’re going to do here is we’re going to look way ahead to, this is the final Friday the 13th of 2026. It will be in November. November 13th of 2026.

That’ll be a Friday the 13th. And if we try to do this in English, this will work. SQL Server is like, oh, you’re an American date. Gotcha.

No problem. Right? So US English, this is totally fine. This arrangement works beautifully. But British English, this stops working when we put dashes involved and we try to convert to a date time. This now returns a null because there is no 13th month.

There is no 11th day in the 13th month. I’ve heard various rumors that if we had 13 months in the year, that we would have, all of them would have 30 days. And all of them would start on a Monday and end in a Friday.

And I’m like, wow, we’ve really made things complicated for ourselves. All these 12 months and various other things. So I don’t know.

Maybe, maybe that would be better. I don’t know. Anyway. I’m not even sure if that’s true. Maybe I was lied to. Maybe Wikipedia lied to me. There are some things that I want to talk about before we end this video.

The first one is I hate the cast function. Cast is for lazy dummies. And I say this having been a lazy dummy many times in my own past. Having to be confronted with old code that I’ve written where I see cast use.

And I’m like, man, I screwed up. So learn from me. Learn from the patient zero lazy dummy. Don’t use cast.

Use convert. Convert has styles. And those styles can be very, very useful when trying to figure out if your things are ambiguous. I mean, there’s all sorts of things that you have to get crazy with. And then when you’re dealing with binary and XML and other stuff.

But those are outlandish cases. For the general SQL Server user, you will care very much about styles for having unambiguous date formats that don’t mess things up. For example, 112 is great if you just need dates.

121 is actually the default for time. Date. DateTime2 and DateTimeOffset. Great stuff.

Right? And 127 is ISO 8601 with timezone Z. Right? So this thing at the end. Right? Timezone Z. So very important stuff here.

If you go to the documentation for, well, I mean, we should just not, like, really the documentation that we care about is for convert. Cast has very little documentation. It’s just garbage.

Don’t use cast. Lazy. The documentation for convert has all sorts of helpful things that can help you use convert better. Right?

So we do care about that. Another thing, and this is coming back to something that I said earlier that I didn’t, I said, we’ll talk about this in a minute. And here we are talking about this in several minutes. But, okay.

And that is at the documentation for DateTime. Right? And that is avoid using DateTime for new work. Right? Do not do it.

Right? It is a legacy data type. It is no longer good enough. Instead, use the time, date, DateTime2, and DateTimeOffset data types. These types align with the SQL standard and are more portable.

Time, DateTime2, and DateTimeOffset provide more seconds precision. DateTimeOffset provides timezone support for globally deployed applications. And, gosh darn it, don’t we want our applications to be globally deployed?

Why settle for dominating a single market when you can dominate them all? Right? Why would you want to do that? So, I told you this video would be a little bit more involved than the string one.

I didn’t lie to you. Please, do not use ambiguous dates or date formats in your code. Um, I’ve seen so many dumb things crop up over the years, either because of implicit conversions, or a lack of implicit conversions, or errors, or just, like, things not functioning correctly because people decided to type dates stupidly.

And they paid dearly for it. They paid me dearly for it. Which I appreciate. But, you know, if you want to avoid paying me dearly for these things, use consistent, unambiguous date formats.

And please, use the convert function with a style. Just, I ask very little of you. Please do these things.

Anyway, thank you for watching. Hope you enjoyed yourselves. I hope you learned something. And I’ll see you in tomorrow’s video, where I’m going to go crazy and I’m going to defend merge. Alright.

Adios.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 25% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

Love Your Data Types: Strings

Love Your Data Types: Strings


Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 25% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.

SQL Server Performance Office Hours Episode 45

SQL Server Performance Office Hours Episode 45



To ask your questions, head over here.

  • what is the sql server version of having a tramp stamp?
  • Do you ever feel like you’re being too hard on Microsoft when you make fun of SQL Server features?
  • You’ve talked about building some personal projects with LLMs. ANy plans on releasing those?
  • What’s the worst non-performance related SQL Server problem you’ve dealth with?
  • Do you have any SQL tattoos?

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 25% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.