bit OBSCENE Episode 12: Interesting Inserts

Summary

In this video, Joe Obbish of Someplace, Somewhere and I delve into an intriguing question that popped up on Stack Exchange. The question revolves around generating sequences in SQL Server that are partitioned by a type column, and it highlights the challenges of handling high concurrency while maintaining performance. We explore various approaches to solving this problem, including using sequence objects, a feature introduced in SQL Server 2012. While we discuss the pros and cons of these methods, we also touch on the broader context of research and how to approach similar problems when faced with limited information or outdated systems. This conversation not only provides valuable insights for database administrators but also offers a glimpse into the thought processes behind crafting effective questions and answers in the tech community.

Full Transcript

Erik Darling here with Darling Data, joined by Joe Obbish of Someplace, Somewhere. Am I still in line, Joe? Shouldn’t I be? Oh, yeah. No, no, no. I’ll change it. God. You know, you interrupt me at the worst times. Let me insert Joe now.
All right. Before I was so rudely interrupted. Welcome back to the Bit Obscene radio program brought to you by Pistachio Poppy toothpaste. It is very, very pistachio flavored and has a very, very pleasant green color on the toothpaste, on the toothbrush, rather.
This is not my actual toothbrush. This is a demo toothbrush that I use for cleaning shoes and other things that I don’t want to put in my mouth. Anyway, thank you, Pistachio Poppy. I don’t think that was a very good endorsement, but I’m sure they’ll forgive us, right?
Pistachio Poppy. It tastes a lot like pistachios. All right. Anyway, Joe had an interesting question pop up on his Stack Exchange. He got it. He got it sent directly to his email from Stack Exchange.
That’s true. And he answered it. And Joe would like to talk about how interesting this insert question is. Well, it’s not only about my answer.
I felt like the question itself and some of the answers provided a good learning opportunity. Really? And considering, like, nowadays, all this stuff is just used by AI agents that are trying to destroy the world.
But at least, like, you know, get some positive value out of our questions and answers while we can. Sure. And everybody’s looking for easy content.
That’s actually the hardest part of this, not recording them, but figuring out what to talk about. Which is weird, because we normally have no idea what we’re going to talk about. I think I’m sharing my browser.
You sure are. Yeah. See, this is where you kill me. So this whole, like, I was like, when you said you wanted to talk about this, I was like, and I saw the open browser tab in the last video. I was like.
That was a preview. It was. And I was like, ah, Joe’s going to go look at it. And while he’s looking at it, I’m going to award the bounty and his score is going to go up 500 bucks. But you ruined it. And now it’s not going to happen.
That’s okay. Because if you scroll up, you can see my generosity. I mean, I’m not a stressed out person because I’m using Microsoft Azure products. Oh, okay.
I’m just all like, I’m just calm. I’m in a Zen state. Yep. I’m not worried about it. This really does smell like pistachios. It’s crazy. It’s like pistachio frosting.
I hope one day in the future, we have podcasts that, that present smells too. Me too. Only. Me too.
Everyone should smell this is a delightful aroma. So I assume that some of our audience are like running marathons or shoveling snow or whatever.
They’re not actually reading the question. So I feel that, well, first let me follow good, good practices and make it bigger. Right? Yeah. That’s a good idea.
Because everyone, everyone listening to a radio show thinks, ah, I can’t wait to read this screen. That’s right. So, so really, I think it’s a somewhat common problem. The question asker effectively needs a sequence, but it’s partitioned by another column in the table.
So his two types, as an example, are the invoice and the receipt type. And the sequence is one, two, three for invoice and one, two for receipt. So, you know, think of it as like many sequence objects in the same table, with one per type.
And he says that the procedures execute very frequently, over a hundred thousand times an hour. And he’s, he’s very, very focused on it being concurrency safe and it performing well, which, you know, fair enough.
He, he also presented his attempt at solving the problem, which does not handle concurrency at all, which he confesses. Um, and then once again, he talks about how, you know, he has really heavy concurrency and it has to scale well, and so on and so forth.
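For listeners who can’t see the screen, the shape of the problem looks roughly like this. This is our guess at the schema, not the asker’s exact table:

```sql
-- Hypothetical shape of the problem; names are illustrative.
CREATE TABLE dbo.Documents
(
    DocumentId   bigint IDENTITY PRIMARY KEY,
    DocumentType varchar(20) NOT NULL,   -- e.g. 'Invoice', 'Receipt'
    SerialNumber int NOT NULL            -- restarts at 1 for each type
);
-- Desired numbering: Invoice rows get 1, 2, 3...; Receipt rows get 1, 2...
```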
Now there are a few things that stand out to me here. Um, well, first of all, in terms of like, is this a good question? I mean, considering the typical question is how is Babby formed?
This is like an excellent question, right? It’s not a perfect question, but it’s an excellent question. Um, he has table definitions and example data, the, uh, code he tried.
He explains the problem. He tries to lay out his requirements. You know, he’s very concerned about high throughput and high volume and, and so on.
Now, in terms of making this question better, it would be good to define, like, if gaps are okay. Like, um, I also think that, I mean, just speaking for myself, I usually Google a problem or search for it on the website before asking a question.
Some, some people don’t, and for the most part, questions have to come from somewhere. But it would be good to know why he’s not using, uh, sequence objects. And I can guess the reason, like maybe he has a very large number of distinct document types.
Like he has a hundred different document types. He doesn’t want to create a hundred sequences, or maybe the list of document types is, is dynamic or unknown. And he doesn’t know the whole list and he doesn’t want to dynamically create sequence objects.
Now the great thing about stack exchange as opposed to real problems is you can like pretend what the question is. Like you can make the question be whatever you want. And sometimes I will find some ambiguity in a question and tilt it towards being more interesting.
And I’ll answer what I think is interesting, which is probably helpful to someone, but not necessarily the person himself. One thing I would like Erik’s opinion on is. Oh boy.
Oh, he, he frequently talks about his scalability, and in one of the comments. Yeah. He says that he has a hundred thousand logged in users at the same time. Now I am not as promiscuous as you when it comes to SQL Servers.
Have you ever experienced such a workload? Okay. So not on a single SQL Server.
Um, I’d also like to maybe push back a little. Like, I don’t know, there, there’s some terminology in there that leads me to believe that there is perhaps some, I don’t know. Hmm. It feels misleading to me.
So a hundred thousand logged in users, right? Um, okay. So maybe there are a hundred thousand users logged into a website, but probably not all actively hitting the database server. If you have that many users, you probably have enough developers to set up a reasonably good caching layer.
You probably have other databases that do other things in there, and SQL Server just handles the transactional end of it, when there is a transaction. You know, like Amazon has completely different stuff for, like, product search than it does for, like, shopping cart and checkout, right?
It’s like different systems. So they might have a hundred thousand users logged into whatever website, hitting other database services, but a hundred thousand active users on a single SQL Server? Do you have any idea what the thread pool looks like? Are you out of your mind?
Like, no, that is bonkers to me. Yeah. Like, if you just take it literally, like, I don’t know how big of a server you need to even have a hundred thousand active connections, even if they aren’t doing anything.
All right. It feels like it would have to be pretty big. Um, I would personally be very interested in seeing that server.
Modern hardware is a lot better than I thought in terms of getting very wide sockets. Like you can have, what was it, I think like a 256 physical core, two socket server nowadays. But divide that by a hundred thousand users, and everyone gets a very, very small, uh, portion of one CPU.
The other thing that worries me about it is, if you’re actually living in this business environment, wouldn’t you be, like, super good at handling concurrency? Right. Right.
And you wouldn’t be, like, asking questions on Stack Exchange or. Okay. So let me ask you, let me, let me throw this one at you. Um, if you look at that code, does that look like the code of someone who runs a SQL Server with a hundred thousand logged in users?
Can you imagine? Like if, if that’s an example of like the starting code that they’re like, this is how it’s going to work. Um, man, I, I weep.
I weep. Yeah. I could be making a lot of money off these people. Yeah. I was just thinking that. I could be, I should be by all rights, making a lot of money off whatever this situation is. But, uh, you’ve got to, uh, you’ve got to track this guy down.
Yeah. Yeah. What, what is there? What’s in his profile? Like, is there an email address? Cause life is a journey.
Yeah, sure is. Yeah. Um, yeah, I mean, it, it does feel suspicious, but like I said before, you know, if we can make questions more interesting for ourselves, it’s, it’s a bit more fun.
Right. So I don’t have a problem with that. Um, I don’t believe it. I don’t think it’s relevant. I, I will, I will give them credit for actually specifying a target, even if it’s wrong or misleading.
Cause you know, as, as you’re well aware, like, it needs to be fast. It needs to perform well. It needs to scale well.
We’re running it super frequently, right? Like, you don’t often get that level of detail, as opposed to an actual number. So I will give props for that. And maybe we should be more considerate to this poor suffering developer who has a hundred thousand active users at all times.
Cause it sounds like his professional life is pretty tough. I’m, I’m ready to get to the answers. If, if you are.
Oh yeah. Let’s do it. Let’s see where I want to go to first. So first one: use a sequence object per type. I mean, it’s, it’s an answer.
Um, I think it works well enough if there’s a relatively small, set list of types, which could be true, is probably true. Um, it could have been good to talk about the cache for, for sequences.
Do you remember if sequences cache by default? Uh, it’s like a thousand, I think. Okay.
Well, a thousand or 10,000 or something. If there is a 10,000 default cache, then. No, you know what? It’s not. There’s no way it’s 10,000. Let me see. From what I remember.
I think it’s a thousand. I, I, I think it doesn’t cache by default, but I, I’m willing to be proved wrong in real time. Uh, okay. So: if the cache option is enabled without specifying a cache size, the database engine selects a size.
However, users shouldn’t rely upon the selection being consistent. Microsoft might change the method of calculating the cache size without notice. Oh yes.
But it defaults to cache. So, okay. So it defaults. Yeah. So it says it defaults to cache being on. Uh, but then if you don’t say what cache size you want, Microsoft’s like, roll the dice. Yeah.
That’s the way to live dangerously. You know, with this hundred thousand active users, he should probably specify a cache size. Cause I, uh, do doubt that Microsoft is picking their cache with a hundred thousand active users in mind.
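For reference, specifying the cache explicitly rather than trusting the default looks like this. The sequence name is made up for the example:

```sql
-- One sequence per document type, with CACHE set explicitly instead of
-- relying on the engine's default, which Microsoft may change without notice.
CREATE SEQUENCE dbo.InvoiceSerial
    AS int
    START WITH 1
    INCREMENT BY 1
    CACHE 1000;   -- bigger cache: less log activity, larger gaps on restart

-- Hand out the next number; safe under concurrency, gaps are possible.
-- SELECT NEXT VALUE FOR dbo.InvoiceSerial;
```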
Right. Yeah, absolutely. I mean, think of the licensing. Yeah. Um, yeah, but you know, overall, like, if the guy doesn’t know about sequences, this could be a very good answer for him, because he, he does specify in some of the comments that he’s okay with gaps.
So maybe the solution is good. Um, I think I would, like, put this in a variable or a case statement or something, so we’re not duplicating all the code, but I’m just kind of quibbling at this point.
Anything to say about the very boring, but safe answer here? Uh, no, I mean, it’s fine. Not shamelessly, but, um, I’ve, I’ve done similar things to give people a, you know, unique thing per thing in a table.
Um, you know, it’s a lot easier than before sequences, when a lot of this stuff had to be done either with a cursor or with a window function in ways that were very unfun. When sequence objects came out, that was the one use I ever found for them, at least in client work. Yeah. So maybe that was the solution.
Of course. Yeah. What’s annoying. So like in your, in your answer. Which is a great answer, by the way. Well, thank you.
Um, you, you linked to the Paul White post on sequence tables. And every time I’ve seen a client using sequence tables, it’s always been really messed up, with a lot of locking and blocking around them. And I would show, I would show them two things:
Paul White’s article, and the fact that sequence objects exist. And then they’d never do anything about it ever. It was like, no, we can’t touch that part of the code. It’s too scary.
It’s too old. Like, if that breaks, we’ll go out of business. This is all screwed. Like, no. I don’t know. I guess on the one hand, it’s nice that this person is asking before they screw everything up. On the other hand, looking at their code,
I think it’s going to all be screwed up anyway. This does go back to one of the things I wanted to talk about, which is, like, how to research and answer your question.
So no shade at this guy. Maybe he didn’t know about sequence objects and had no way of knowing. Maybe his work VPN doesn’t let him Google things.
I don’t know. But if you’re aware of sequences and they don’t work for your use case, in my mind, a logical thing is like, okay, well, this was a feature added in SQL Server 2012. What did people do before the sequence was added?
Because the problem didn’t, you know, come into existence in 2012. Right? Like it was.
No. All the code that I see from this is from like the early 2000s. Yeah. Then I thought, when I was trying to answer the question, I was trying to make it more interesting. Well, like maybe he has a thousand different types, or an unknown number of types, or he, you know, has a good reason not to use a sequence.
Well, what did the old, you know, wizards back in the day, like, how did they solve this? Because I’m sure there are many bad solutions that are posted, along with a couple of good solutions. And yeah, you can’t really go wrong with, with Paul White.
And there is a lot more in there than is needed for the question. But sure. I used it as a starting point because I was too lazy to do anything else.
Honestly, like, you know, there are a select group of people who, every time I need to do something, I am like, I always Google their name and that thing. Because I know that it will, like, if I find anything that they did with it, it will give me a much better starting point than all of the stuff that I would have had to, like, break my brain on along the way. So 100% no problem with that, like, yeah, no, I’d rather, I’d rather start from like, the 10th floor than start from the ground floor and screw up for 10 floors.
Or you could be starting from the basement, depending on what you’re looking at. Yeah, you know, I don’t know that. I think it’s an insult to basements, Joe.
I don’t know, I think. Anyway, we have an answer that proclaims itself the correct way to guarantee. With guarantee in bold, which sounds like a very strong guarantee, right?
Oh, I mean, it’s guaranteed to deadlock like a son of a bitch. Yeah, it feels a bit too narrow, like answering the literal question. And the guy was very concerned about scalability.
That’s what he said like five times. And, you know, serializable isn’t very scalable. I would say it’s anti-scalable.
Especially serializable with an UPDLOCK on it. That’s gonna sting. Well, and presumably this table is being used for other things too, right? Yeah.
So you’re just, I mean. Documents sounds like a well-trafficked table to me. Yeah. I’m not much of a locking guy, but this means that no other session can even select from the table. Right?
I mean, depending on, you know, indexing and, you know, where other queries are hitting in the index, there could be a significant amount of locking there. But, you know, when you look at the query itself, right? Like, you know, you have, like, your where clause is on document type and you’re getting a max serial number every time.
If you don’t have very good indexes in place to support that, that’s not gonna be fun for you at all. Yeah. And to be fair, the index is mentioned here.
No, I know. So, like, I’m just saying. Just to clarify for the audience, and not totally for me: the serializable hint for this query is applied.
It’s not applied as a table level lock, but it’s applied to the locks that are otherwise taken. Right. Is that correct?
Correct. So serializable would be maybe one row or, like, not many rows? But, like, I think you’d have a hard time guaranteeing that.
Also, like, with that index in place, you would probably be okay. Assuming that document type is unique, which I don’t know. Because, like, if someone’s storing a table of documents and they store a document type, then it’s usually not unique.
But, like, the combination of document type and serial number might be. So, like, maybe, but I still think you’d have a real tough time figuring out if you were getting row level locks or page level locks. I don’t think this would be a situation where you’d escalate to a table lock, but, like, either row or page, you could end up jamming things up pretty good.
Especially if there’s, like, just a whole buttload of these running at once. Right? Yeah.
Or if you had, like, just these random reporting queries on the table that are looking at who knows what. Yeah. Scanning. Scanning, right? Yeah. But you can always put NOLOCKs on those.
I’m sure that the 100,000 active user database is very well tuned. So you’re not going to, you know, you’re not going to have those. One can only assume.
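The pattern being discussed here is the classic max-plus-one guarded by hints. A minimal sketch, with our own table and column names rather than the answer’s exact code:

```sql
-- Max-plus-one under READ COMMITTED locking. SERIALIZABLE holds a range
-- lock so nothing can sneak in between the read and the insert; UPDLOCK
-- stops two sessions from reading the same max. Blocking (and deadlocks)
-- are the price of the guarantee.
DECLARE @DocumentType varchar(20) = 'Invoice';
DECLARE @next int;

BEGIN TRANSACTION;

SELECT @next = ISNULL(MAX(SerialNumber), 0) + 1
FROM dbo.Documents WITH (UPDLOCK, SERIALIZABLE)
WHERE DocumentType = @DocumentType;

INSERT dbo.Documents (DocumentType, SerialNumber)
VALUES (@DocumentType, @next);

COMMIT TRANSACTION;
```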
Yeah, I mean, like, maybe it works in practice. I would be very wary of this. You know, I would say this is, again, a non-fun answer to the question, which, to be fair, can often be useful for people.
Yeah. But, you know. Whenever I go to Stack Exchange, I’m generally looking at. Yeah. But I think, you know, what’s interesting here is, like, a serializable UPDLOCK does give you, like, a boatload of pretty reasonable guarantees under the default read committed locking isolation level. If you’re using a different isolation level, like a row versioning isolation level, like RCSI or snapshot, you kind of lose out on a little bit of that with other select queries.
Like, maybe not like other ones of these because of the hints that are there, but other queries might, you know, start seeing some weird stuff. And what was the other thing I was going to say? It was a very useful point.
Paul White’s post with the sequence tables. He doesn’t even use serializable or UPDLOCK. He only goes to READCOMMITTEDLOCK, just in case you’re using RCSI, which I think is a very, very interesting thing. Because, you know, I assume whatever Paul says is right.
So, like, what am I going to argue with? Yeah. And it makes sense to me, with my very narrow understanding of locking. The fact that you could use read committed for Paul’s code and not need serializable.
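The reason that pattern gets away with read committed is that the increment and the read of the new value happen in a single atomic UPDATE. A sketch along the lines of Paul White’s post, with our own names:

```sql
-- Sequence-table pattern: the compound assignment increments the counter
-- and captures the new value in one atomic statement, so no SERIALIZABLE
-- range lock is needed. READCOMMITTEDLOCK forces a real lock even when
-- the database uses RCSI.
DECLARE @DocumentType varchar(20) = 'Invoice';
DECLARE @next int;

UPDATE dbo.SequenceTable WITH (READCOMMITTEDLOCK)
SET @next = NextValue = NextValue + 1
WHERE DocumentType = @DocumentType;
```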
Yep. On to the next answer, from Steve, who says, design-wise, that is quite clearly idiocy. How does Steve have no upvotes?
Hang on. Hang on. Hold up. Steve. Maybe his upvotes were canceled by downvotes. You know? Who knows? Steve, I’m sorry I missed you.
Hang on. We’re going to watch Steve get upvoted in real time. Bam. Yes. This is, you know, this is part of why it’s important to be as clear as you can about, oh, can I have gaps?
Like, where are my exact rules? And so on. Because we have to, you know, write conditional answers based on this.
I think the basic point here is, if it’s going to be without gaps, it’s going to require, like, a resource which is being locked in some way, and that limits scalability, which is true. Yeah. But there isn’t really any way to get from that to this, because computers are faster than some people think.
Yeah. And if you have the right design, you can quite easily pump out way more than a hundred thousand an hour. Yeah.
No, that’s not a problem. Even if you were using, like, sp_getapplock to manage concurrency for this, like, I can’t imagine, like, that’s not hard. Yeah. Yeah.
sp_getapplock does add a lot of overhead. It does. Compared to regular locks. Yeah. But I think that you could get to that level. I agree with you. Yeah.
I wouldn’t use sp_getapplock. Right. Just saying. But like I’m saying, even if you were like, I’ve got to use sp_getapplock for something, here goes nothing, I think you could do it.
Like, you’d have to be pretty bad at stuff. And, um, I, I think it’s good to keep your options in mind in terms of, like, well, instead of storing something, you could calculate it instead.
The main issue I had with this answer is, so, in this answer, he’s using a ROW_NUMBER to calculate the serial number, for the people who are, who are shoveling snow or doing whatever else while listening. If a row gets deleted from the table, like, now your serial numbers are changing, which doesn’t feel like a good property to have for a serial number.
Yeah. So I do feel like it’s kind of dead in the water for this use case. I mean, I think another thing that’s missing is that, um, okay.
So maybe, maybe not here, but in a lot of cases where I have used ROW_NUMBER for this sort of generation, uh, you always have to go find the current max anyway, and then add that to whatever row number you’re generating. Like, that might not be necessary here based on some of this stuff, but a lot of the time, you have to go find what the current high value is anyway, and then add that to whatever row number you’re starting at so that it gets the next bunch.
Um, like, I realize this is a little bit different, cause I think it’s a unique group for a new document ID going in there, or something along those lines. But a lot of times when I’ve done this, it’s been like, well, we lift off from here. It’s like you’re almost remaking an identity, but, like, kind of faster.
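The lift-off pattern Erik describes, offsetting ROW_NUMBER by the current max, sketches out roughly like this. The staging table and ordering column are our assumptions, not from the answer:

```sql
-- Batch numbering: offset ROW_NUMBER by the current max for the type so
-- the new batch continues where the last one ended.
DECLARE @DocumentType varchar(20) = 'Invoice';
DECLARE @base int = (SELECT ISNULL(MAX(SerialNumber), 0)
                     FROM dbo.Documents
                     WHERE DocumentType = @DocumentType);

INSERT dbo.Documents (DocumentType, SerialNumber)
SELECT @DocumentType,
       @base + ROW_NUMBER() OVER (ORDER BY s.StagingId)
FROM dbo.Staging AS s
WHERE s.DocumentType = @DocumentType;
```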
There might be a use for this. If you wanted to be really slick in that, say you don’t calculate it on insert, have a background job, which updates it. Yeah. And then, and then you use this to calculate it for the rows that haven’t been processed yet by the background job.
So it’d be a very small number of rows, and you’d remove the scalability bottleneck, such as it is. And since the life cycle of, you know, going from null to assigned is pretty short, you’re probably getting away with the, well, what happens if I delete a row problem.
If you’re not doing something like that, then this also just gets slow over time, because you’re going to have more and more rows. You know, like he was saying, a hundred thousand executions an hour, that’s almost a billion a year if it’s 24/7. You know, it’s questionable if that’s true, but as, as all of us know, of course, this type of query is one where you’re basically forced to scan
all of the relevant rows in order. If you have an index, great, but that’s still going to be, probably, just a serial scan of, yeah, that index in order.
And as you’re getting into billions and billions of rows, it’s not going to be particularly quick. Or maybe the more important thing to say is that it is going to get slower and slower over time. Right. Performance will degrade.
Yeah. It’s something that I like for the solutions that I’m designing. Yeah. But like, and then, and then, you know, if you, if you do something like that, you know, the, the place most people are going to go with how to, how to keep that up as quickly as possible is most likely going to be a trigger. And then you’re dealing with someone who maybe doesn’t like have a good grasp of concurrency and edge cases and other fun things might write some rather unsafe trigger code and screw some stuff up along the way.
But, you know, perhaps, perhaps everyone is, you know, much better educated and making much more well-informed decisions than I’m used to seeing out in the world. Everybody knows you only ever insert a single row at a time. That’s true.
Makes the, it makes the trigger code simple. So let me ask you a question. If people only ever insert one row at a time, why is it always in a loop? Like, why does the trigger always have to have a loop in it?
It’s insane. It’s people. Anyway, go on. Time to get to the answer. The answer.
The answer that I scrolled past, which, which is my answer. This is my answer. Um, and like, like I said, I was trying to have fun here. Uh, this is a quote from, I forget the guy’s name.
The quote was more memorable than the name. That’s always a good, good thing. Uh, Slava. Slava Oks?
Yes. Oh, right. That was in your columnstore presentation. Yeah. Yeah. That’s right. Um, it’s, uh, still serving me. Yeah.
Like, the main point here being: if you say, I need sequences and I can never have gaps and I must have a truly strict ascending order, um, that’s harder to do, and it limits scalability, because you have to hold locks for a longer period of time, right?
Like, for that type of no-gap approach, you can’t use sequences. Yeah. Yeah. So you’re going to need some kind of table based, uh, sequence table, and the transaction has to include both getting the sequence number and the insert.
Yeah. Which is less scalable than being able to break those, those transactions up. So the thing I’m advocating for is, like, you probably don’t actually need that. The way I phrase the requirement is: unique integers that start small and generally trend up, with the occasional gap being acceptable.
And the reason that I did that switch is I want transactions that are as short as possible. And, you know, I want my sequence table to be as scalable as possible. The reason I went to a sequence table is there was already an answer for sequences.
Sequences are, are boring, but maybe he has 10,000 document types. Who wants to create 10,000 sequences? That’s not a good time.
Not me. Yeah. Not you. Yeah. Do you remember the time, what were you doing? You were, you were trying to create like a whole bunch of things all at once. And it just like dragged the server down.
You were trying to create like a thousand tables or something. And like, every time, every time you added, every time you created a new table, it was like 10 seconds slower after some point.
Was it, was it schemas or tables? It was one of those things. I only remember it a little bit, but it was, it was one of the most hysterical things. I think the thing that I’m thinking of might be what you’re thinking of.
I had a very busy server and it would take like a minute to create a table just because of the latches that were needed. And there was just so much signal wait.
No, I remember that. Yeah. That was a, that was a different one. That was a different one. Yeah. I don’t know if I remember that one. Okay.
I get a little bit more coming back to me. You were building a dynamic string that would have created like 10,000 either schemas or tables, but like the bigger the string got, the longer the whole thing took or something like that.
And at some point, just building the string past a certain point started taking a long time. Does any of that ring a bell? I have no idea why I would be doing that or. Maybe it was like partitions or something like that.
I don’t know. Anyway. It could have been partitions. I think it was, maybe it was partitioning related. I might’ve been experimenting with like having 10,000 partitions actually on a table just to, just to see what would happen.
Cause I do remember there being a design discussion around, like, cause I think they increased the max partitions limit at some point, making it 15,000. Yeah.
But that was a long time ago. Yeah. Right. But like I was new to partitioning. I was trying to find guides in terms of, like, how many partitions is too many. I think it went from a thousand to 15,000.
If I’m remembering right. Something like that, maybe. And it was, it was surprisingly hard to find, like, any real data or, like, anything written about it.
So that, that might’ve been it, but I honestly don’t, don’t remember. Um, it’s in the transcript, I’m sure. Somewhere.
Going back to this. Clearly, you’re going to want a big answer. The biggest inserts you can get. Very scalable, highly concurrent workload with many rows and many executions. So the thing that I did was use my relaxed problem statement.
I don’t think there’s anything too fancy here. A stored procedure to give you the serial number based on document type: update the sequence number table and increment by one.
If the update doesn’t do anything, then that document type has never been seen before. Right. So let me, let me ask you a question.
How come you didn’t use, um, like, ROWCOUNT_BIG or something instead of the serial number? You, you, uh, could. Um, I don’t really have a, have a reason for it. I just didn’t know if, like, you tried it and it sucked or something.
Oh no, no. Just, like, serial number was right there. And I thought, why not? Use it, abuse it, overload it. Yeah.
Like, uh, to your point, the more common pattern for this type of thing would be to use the row count. But since we had this convenient variable right here, I just used it. Um, then I’m inserting into the table using UPDLOCK, which I think you need to actually make this work, uh, to prevent duplicates.
Yeah. Uh, yeah. So, uh, I think, I think that, I think you might actually need, you might need serializable for that, but.
Oh, well. I can leave a comment. Um, and in, uh, this case, I’d be fine with using serializable just because, you know, the insert, the row creation, runs so infrequently for this.
Right. You would hope so. Yeah. I mean, well, like, if that isn’t true, then you basically have a bunch of types that each have, like, one row.
Well, if it, if it isn’t true, then you just reverse the order. Yeah. Yeah. Sure. That is also a fair point. Um, but, well, wait, no, I’m totally right.
Cause you have a hundred thousand executions an hour. You’re not going to have, like, a billion document types a year. Right. Hmm. I don’t know what kind of documents they get. Hmm.
You never know. I suppose. Never know. Yeah. Is it advisable? Maybe it isn’t. Who knows? Um, only one way to find out: run it. Once again, well, I ran it and it worked fine on my machine.
So there you go. Uh, I think normally you have a transaction here. I don’t have one because I don’t actually care which session wins.
I think that like, you know, this is a bit of a technical point, but in terms of defining, like what the right order is, it’s kind of a meaningless statement. Mm hmm.
And that’s like, you could have session a call this procedure, you know, one microsecond before session B, but then session a immediately gets off the scheduler. And then like session B executes this code first.
And, like, that session wins. It ends up not really being a meaningful thing. Like, I feel like there isn’t really a well-defined order here that’s meaningful.
So that’s why I don’t have any transactions here, because it just does not practically matter in any business sense. Yeah. So then after the SQL is run, we know that some session inserted, hopefully, one row, and there’s only one row.
And now we have our one row. We can just blindly update it. Yay. That’s how we’re getting our serial number for the document type. And on my low-end machine, I have eight physical cores.
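The “blindly update it” step could be a single statement that increments the counter and hands back the new value in one shot, something like this sketch (table and column names are my assumptions):

```sql
-- Sketch only: bump the per-type counter and capture the new value.
-- OUTPUT returns the post-update number without a separate SELECT.
UPDATE dbo.DocumentTypes
SET LastSerialNumber = LastSerialNumber + 1
OUTPUT inserted.LastSerialNumber
WHERE DocumentTypeName = @DocumentTypeName;
```

Doing the increment and the read in one statement avoids a window where another session could grab the same number.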
I ran 16 threads and I was getting about 40 million executions an hour. How many hours did you let it run for? 0.016 hours.
Ah, okay. So that’s scaled up. Okay. So, um, we don’t actually know, in the long term, that this would continue to be scalable.
I think we do, but hey, if you want to run it overnight, feel free. You have Azure. You’re the one who can just run things.
Do I have Azure? Yeah. You said you did. You said that was making it calm. That’s true. Yeah. You can’t take that back now, Joe. It’s true.
All right. It’s not like we can edit the video or anything. We must do our closing statements here, because I have to pick a kid up in about 10 minutes. So, um, I do have to finish this. All right.
So, closing statements. Erik rudely challenged me to make this faster than a sequence, which I think is very unfair. (Because I made you a better person.) Sequences are custom-designed to do one simple thing and are very optimized for it.
Because, you know, tables have all kinds of baggage: locks and latches and all kinds of transaction logging and so on. Now, I gave it a try.
I failed. I did get really close, though. And the way I gave it a try was, you know, given our very relaxed, uh, definition for how we’re assigning the IDs, we can actually hash the table.
So take the session ID, modulo 24. So now, instead of one row which is being locked by everyone for the document type, we have up to 24 rows, and it is way more concurrent.
In fact, it’s so concurrent that if I changed the workload and only ran my new procedure, the version that has the hash key added to the primary key, there effectively is no waiting on latches or locks.
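A sketch of the hashed version Joe describes: the hot row is split into up to 24 rows keyed by session ID modulo 24, so concurrent sessions mostly touch different rows. All names here are assumed, and the bucket-to-number mapping is my own assumption about one way to keep the generated numbers unique across buckets.

```sql
-- Sketch only: spread contention across 24 rows per document type.
-- Assumed table with (DocumentTypeName, HashKey) in the primary key.
DECLARE @HashKey tinyint = @@SPID % 24;

UPDATE dbo.DocumentTypesHashed
SET LastSerialNumber = LastSerialNumber + 1
-- Multiplying by the bucket count and adding the bucket keeps numbers
-- unique across buckets (my assumption, not shown in the video).
OUTPUT inserted.LastSerialNumber * 24 + inserted.HashKey
WHERE DocumentTypeName = @DocumentTypeName
AND HashKey = @HashKey;
```

The trade-off is that numbers are no longer dense or strictly ordered, which is exactly the relaxed definition being discussed.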
It is solely WRITELOG, which I can’t figure out how to do anything about. Um, I tried some In-Memory OLTP and it ended up bombing out due to write conflicts. I feel like In-Memory OLTP isn’t really great here.
Cause you kind of need some lock or latch. Yeah. It’s not like you want snapshot isolation.
It doesn’t make any sense for this. No, no. Right. And the table needs to be written eventually anyway. Like, it can’t be schema-only, or whatever it’s called. It can’t be schema-only.
So I don’t... like, maybe there’s some way to make In-Memory OLTP work. Uh, one thing I noticed in my very brief attempt was, if you want it natively compiled, you can’t do serializable hints or TABLOCK hints. It’s just not compatible.
Yeah. So it probably couldn’t be natively compiled, and you’d have to figure out the write conflicts. And I don’t know, I couldn’t get that to work.
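On the natively compiled point: table hints aren’t available there, and isolation is declared once in the ATOMIC block instead. A minimal sketch of what that shape looks like, assuming a hypothetical memory-optimized table named dbo.DocumentTypesMem:

```sql
-- Sketch only: natively compiled procedures set isolation in the
-- ATOMIC block; per-table hints like UPDLOCK/TABLOCK aren't allowed.
CREATE PROCEDURE dbo.NextSerialNumberMem (@DocumentTypeName nvarchar(100))
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH
(
    TRANSACTION ISOLATION LEVEL = SNAPSHOT,
    LANGUAGE = N'us_english'
)
    UPDATE dbo.DocumentTypesMem
    SET LastSerialNumber = LastSerialNumber + 1
    WHERE DocumentTypeName = @DocumentTypeName;
END;
```

Under snapshot-based concurrency, two sessions updating the same row produce the write conflict errors mentioned above, which the caller would have to catch and retry.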
Maybe someone could, but with my row-based attempts, I did get within, what is that, like 93% of the... Buddy, you want me to figure that out?
Of the rate for sequences. Um, so 166 million per hour on my eight-physical-core machine. So I do think that we have achieved the scalability needed for the question and answer.
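For reference, the sequence-object baseline being compared against is roughly this (a sketch; the name and cache size are my assumptions):

```sql
-- Sketch only: the built-in alternative, available since SQL Server 2012.
CREATE SEQUENCE dbo.DocumentSerialNumber
    AS bigint
    START WITH 1
    INCREMENT BY 1
    CACHE 1000; -- caching cuts down on log writes per value generated

-- Each call hands back the next number, no table locks involved.
SELECT NEXT VALUE FOR dbo.DocumentSerialNumber;
```

Sequences can skip values after a restart when cached values are lost, which is usually the price of that speed.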
Well done, Joe. Well done. And on that note, uh, I’m going to go get a child from the outside world.
Uh, thank you all for listening and/or watching the Bit Obscene radio program. Uh, if you’re sitting at home, or in the car if you can safely do it, give Joe a round of applause for nearly beating sequence objects at their own game.
Well done, Joe. Do you have any parting words for us? Maybe, um, on someone else’s hardware, my code is actually faster. That’s what I’m, uh, coping with, you know. Cool. Maybe, maybe it’s faster on your hardware. Maybe.
Well, we will never know, because you still haven’t put the table definition that you used in there. Uh huh. So as soon as we have that information, the Darling Data foundry will put that code to the test. All right.
Thank you for watching. Thank you, Joe. And, uh, we will see you all next time, as long as our Pistachio Poppy sponsor continues sending us toothpaste. So, um, I don’t know what I’m going to do with 10,000 tubes of pistachio toothpaste, Joe, but I’ll figure something out. All right.
Goodbye.

 

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.