Building Reusable Queue Tables and Procedures In SQL Server
Thanks for watching!
Video Summary
In this video, I delve into the intricacies of building reusable queues in SQL Server, a topic that is both fascinating and crucial for maintaining efficient data processing workflows. I explore various queuing strategies, including different ordering methods like FIFO (first in, first out) and LIFO (last in, first out). Additionally, I discuss the importance of choosing appropriate data types for your queue table, particularly emphasizing the use of `bigint` for identity columns to avoid potential scalability issues. The video also covers how to efficiently manage processing eligibility and track duration using computed columns, along with detailed explanations of locking hints and isolation levels to ensure smooth operation in high-concurrency environments.
Full Transcript
Erik Darling here with Darling Data, and in the interest of fiscal solvency, we're going to talk about building reusable queues. What does one have to do with the other? Absolutely nothing. I just blurted that out and went with it, and I don't feel like starting this thing over, so you're going to have to cope with that mentally somehow. There are a lot of reasons for building queuing processes in SQL Server, and all sorts of different ways to do it. There's first in, first out; last in, first out; FIFO, LIFO, FAFO. I'm going to skip over that third one. And there are all sorts of different ways you can choose to dequeue something — and by dequeue, I don't mean Dairy Queen, I mean remove something from the queue, or mark it ineligible for further processing. For me personally, it really does depend on the intent of the queue. There are times when I would want to just delete something from a table when I have started processing it — that's a completely legitimate way to do it. In this example, I'm just updating stuff. Sometimes the update approach is nice because, rather than deleting an item from the queue table, the queue table can retain some information about the queuing process — some generally useful stuff. That's the way I like to do it, and you can kind of see that in the setup of my queue table here.
So I'm going to do what every good little SQL Server DBA does and have an ID column. Mine's going to be a bigint, because I believe strongly in using bigints for identity columns. The process of having to fix an integer identity column that runs out of values is a real nightmare, and I wish Microsoft would make it less of a nightmare. To me, that's a pretty major scalability issue that they just ignore. Now, for domains where you understand the values can never go past a certain number — dispositions of things, like the constants of the Stack Overflow database, vote types or post types — an integer is fine. You might hit over 2 billion posts (that'd be a lot of posts, that'd be a lot of questions), but you're never going to have more than an integer's worth of types of posts.
So when I'm making a table with an identity column, I'm going to suck it up and use those four extra bytes. Oh my God, the four extra bytes. The funny thing to me is that when you start talking about data types — and especially ints versus bigints — the absolute worst people on Earth will start talking to you about bytes. Well, integers are four bytes, bigints are eight bytes, disk, disk, disk. Okay, fine, that's legitimate. The thing is, right about the point where you might start caring about the four extra bytes you're storing is probably right about the time you're starting to run out of integers. And when those four extra bytes are available to you in bigint form, you're going to be so much happier that you don't have to fix that problem. So that's my thing right there. And for this queue table, since we're not doing a delete queue — we're doing an update queue — we need a column like this to tell queries when a row is being processed.
In this case, the thing I'm using for the processing work is just the Reputation column from the Users table. It's really just completely senseless — a placeholder. And then I have what I was talking about when I said it's nice to have some information about how long it took to process a thing: three columns in total — a start date, an end date, and a computed column that calculates the duration in milliseconds between the start date and end date. I'm using milliseconds here because this particular queue work processes very quickly; in real life you might want seconds or minutes, to get more human-understandable numbers out of it.
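Putting those pieces together, the queue table looks something like this. This is a minimal sketch assuming names of my own choosing (`dbo.user_queue`, `in_process`, and so on) — not the exact code from the video:

```sql
/* Sketch of the queue table described above; table and column
   names are assumptions, not the exact ones from the video. */
CREATE TABLE dbo.user_queue
(
    id bigint IDENTITY(1, 1) NOT NULL, /* bigint: never fix an exhausted int identity */
    reputation integer NOT NULL,       /* the stand-in "work" value from dbo.Users */
    in_process bit NOT NULL
        CONSTRAINT df_in_process
        DEFAULT 0,                     /* 0 = waiting, 1 = being processed */
    start_date datetime2(7) NULL,      /* stamped when processing begins */
    end_date datetime2(7) NULL,        /* stamped when processing completes */
    duration_ms AS
        DATEDIFF
        (
            MILLISECOND,
            start_date,
            end_date
        ),                             /* computed duration of each queue item */
    CONSTRAINT pk_user_queue
        PRIMARY KEY CLUSTERED (id)
);
```

The computed `duration_ms` column is the "generally useful stuff" the update-style queue gets to keep around after processing finishes.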
And then the indexes are the two important things for processing this queue table — this one is probably the most important, which will make more sense when you look at the queries that actually hit the table. What we need here is a really efficient way to find things that are not being processed — or to skip over things that are being processed — and a way to order the results after that, so that we can process the queue in order. If we were going to process these things differently: for example, the lovely, talented Aaron Bertrand — Canadian Bertrand, as he shall be known from now on — has a series of posts about snake sorting.
Snake sorting is a way to divvy up work equally amongst threads, so they all finish around the same time. The context of his post was distributing backups across threads — so you wouldn't have one thread going, "Oh, I got an eight terabyte backup. Oh, now I got a ten terabyte backup." Instead it would be: I got the eight terabyte backup; you take the ten terabyte backup; the other two threads take the five and four terabyte backups, and we'll all chug through these things more equally. So if you have that kind of knowledge about the size of a process, or the size of a thing that has to be written, you might not want to do something as naive as just saying, give me the next one in sequential ID order. You might want additional logic, like Aaron uses, to split things up so each thread does a more fair amount of work. It depends on how you're dealing with that stuff.
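Coming back to the index point above: one way to make "find the next unprocessed row, in order" cheap is a filtered index keyed on the ID. The index name and exact shape here are my guesses, not necessarily what the video shows:

```sql
/* Sketch of an index to support the dequeue query: rows still waiting
   (in_process = 0), ordered by id, with the work column included.
   Name and shape are assumptions. */
CREATE INDEX ix_dequeue
    ON dbo.user_queue (id)
    INCLUDE (reputation)
    WHERE in_process = 0;
```

The filter keeps the index tiny as processed rows pile up, and the `id` key means the "next available row in sequential order" query can just walk the index leading edge.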
So here's the stored procedure — sort of a template stored procedure — that I use to run the queues. It's just a perpetual loop until we run out of work. And something in here is pretty important.
I'm going to split this up into two parts. The locking hints in here are particularly important, regardless of which isolation level you're in — pessimistic or optimistic. If you're using a pessimistic isolation level like read committed, you're going to have a blocking nightmare if you don't put stuff like this in. And if you're using an optimistic isolation level, you're going to have a pretty hard time of things if you don't add in some additional locking, because in some cases you're going to want read queries to not see version data. So just be careful with that.
If you are using an optimistic isolation level, you will probably want to use the READCOMMITTEDLOCK hint for queuing processes — or for anything like building a sequence table. Paul White has an excellent article on building sequence tables, where he uses the READCOMMITTEDLOCK hint to great effect, and I think that's probably something you would want in a queuing process like this if you were using an optimistic isolation level. And if you are a good little data darling out there in data land and you are being real smart, you are using an optimistic isolation level, because pessimistic isolation levels are the turds. They are nothing but problems. So, yeah, there we go.
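To make the READCOMMITTEDLOCK point concrete, here's a sketch (using my assumed table names) of what the hint looks like in a dequeue-style read. Under read committed snapshot or snapshot isolation, reads see row versions by default; for a queue, you usually want the dequeue query to take real locks against current data instead:

```sql
/* Under an optimistic isolation level, force this read back to
   locking read committed behavior so it sees current rows,
   not versioned ones. Table and column names are assumptions. */
SELECT TOP (1)
    q.id
FROM dbo.user_queue AS q WITH (READCOMMITTEDLOCK)
WHERE q.in_process = 0
ORDER BY q.id;
```

The hint is per-table, so the rest of the workload still gets the optimistic behavior you chose the isolation level for.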
And you know what? I accidentally deleted something in here that I shouldn't have. I just realized it now, because I saw some red squiggles at the ID column. The Reputation column needs to go back in — integer, not null. Sorry about that. Like that. So, what this code does is look for things that it needs to process.
While we have at least one row in the table that needs processing, we're going to enter a loop that looks like this — and this is where the index thing makes a little bit more sense. We're going to use a CTE here, just because it's a little bit cleaner than using a subquery. We could use a derived table here and it would be the exact same thing, but for this, the CTE does make the stored procedure a little bit more readable. So what we're going to do is select the TOP (1) row, and we're going to hold on to that one row with ROWLOCK and UPDLOCK hints.
And again, like I said, this is just very naive sorting: get me the next available thing in sequential order. You might need to do this a little bit differently depending on what your process is. Then we're going to use kind of a neat thing, where we use an update to do more than just set a couple of columns in the table. This update will start by changing the in_process column to 1 for whatever row we find, and it will also update the start date to the time we started — so that when we're done at the end, we can update the end date column and know how long something took. And then we're going to do something that I think is particularly clever: in the same update, we are going to set our two parameters equal to the column values from the queue table, for the one row that we pulled out up here. Then we use those as part of our queue processing. My pretend example is just selecting a count from the Users table where Reputation equals whatever reputation we pulled out there.
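The loop and the dequeue update described above can be sketched like this. It's a template under my assumed names — the real procedure in the video and blog posts may differ in the details:

```sql
/* Sketch of the dequeue pattern: find the next unprocessed row,
   mark it in process, stamp the start time, and capture its values
   into variables -- all in one atomic UPDATE. Names are assumptions. */
DECLARE
    @id bigint,
    @reputation integer;

WHILE EXISTS
(
    SELECT
        1
    FROM dbo.user_queue AS uq
    WHERE uq.in_process = 0
)
BEGIN
    WITH q AS
    (
        SELECT TOP (1)
            uq.*
        FROM dbo.user_queue AS uq WITH (ROWLOCK, UPDLOCK, READPAST)
        WHERE uq.in_process = 0
        ORDER BY uq.id /* naive: next available row in sequential order */
    )
    UPDATE q
    SET
        q.in_process = 1,
        q.start_date = SYSDATETIME(),
        @id = q.id,                 /* variable assignment in the same update */
        @reputation = q.reputation;

    /* ...do the actual work with @reputation here,
       then stamp end_date for @id (shown further down)... */
END;
```

READPAST is what lets concurrent workers skip over rows another session has already locked, instead of queuing up behind them; UPDLOCK keeps two workers from grabbing the same row.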
And we execute that SQL down there, and then we use the ID column that we fetched. If ZoomIt will listen to me — ZoomIt won't ever listen to me. I'm hitting Escape like every three seconds and ZoomIt's just like, I don't care. Come on, ZoomIt. There we go.
All right. That took a little bit more wrestling than I cared to do. The final thing we'll do down here is update the queue table to set the end date to when this thing finishes, where the ID equals the ID of the queue item we pulled out. And that will be really quick. There's also a little safeguard there, to make sure that we only ever update a row that's in process, and not a row that is not yet being processed. Even though that's sort of technically impossible with this, I just like to be extra safe when I'm automating tasks like this, because if you're not, you can run into some really weird bugs.
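That completion step, with the safeguard, might look like this — again a sketch under my assumed names, with `@id` standing for the ID captured during the dequeue update:

```sql
/* Sketch of the completion step: stamp the end date for the row we
   dequeued. The in_process = 1 predicate is the safeguard -- technically
   redundant here, but cheap insurance against weird automation bugs.
   Names are assumptions; @id comes from the dequeue update. */
DECLARE
    @id bigint = 1; /* placeholder; normally assigned during dequeue */

UPDATE uq
SET uq.end_date = SYSDATETIME()
FROM dbo.user_queue AS uq
WHERE uq.id = @id
AND   uq.in_process = 1;
```

Once end_date lands, the computed duration column has everything it needs, and the row doubles as a processing-history record.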
All of the code, and a somewhat deeper explanation of some of these things, is available on my blog. There are two posts — which will be in the show notes, as they say — about building reusable queues. I originally wrote it as a two-part series. I forget why; probably because I needed an extra day of blogging to cover something. It's hard to tell sometimes. But I think it's a very interesting problem when you have to tackle this sort of thing, and doing it correctly — doing it in a way that is effective for high concurrency, without a bunch of awful blocking problems — is especially interesting. That's where table design is really important: making sure your table is set up in a way that allows you to find the rows you need to process, in the order you need to process them, as efficiently as possible. If you're doing something where there are going to be four or five worker threads that chug along on huge tasks for a long time and then come back, the locking and blocking thing probably isn't going to be all that important. But if you need to orchestrate really high-concurrency queuing and distribute those queues without running into a blocking nightmare, that's where this kind of stuff becomes much more important. So anyway, thank you — you, especially you, the good-looking one sitting right there — for watching.
I appreciate you more than words could ever possibly describe. I hope you enjoyed yourselves a little bit, and I hope you learned something. If you liked this video, thumbs ups and helpful comments are always nice — I do enjoy those. If you like this sort of SQL Server content, please subscribe to my channel. You can join the nearly 4,000 other data darlings out there in YouTube land who have done that, and who make me feel like a special, helpful, useful, good person by allowing little bells to go off in their heads every time I publish a video. So, yeah, you can like, and you can subscribe — you can do them both. And then you're extra, extra data darling-y, and I owe you a hug. Anyway, I've got some more of these to record — I've got to bulk things up before I go away for a little bit. So this one's getting canned right about here. Thank you.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.