A T-SQL Query To Get The Text Between Two Delimiters In SQL Server




Thanks for watching!

Video Summary

In this video, I delve into the intricacies of using the `SUBSTRING` function in SQL Server to extract text between two delimiters, starting with a common pitfall: misunderstanding how the arguments work. While many assume `SUBSTRING` works with start and end positions, it actually requires specifying a length after the starting point. To illustrate this, I use examples from the `sys.messages` view, highlighting the importance of being precise when defining these parameters to avoid errors like “invalid length passed.” The video is a must-watch for anyone who frequently deals with string manipulation in SQL Server, as it demystifies a function that, despite its simplicity, often trips up even experienced users.

Full Transcript

Erik Darling here with Darling Data. Of course, who else would have me at this point? I can’t imagine. I can’t imagine at all. In today’s video, we’re going to talk about how to get the text between two delimiters in a SQL Server query. The reason why we’re going to do that, and I know this may sound dull and pedestrian to many of you, the reason why I’m going to do this is because every month I get this email from Google, and it’s all about my website, search results, performance, and stuff like that. And every month, the blog post that I wrote about this is the number one post. All the stuff that I write and record about, all the things that I do, how to get the text between two delimiters is the big winner. Alright, fine. Okay, you get a YouTube video too. That’s your prize. For being the number one blog post on erikdarling.com, formerly erikdarlingdata.com, you get a YouTube video. Memorialized, canonized, itemized, you get a YouTube video.

So the first thing that everyone gets wrong about substring is what each of the arguments does. And this isn’t purely in the context of SQL Server. There may be other implementations of substring that do things differently. But in SQL Server, the substring function has three arguments, and they are the input expression, in other words, the thing that we are going to get a substring of, the start position of that string, and then the length of the string. It is not start and end, right? This third argument is not meant to find the end position of a string.
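As a quick sketch of that signature (a made-up literal, not one from the demo):

```sql
/* SUBSTRING(expression, start, length):
   the third argument is a length, not an end position. */
SELECT SUBSTRING('abcdefgh', 3, 4);
/* returns 'cdef': 4 characters, starting at the 3rd */
```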

This is where you tell substring how long the string is after the starting point. Alright? So the way this works, and I’m going to be using the sys.messages built-in view for this.

And I’m going to specifically be searching for entries in the text field that have at least two semicolons in them so that this works. If you don’t do that — if you’re not really confident about the character’s presence or position in the string — you’re going to write a lot of really defensive code around this.

Otherwise, you’re just going to get a bunch of, like, you know, invalid length passed to whatever errors. And those suck, and I hate debugging them, and it’s miserable, and I don’t want to do it. So I try to be as defensive as I can when I write these things.
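The defensive filter he describes might look something like this (a hedged sketch; the exact predicate from the demo isn’t reproduced here):

```sql
/* Only operate on rows guaranteed to contain at least two semicolons,
   so the delimiter math can't produce a negative length. */
SELECT m.message_id, m.text
FROM sys.messages AS m
WHERE m.text LIKE N'%;%;%'; /* two semicolons, anywhere in the string */
```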

So anyway, this is sort of a basic way of doing stuff. The fully parsed string portion is down here. This is the only section…

Zoom it, you are on my last nerve. This is the only section that’s going to actually do what I’m trying to teach you how to do. There are a few other columns in here to enumerate what the actual first, second… Come on, zoom it.

…and third arguments are as they’re passed in. All right? So we’re going to run this. And we’re going to look at the results. And what we’re going to see is…

Well, I mean, exactly what I was telling you about here. The first argument is the text of the error messages from sys.messages. The second argument is, like I said, the starting point of the string.

And the third argument… Notice that this is not always a higher number than this, because all this is is the length of the string after this argument, right?

So for this one, we go from the 71st byte and then we go 6 bytes over. This one, we start at byte 59 and we go 441 bytes over. This one, we start at byte 31 and we go to a 13, right?

So again, first argument, the input string. Second argument, starting position. Third argument, length of the string.

That’s why when you’re doing this stuff, what you have to do to make sure that you get all that positioning right is a lot of jumping around.

And it’s really annoying and I hate it. And I’ve had to do this for my entire SQL Server career. And I almost never remember how to do it correctly the first time around. And if it’s one of those situations where we need to do like a reverse and find like the last thing in a string and then jump, forget it.

I have to look for like 15 different blog posts before I can get that one right. So for the parse string one, though, the thing that we’re actually learning how to do today, the first argument, pretty self-explanatory, just m.text, right?

That’s all we need there. The second one, we want the CHARINDEX of colon in the text column, plus the length of whatever you’re searching for. So if we were searching for something that were two characters here, if we were searching for, like, double colon, then the len would have to change to two, all right?

Or we’d have to change plus two or plus double colon here. Otherwise, we would have an inaccurate starting position. The third one for the length of the string, this is where things get annoying and complicated, right?

The first two things, fairly self-explanatory, right? The thing that we want to split — the thing that we want to substring, rather — and then the CHARINDEX of the first thing that we care about, plus the length of that first thing that we care about, so that we start on the thing right next to it, right?

We like colon right next to the colon. You get what I mean. The third one, this is the tough part.

So one of the nice things about CHARINDEX as a function — PATINDEX does not have this, CHARINDEX does have this — is that CHARINDEX optionally supports a third argument, where you can tell it the starting position that you want to start looking at.

So this is going to look exactly like it did up above. We’re looking for the first colon in the string, but the third argument is going to tell our query to start looking after it finds this.

So basically, after it finds this thing plus the length of this thing, right? This line up here is basically just a copy and paste of this. We want to find the very next colon in text after we find a colon in the text plus the length of the colon right there, right?
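A sketch of that optional third argument in action, finding the second colon by starting the search just past the first one:

```sql
/* CHARINDEX(search, expression, start_position):
   the third argument tells CHARINDEX where to begin looking. */
SELECT
    first_colon  = CHARINDEX(N':', m.text),
    second_colon = CHARINDEX(N':', m.text, CHARINDEX(N':', m.text) + 1)
FROM sys.messages AS m
WHERE m.text LIKE N'%:%:%';
```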

So because remember, we got to bump up one over so we don’t capture like anything stupid. Then, and this is where, this is the part that always murders my brain. After that, we have to subtract, right?

We have a little subtracty thing here. That means subtract. The subtracty thing means subtract. We have to subtract the length of the thing that we’re looking for. And we also have to subtract the CHARINDEX of the first position that we find, right?

So essentially, we have to subtract this from this so that we get the correct string length. Because remember, substring in SQL Server is not expression, start, end; it’s expression, start, length.

So in order to figure out how many characters or how many bytes over we need to go from the thing we care about, we need to subtract the first position that we find and the length of the thing from the second position that we found, right?

Because that’s what gives us the text between delimiters and not just text from delimiter to the end of the string. I know.
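Putting all of that together, the full expression can be sketched like this (using a colon delimiter, per the walkthrough; the pieces are written out inline so they stay visible):

```sql
/* start  = first delimiter position + delimiter length
   length = second delimiter position - first position - delimiter length */
SELECT
    parsed =
        SUBSTRING
        (
            m.text,
            CHARINDEX(N':', m.text) + LEN(N':'),
            CHARINDEX(N':', m.text, CHARINDEX(N':', m.text) + 1)
              - CHARINDEX(N':', m.text)
              - LEN(N':')
        )
FROM sys.messages AS m
WHERE m.text LIKE N'%:%:%';
```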

This stuff isn’t fun. Most sensible people, if they have a job and they work for a place, they’ll probably write an inline table valued function to do this because remembering this is awful.
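A minimal sketch of what such an inline table-valued function might look like — the name, parameters, and body here are my own, not code from the video:

```sql
CREATE OR ALTER FUNCTION dbo.text_between /* hypothetical name */
(
    @input nvarchar(max),
    @delimiter nvarchar(10)
)
RETURNS TABLE
AS
RETURN
SELECT
    parsed =
        SUBSTRING
        (
            @input,
            CHARINDEX(@delimiter, @input) + LEN(@delimiter),
            CHARINDEX(@delimiter, @input, CHARINDEX(@delimiter, @input) + LEN(@delimiter))
              - CHARINDEX(@delimiter, @input)
              - LEN(@delimiter)
        )
WHERE @input LIKE N'%' + @delimiter + N'%' + @delimiter + N'%';
```

Then it’s just a `CROSS APPLY dbo.text_between(m.text, N';') AS tb`, and the arithmetic lives in one place instead of your memory.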

But it is the most popular post on my blog, a popular blog on my entire website, so it gets a YouTube video.

And that’s about it. So apparently a lot of people always forget how to do this. Myself included. Anyway, thank you for watching.

I hope you finally learned how to use substring in SQL Server. I hope you enjoyed yourselves. I hope you learned something, like maybe what substring actually does in SQL Server.

If you like this video, lots of thumbs, lots of comments. I’ll take anything.

Send me a letter. Just kidding. I don’t want, I can’t read, I can’t read. And if you like this sort of SQL Server content, maybe not exactly this SQL Server content.

Apparently a lot of people really like this. Maybe this is what I’m going to blow up on. If you like this sort of SQL Server content, you can subscribe to the channel, like almost 4,000 other people have.

I don’t know what I’m going to start saying when over 4,000 people have subscribed. I’m optimistic that I’ll survive, I’ll live to see that.

But anyway, thank you for watching. Happy, happy strings, happy substringing. Oh, yeah.

Okay, goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

How To Get Only Letters Or Numbers From A String For Free In SQL Server




Thanks for reading!

Video Summary

In this video, I delve into three underappreciated functions from my GitHub repository that can significantly enhance your SQL Server string manipulation tasks. Often, when working with clients, I encounter subpar versions of these functions—clunky and inefficient code that could be easily replaced by the well-optimized solutions I present here. The focus is on `get letters`, `get numbers`, and `strip characters`—functions designed to streamline common string operations like extracting specific character sets or removing unwanted characters from strings. These inline table-valued functions are efficient, avoiding the pitfalls of scalar UDFs and multi-statement table-valued functions, making them a valuable addition to your SQL toolkit. Whether you’re dealing with phone numbers, usernames, or any other text data that requires cleaning, these functions can save you time and effort. So, give them a try and see how they can improve your workflow!

Full Transcript

Erik Darling here. Still alive. Amazingly. Still talking about SQL Server. Probably a little less amazingly. In this video I want to talk a little bit about three functions that are in my GitHub repo that don’t get enough, if any, attention whatsoever. Look, you know, look, look, I got a GitHub repo. Ooh, everyone has a GitHub repo. It’s not anything crazy to have a GitHub repo. Most of what gets attention in my GitHub repo are the stored procedures I write to help with SQL Server troubleshooting. Things like the stuff over here that starts with the sp_ prefix, like sp_HealthParser and sp_HumanEvents and sp_LogHunter and sp_PressureDetector and sp_QuickieStore. There are a lot of SPs in there. These string functions get very, very, very, very little attention. And it’s sort of a shame, because when I’m working with clients, I often see really crappy versions of these functions, like copy and pasted from some website from the year 2000. Inevitably scalar UDFs, every single one of them. Some of them multi-statement table-valued functions. But always, like, while loops and other crap that make your queries slow.

Keep those queries awful and slow and make you need to hire me. Maybe I shouldn’t be talking about it. Keep those queries slow. Call me. I’ll do something. I’ll wave my hands at them. But the three basic things that I see a lot of people trying to do in queries a lot has to do with, like, string manipulation. Microsoft has given us some help with string manipulation lately, sort of. We got a STRING_SPLIT and a STRING_AGG and, I don’t know, I guess that’s it. You get what you pay for. It’s only $7,000 a core. Why develop anything useful? Apparently we’re getting regex. Azure SQL DB has regex in preview, which can only mean SQL Server vNext is getting regex in preview. So, um, can’t wait to start seeing regex in a where clause. To be honest with you, it’s gonna revive the entire SQL Server community, having regex in a where clause. Everyone’s coming back. Everyone’s coming back.

All right. Not a sinking ship at all. So, these are what the three functions do. They are called get letters, get numbers, and strip characters. They all do nearly the same thing. I’m going to talk about what’s in these files a little bit before I show you the stuff. So in every file, there are two versions of these. There’s one version where you can use a numbers table, and the numbers table does help a little bit with performance. There’s also another version that uses a CTE to build up a sort of internal numbers table that looks like this. Right? So there’s a little bit more work in there. The numbers table is a bit faster. Depending on what you’re doing, the speed difference is somewhere between negligible and profound.
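The CTE-built numbers table he mentions usually has a shape something like this (an assumption about the pattern, not the repo’s code):

```sql
/* Stacked CTEs: each cross join multiplies the row count,
   then ROW_NUMBER turns the rows into a sequence. */
WITH
    e1(n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS v(n)), /* 10 rows */
    e2(n) AS (SELECT 1 FROM e1 AS a CROSS JOIN e1 AS b), /* 100 rows */
    e4(n) AS (SELECT 1 FROM e2 AS a CROSS JOIN e2 AS b)  /* 10,000 rows */
SELECT TOP (8000)
    n = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM e4;
```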

So, you know, make sure that whatever version of these you’re using suits your use case appropriately. So what these things both do is — this one is get letters. What this does is it basically uses that string splitting thing to do some XML-ing. And what we do is look for basically any single character in a string that matches the [A-Za-z] pattern, right? So that’s all the characters between A and Z. So that’s what that does. If you need other weird characters, I don’t know, write your own function. And then the get numbers function does just about the same thing, except it only looks for where the single character is, like, zero to nine.
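The per-character pattern he’s describing can be sketched roughly like this — the variable and the `FOR XML PATH` reassembly are my assumptions about the shape, not the repo’s exact code:

```sql
/* Walk the string one position at a time, keep only [A-Za-z]
   characters, and glue the survivors back together. Uses an
   inline numbers CTE; a persisted numbers table works the same way. */
DECLARE @s nvarchar(100) = N'user4614';

WITH
    e1(n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS v(n)),
    nums(n) AS (SELECT TOP (100) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM e1 AS a CROSS JOIN e1 AS b)
SELECT letters_only =
(
    SELECT SUBSTRING(@s, n.n, 1)
    FROM nums AS n
    WHERE n.n <= LEN(@s)
    AND   SUBSTRING(@s, n.n, 1) LIKE N'[A-Za-z]'
    ORDER BY n.n
    FOR XML PATH('')
);
```

Swap the pattern to `[0-9]` and you have the get numbers behavior instead.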

The strip characters one is a little bit different, because what this does is it seeks to remove some matched expression from your strings. All right. So they all function slightly differently. I think, sort of theoretically, technically, strip characters could replace both get letters and get numbers. But I sometimes find that it is helpful to write code that is geared towards a specific task.

The more that your code might have to do — like, the more generalized your code is — sometimes the less efficient it is. Overly generalized code: big, bulky, slow, lots of thinking, decision-making, things to check on. More focused, narrow-casted code generally tends to be faster and work better.

So, with that out of the way, let’s go over to Management Studio and let’s just see a couple examples of these things working. So, this is the get numbers function. And if I run this, we’ll see a bunch of nulls where some usernames don’t have numbers.

And then for lines where the usernames do have numbers, we will only get the numbers from them. Things like 4614. Without the user, right?

So, that’s get numbers, which is pretty handy. Get letters does the exact opposite of get numbers, where, when we get down to this batch of users — where there were numbers there at some point — the numbers have been completely stripped out.

All right? So, fun stuff there. Uh, the strip characters one, well, like I said, there’s a, there’s a solid case for strip characters replacing both. So, both of those, uh, in different ways.

But, um, well, so like there are some weird, there are some weird characters in some of these. It’s a screwy Unicode thing. Uh, the results look a little weird sometimes with this.

But if you look at the display name column in the Stack Overflow database, it has some very weird things in it. It’s an NVARCHAR and strange things happen. So, when we want to strip out characters — well, this is the one where we were removing numbers.

And we can see the numbers get removed in here. Like that. And for the one where we were looking to, uh, do the opposite, which is one where we were trying to remove all that stuff.

Uh, so there’s, like I said, there are some strange things in here. But the, the important thing is that for these, for these rows, uh, we only get the numbers back from those. Uh, for, for these, uh, I would have to, like, go through and, like, do, like, an ASCII or Unicode check to figure out exactly what is odd in here.

But Tomek Melissa, I guarantee you, has some weird Unicode character somewhere in that name. I just didn’t really do all that much digging on it. So, if you have a need in your database for this sort of thing — I see a lot of this stuff with, like, phone numbers, or something along those lines.

These are pretty good for that stuff. They are inline table-valued functions, so they don’t have the same problems that scalar UDFs or multi-statement table-valued functions would have. So, if you want to give these a shot and see if they fit your use case, you know, I got get letters.

I got get numbers. And I got strip characters. And they all do, you know, just exactly what they sound like. So, you should try them.

And if you, I don’t know, like them or you find bugs or you find things that can be improved from a performance perspective, well, this is what GitHub is for. We collaborate. We’re a community.

We all high-five each other. No one gets paid. No one gets paid. All right. Cool. Uh, thank you for watching.

I hope you enjoyed yourselves. I hope you’ll try my handy little functions out. Uh, I hope you learned something. Uh, what else? If you like this video, I like thumbs.

I love thumbs. Thumbs are the best. Comments are nice, too. Uh, and I also like subscriptions. Subscribers.

I like when people say, I want to hear from you more often, Erik Darling. Uh, because, you know, that’s how I make friends. I yell at my camera on YouTube.

So, anyway. Like. Subscribe. Hang out. Spend some time with me. Mm. Try some functions out. They’re free.

The first function’s free, kid. All right. Uh, thank you for watching.


Check Constraints And Query Performance In SQL Server




Thanks for watching!

Video Summary

In this video, I delve into the world of database constraints and their impact on query performance, particularly focusing on how tweaking these constraints can lead to more efficient execution plans. While most viewers might not have constraints in their databases, I explore a scenario using Stack Overflow data to illustrate why constraints are important for maintaining data integrity and optimizing queries. By adding and modifying constraints, we see firsthand how SQL Server processes queries differently, leading to significant improvements in plan shape and performance. This video is part of my ongoing series on database optimization and query tuning, aimed at helping those who want to improve their skills in these areas. If you found this content amusing or informative, I’d love to hear your thoughts—thumbs up and helpful comments are always welcome!

Full Transcript

Erik Darling here with Darling Data. In today’s video, we’re going to talk about how you can tweak constraints in your database to improve query performance. Now, this video is admittedly a pretty good bit of narrowcasting. I know most of you out there, you do not have constraints in your database. If it’s a data warehouse, I’m with you, though. Like, check constraints, unique constraints, foreign keys, and data warehouses. The domain of morons. Do not mess with your database load times. Your data warehouse load times. And your OLTP databases, OLTT, not a thing. Oh, your OLTP databases, where, you know, referential integrity and things like that are pretty important, you should have constraints like uniqueness and check and foreign key because they’re good for your data. They’re good little data vitamins. You should have your data on those vitamins. They can get big and strong like Fred Flintstone. So, I mean, this is just kind of a funny thing. It amuses me. And if it amuses me, I record it. And hopefully it amuses you, too. That’s why at the end of every video I say, I hope you enjoyed yourselves. Because, oh, crap, I hope it’s funny to someone.

If it’s just me laughing at my own jokes, boy, that would be so depressing. Oh, dear. Oh, dear. I’m spiraling. Spiraling real hard. So, I’ve got my Stack Overflow database, and I’ve already created this index. And this index is on… ZoomIt. I’m having real ZoomIt problems. I might have to, like, restart my computer or something, because ZoomIt is just acting a wild fool on me. This index is keyed on reputation and upvotes, and it includes display name. Now, what I want to point out about this index before we go any further is that this index does, in fact, have a standard histogram — let’s see — well, almost 200 rows. 194 rows. And you can see all of the histogram stuff in here. This is our 194th row. Maybe if we, oh, I mean, we created the index, so that is a full scan.

There is our 194th row. This is our highest value, 1047863. 1047863. Oh, that’s seven digits. That’s John Skeet. And then we have various and assorted reputations up here. And the very first reputation in the table has a range high key of one, right? So SQL Server knows that the lowest range high key that it got from this data is the number one. The thing that this doesn’t guarantee in any way, shape, or form is that the lowest number that will always be in here will always be the number one, right?

SQL Server has no logical guarantees about that because we don’t have any constraints that enforce that. So what we’re going to do next is look at a query plan. And when we’re going to look at this query plan, we’re all going to gasp. We’re going to be shocked and awed and horrified at how disgusting this query plan is. I mean, it runs pretty quickly because let’s, like, look. They don’t all have to be awful. Some of them can just get the point across.

Some of them, you really just… See the pictures up there? There’s a seek and a sort and a parallel gather streams and a top. Sometimes you just need the pictures to tell the story. In a way, you do kind of miss the old days when there were no operator times in query plans because then you could just, like, take a picture of something and be like, trust me, it was awful. But this, you know, keeps you a little bit more honest. So me being honest with you, I just want you to pay attention to the operators in the plan and the plan shape.

Namely, we have a parallel plan. Our plan has a sort in it. Those are the two things that are important here. Now, if you’ve watched my other videos about indexes and indexing and how indexes put SQL Server data in order, you would think that, you know, having a less than or equal to predicate on reputation would make it fairly easy to… For SQL Server to know that, like, it doesn’t have to sort upvotes, right? We order by upvotes here, but we have, like, an equality predicate.

And one is the lowest in the table. So, SQL Server, you should know. But it doesn’t know because we don’t tell it that. We have to over-communicate to databases. You know, they’re very neat. Databases are like big neat balls.

Like, you’ve got to really spend a lot of time coaching and coaxing. It’s a whole thing. So, if we add this constraint, where we say reputation greater than zero, SQL Server knows right now that every single reputation in the users table is greater than zero.

And so, you would think that SQL Server would logically be able to say, when this query runs, well, everything’s greater than zero. So, that really means that the only number that could show up in here is the number one. I mean, yes, theoretically, technically, scientifically, there are numbers between zero and one.

But this is an integer column. But we don’t have that level of precision. But SQL Server, still not very confident. And we end up with the exact same execution plan with the seek and the sort, and it’s all parallel.

It’s very interesting, isn’t it? If we want to fix that, if we want to change that for good, we need to get rid of the constraint as it exists here. And we need to change the constraint. Constraint? Constraint? I don’t know what a constraint is.

I don’t think that’s a word. We need to change our constraint to be reputation is greater than or equal to one. And when we do that, now SQL Server changes its whole mind.

We have a single-threaded index seek with no sort and no need for a parallelism gather stream because there’s no parallelism. So, what’s kind of funny here is if you look at this index seek, this just says seek predicate reputation equals one. So, we have that equality predicate now on reputation.

And with that, SQL Server knows that upvotes is all in the order that we want it to be in. So, we don’t have to actually physically sort anything. Upvotes is all in the right order.

If we go back a step, oh, dear, what happened? That was a bit scary. That was like one of those end-of-the-world movies. If we go back a bit and we look at this constraint or look at the query plan with the greater than zero constraints in place, the index seek looks a little bit different.

Right? We still have the seek predicate, but it’s less than or equal to one. So, just having that zero in there, just having greater than zero doesn’t really help us. Having the greater than or equal to one gives us a much tidier query.
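The before-and-after constraints from the demo can be sketched like so (the table and constraint names are my guesses at the Stack Overflow schema, not lifted from the video):

```sql
/* The version that doesn't help the optimizer: */
ALTER TABLE dbo.Users
ADD CONSTRAINT ck_reputation_gt_zero
    CHECK (Reputation > 0);

/* Swap it for the version that does: */
ALTER TABLE dbo.Users
DROP CONSTRAINT ck_reputation_gt_zero;

ALTER TABLE dbo.Users
ADD CONSTRAINT ck_reputation_gte_one
    CHECK (Reputation >= 1);
```

With the `>= 1` version in place, a `WHERE Reputation <= 1 ... ORDER BY UpVotes` query can become a plain single-threaded seek with no sort.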

So, let’s be honest here. Very few of you have constraints. Very few of you pay this much attention to query plan details and maybe to the way constraints are written.

And how that can be changed to improve query performance. Very few of you would even probably pay attention to that query plan that we just saw. Why?

It runs in 100 milliseconds. Are you going to tune that? I don’t know. Do you need to? Maybe. Could you do it by changing a constraint that you probably don’t have on your table anyway? Yes.

That’s the magic of databases. These are the kind of little things that matter. I’ve said it before in other videos: the further along you get in your database career — I mean, I gear myself towards the query tuners — the further you get along in your query tuning career, the more these small details and little minutiae make a difference. So if you want to improve, leave the query performance stuff aside for a little bit.

If you just want to improve yourself and you want to improve your query tuning abilities, these are the kind of things that you’re going to have to start paying attention to. Now, let’s say we’re starting from a place where you have no constraints on this table. That’s the first thing you’re going to want to go after, right?

Because if this is an OLTP environment, not a data warehouse. Data warehouse, constraints, no. Get out. You’re in an OLTP environment and there’s like specific domain rules and regulations around some of the columns in your database. Overcommunicate to SQL Server what can and should be in those columns.

It will help you in the long run. And then further on, when you start to have really insane problems in databases, pay attention to how those constraints are written. Because to SQL Server, which apparently does not trust integers, there’s a big difference between greater than zero and greater than equal to one.

So just stuff to keep in mind. Stuff like this I find all the time. Stuff like this I find and it surprises even me.

It just kind of takes a little bit of like the curiosity to tinker and, you know, toggle with stuff a little bit and try to figure out exactly what’s wrong with SQL Server today. Because it’s always something. Always something wrong with SQL Server.

It’s sort of like a Munchausen type thing. I’m not a doctor, but it sounds like a reasonable diagnosis. Anyway.

Yeah. One, get some constraints. Two, write your constraints right. Important lessons there.

Right? For you. Thank you for watching. I hope you enjoyed yourselves. I found this particular scenario funny. I hope you also found it funny.

If you didn’t, well, I don’t know. Maybe the next one will be better. I can’t make any promises, though.

If you like this video, thumbs ups and helpful, not hurtful comments are always appreciated. And, of course, if you like this sort of SQL Server content generally, and you would like to get notified whenever I talk about things that amuse me, well, you can subscribe to the channel, and you can get the notification along with, like, 4,000 other people.

And like we’ve said before, 4,000 people have never been wrong. Never once. All right.

Thank you for watching. I need to go do something else for a moment. What it is, you’ll never know. You can only guess. You can only dream. You can only speculate. You can only conjecture.

But I’ll know. I’ll know.


Join Me At Data Saturday Dallas Sept 6-7

Spring Training


This September, I’ll be presenting my full day training session The Foundations Of SQL Server Performance Tuning for Data Saturday Dallas.

All attendees will get free access for life to my SQL Server performance tuning training. That’s about 25 hours of streaming on-demand content.

Get your tickets here for my precon, taking place Friday, September 6th 2024, at Microsoft Corporation 7000 State Highway 161 Irving, TX 75039

Here’s what I’ll be presenting:

The Foundations Of SQL Server Performance Tuning

Session Abstract:

Whether you want to be the next great query tuning wizard, or you just need to learn how to start solving tough business problems at work, you need a solid understanding of not only what makes things fast, but also what makes them slow.

I work with consulting clients worldwide fixing complex SQL Server performance problems. I want to teach you how to do the same thing using the same troubleshooting tools and techniques I do.

I’m going to crack open my bag of tricks and show you exactly how I find which queries to tune, indexes to add, and changes to make. In this day long session, you’re going to learn about hardware, query rewrites that work, effective index design patterns, and more.

Before you get to the cutting edge, you need to have a good foundation. I’m going to teach you how to find and fix performance problems with confidence.

Event Details:

Get your tickets here for my precon!

Register for Data Saturday, on September 7th here!


Tweaking SQL Server Queries To Induce Parallel Execution Plans




Thanks for watching!

Video Summary

In this video, I delve into the nuances of parallel execution plans in SQL Server and why sometimes a hint or trace flag isn’t enough. Erik Darling from Darling Data shares his insights on using an `ORDER BY` clause to trick SQL Server into generating a parallel plan when needed. He explains how adding unnecessary sorting can make SQL Server perceive the query as more costly, leading it to choose a parallel execution path. The video also covers other methods and considerations for inducing parallelism, emphasizing the importance of validating whether such plans are truly beneficial or just adding overhead.

Full Transcript

Erik Darling here with Darling Data. And I have sort of a short video today, because sometimes a short video is what I have time for, and sometimes a short video is what you have time for. So this is going to be a good and happy time for when I only have time for a short video and for when you only have time for a short video. It’s just amazing how that’s going to work out. Well, one thing that comes up a lot for me in my consulting, and in, I guess, sort of a general query tuning methodology, is that there are definitely times when a parallel execution plan is preferred. And faster. The problem is that Microsoft doesn’t give us a MINDOP hint. There’s no way to tell SQL Server via a hint called MINDOP that you want to set a minimum degree of parallelism for a query. It would be nice if you could. It would save everyone a lot of trouble. Unfortunately, Microsoft doesn’t care about you that much. Or apparently, it doesn’t care about you that much.

Or just wants to make you suffer. And that sucks. What we do have is a trace flag, 8649. The trouble with trace flags is that you do need admin permissions to use them, and you can’t put trace flags in some modules, like views. You can’t put OPTION (QUERYTRACEON whatever) at the end of a view query. We also have a use hint called ENABLE_PARALLEL_PLAN_PREFERENCE. That one has fewer drawbacks, but the problem with both of them is that Microsoft does not document nor support them.

So, you know, we as query tuners are left with this unfortunate sort of, you know, do we want to use undocumented, unsupported hints and trace flags in our queries? Yes. Yes, yes, we do. It’s fun. Or do we want to find other sort of more natural ways of getting SQL Server to think that a query will do so much work that we need a parallel plan for it? Now, there are a lot of circumstances where you just don’t need to encourage SQL Server all that much.
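For reference, the two undocumented options look like this in practice. The query here is just a placeholder against a Stack Overflow-style votes table, and again, neither the trace flag nor the hint is documented or supported by Microsoft, so treat them as test-only tools:

```sql
/* Undocumented, unsupported: test-only ways to ask for a parallel plan. */
SELECT COUNT_BIG(*) AS records
FROM dbo.Votes AS v
OPTION (QUERYTRACEON 8649); /* requires elevated permissions */

SELECT COUNT_BIG(*) AS records
FROM dbo.Votes AS v
OPTION (USE HINT ('ENABLE_PARALLEL_PLAN_PREFERENCE'));
```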

Most of the time, SQL Server will just be like, you get a parallel plan and you get a parallel plan and everyone gets a parallel plan. Go crazy. So, you know, that’s fun. But sometimes you got to play some tricks. There have been various attempts over the years that are more successful in some compatibility levels and in some contexts than others.

I often find that, you know, you can usually coerce SQL Server into using a parallel plan by making it do something extra that doesn’t logically change the query that it wasn’t doing before. I think probably the easiest one to talk about is like sorting, like making SQL Server order data. A lot of the times when you make SQL Server order data, it will freak out and be like, whoa, expensiveo.

Parallel plans for everyone. So just to give you an example, and I realize these queries are contrived. Okay, first off, if I was yelling at someone about this query: TOP without ORDER BY is complete garbage nonsense.

If you’re writing queries like this, and God did not hand you a rock that says this query can only ever return one row anyway, and you do this, I kind of think you’re an idiot and you should stop. This is not a good practice.

But I just want to show you really quickly, using this TOP (1) query and then the same TOP (1) query with an ORDER BY, how SQL Server doing more work makes it go, oh, we need a parallel plan now. So let’s look at these two things.
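The exact demo queries aren’t visible in the transcript, so this is a sketch in the same spirit, using the Stack Overflow users and votes tables described later in the video:

```sql
/* TOP without an ORDER BY: bad practice, but cheap-looking to the optimizer. */
SELECT TOP (1)
    u.Id
FROM dbo.Users AS u
WHERE NOT EXISTS
      (
          SELECT 1/0
          FROM dbo.Votes AS v
          WHERE v.UserId = u.Id
      );

/* Same query, but asking for ordered results. The extra work can push
   the plan's estimated cost high enough to get a parallel plan. */
SELECT TOP (1)
    u.Id
FROM dbo.Users AS u
WHERE NOT EXISTS
      (
          SELECT 1/0
          FROM dbo.Votes AS v
          WHERE v.UserId = u.Id
      )
ORDER BY u.Id;
```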

All right. Top one, same query, top one with an order by. And if you look at these query plans, you might be shocked. You might be horrified.

You might be mortified to see that the first query we ran with no order by runs for nearly 11 seconds. All right. So we spend almost nine seconds in the votes table, and then doing some flow distinct hash hippie stuff right here.

And this thing runs for quite a while. And what’s really amusing about both of these queries is neither one of them return any rows. I did that on purpose.

Kind of a funny, kind of a funny query. I know the stack overflow data apparently well enough to get queries to do dumb things like this. So the second query does have an order by.

But here’s what’s tricky about it. It has an order by on the ID column. And how did that end up over there? That is not where, that is not correct indenting.

One, two, three, four, five. Oh, we need one more. There we go. Ah, no, come back. There we go. That is proper indenting. Four spaces in for select list columns. I don’t know why SQL Prompt sometimes screws me and aligns this thing with the top or distinct.

It’s awful looking. But in this query, we’re ordering by u.id. And the funny thing about u.id is that it is the clustered primary key of the users table, meaning this thing already has an index on it.

It is well indexed. It should be, SQL Server should be able to return this in index order very easily. But that’s not what SQL Server has a problem with.

What SQL Server has a problem with is the votes table. Look at our mighty votes table. Look at all these rows.

These rows coming out of the votes table. Look what we do to those rows. We sort them. We have to sort these rows.

Because we asked SQL Server to return this data, the data from the users table, in order. Ah, curse you. SQL Server thinks that, from an I/O perspective, it would be a whole lot smarter if we had data in order from the votes table too.

All right. That sort is there to make the loop join a lot faster. All right.

So SQL Server does us a solid here. And sometimes this kind of stuff, this general approach works with like all sorts of queries. But especially, you know, when you have exists and not exist queries where you’re dealing with queries that have these sort of funky little row goal things in them.

You really need to, you know, make sure that the plan that you’re getting is not one that is susceptible to row goal problems. Row goal problems are painful problems. Because this query up here, quite honestly, has a row goal problem.

If you look at this stuff, that’s a sassy row goal problem. SQL Server is like, ah, I think there’s only one. Mm-mm.

Mm-mm-mm, SQL Server. You’re wrong. We’re all wrong. Well, I think being wrong about row goals is the worst kind of wrong you can be. Because when you’re wrong about row goals as a query optimizer, you really barf all over everyone’s day.

It is a sick and sad thing to see in the world. But this query, these numbers look a lot smarter to me. And this parallel plan turns out a lot better, especially when we’re dealing with lots and lots of rows.

So, if you’re out there trying to tune a query, and, let’s just say on the off chance that maybe you’re like, I think a parallel plan will be better here. And there might be all sorts of reasons you might think that. You might look at one of your operators and be like, holy cow, 50 million rows on one thread?

That seems like a bad idea to me. We should have those rows on DOP threads, because DOP threads would be a lot faster. So, say you come across a situation where you’re pretty sure that a parallel plan would be a lot faster.

And, you know, it’s okay if you want to test queries using trace flag 8649 or the ENABLE_PARALLEL_PLAN_PREFERENCE use hint. But if you need to find a more natural way of inducing parallelism, adding things that don’t really change much about the query, but do change the work SQL Server thinks it has to do, are pretty valid ways of getting a parallel plan where you otherwise just wouldn’t be able to get one. So, order by is one of those funny things.

And there are all sorts of places where you can stick order by, especially with big tops, like top two bajillions, in order to get a parallel plan where it just, it doesn’t change logically or semantically what the query is doing, what the query is returning. But it does fool the optimizer into thinking that a query is going to be a lot more expensive from a costing perspective. Not necessarily like, oh, it’s going to make the query slow.

Remember, costing and slowness are not the same thing. But it’s just a pretty good way of getting SQL Server to think, to like operate under the impression that it has to do a lot more work than it actually has to do. Get yourself a parallel plan and just make sure that, you know, like you can validate pretty quickly if the parallel plan is faster and if the parallel plan is efficient.
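A common shape for the "big top" trick described above looks like this. It’s a sketch, not the video’s demo query: a derived table with a huge TOP and an ORDER BY that doesn’t change what the query returns, but inflates the plan’s estimated cost:

```sql
/* TOP (bigint max) plus ORDER BY: semantically a no-op here, but it can
   tip the optimizer's costing into choosing a parallel plan. */
SELECT
    x.Id
FROM
(
    SELECT TOP (9223372036854775807)
        u.Id
    FROM dbo.Users AS u
    ORDER BY u.Id
) AS x;
```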

Remember, one of the key things about parallel plans is that you should see a fairly good difference between CPU time and wall clock time. If you have a DOP 8 query, ideally, you would see CPU time be 8x whatever wall clock time is, but that’s not always going to be the case. But what I want you to keep an eye on is something like the closer CPU time and wall clock time get to each other in a parallel plan, the less efficient the parallelism was.
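One simple way to eyeball that CPU-to-wall-clock ratio for a query you’re testing (the query here is a placeholder):

```sql
/* Print CPU time and elapsed time for each statement in the session. */
SET STATISTICS TIME ON;

SELECT COUNT_BIG(*) AS records
FROM dbo.Votes AS v;

SET STATISTICS TIME OFF;
/* For a DOP 8 plan that parallelizes well, CPU time should be several
   times the elapsed time. If the two are close, parallelism was
   inefficient: most rows likely landed on one or two threads. */
```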

That’s where you’re going to see a lot of rows ending up on one or two threads or something within the parallel plan. And that’s just going to show you that the parallelism was pretty inefficient. So I’ve covered that in many videos, many, many videos, thousands of videos at this point, maybe even billions.

Hard to keep track. If only. If only there was some sort of dashboard that told me how many videos I had.

All right. Anyway. Thank you for watching. Hope you enjoyed yourselves. At least a little bit. I hope you learned something.

And if you like this video, if you are into playing tricks on the optimizer, like I am, throw this video a like or throw me a comment that says, Cool dude bro job.

I think you did a cool job, bro. Something like that maybe. Your English might be better than mine. Mine’s terrible.

And also if you like SQL Server content, messing with the optimizer, giving it noogies and wedgies and, you know, all that good stuff. Bullying it a little bit. It’s okay to bully the optimizer.

It’s a computer program. It doesn’t have feelings. So if you like that kind of content, subscribe to the channel. Like almost 4,000 other people have. I mean, 4,000 people can’t be wrong.

If I’ve learned anything from elections the world over, 4,000 people cannot be wrong. All right. Cool.

With that sort of crushing optimism, we’re going to sign off here. Thank you for watching.


Why Operator Times Are Wrong For Some MSTVFs In SQL Server




Thanks for watching!

Video Summary

In this video, I delve into the peculiarities of query plan times for multi-statement table valued functions (MSTVFs) and why they can appear so strange. While I generally advise against using MSTVFs due to their frequent performance issues, I explore two specific examples in detail. By examining three different queries that call these functions, we uncover how disabling interleaved execution is crucial for getting accurate timing information. The video highlights the discrepancies between actual runtime and the misleading query plan times, offering insights into why MSTVFs can be problematic and when they might not provide the performance benefits one hopes for.

Full Transcript

Erik Darling here with Darling Data. And if I look a little bit different in this video, it’s because I’m really happy. So, you and I, we’re gonna get happy together. We’re gonna get real psyched on all this. You and me. So, in this video, we’re gonna talk about why query plan times for some multi-statement table valued functions look really weird. Now, I don’t want you to think that I contain, I mean condone, multi-statement table valued functions. Contain? Sure, I contain multitudes of them. But no, I don’t want you to think that I condone the use of multi-statement table valued functions, because in my experience, they are either complete performance albatrosses tied to the necks of your queries, or they’re just whatever. I don’t think I’ve run into a situation in my history where switching to a multi-statement table valued function made things dramatically faster. I suppose there’s some chance of that happening if you were splitting up a larger, more complicated query and you decided to use one to materialize some intermediate results, but even that’s a bit of a far-fetched scenario. That’s a pretty big gamble, because of the many limitations that table variables have in SQL Server. And if you have questions about that, watch the rest of my channel videos, because I talk about it a lot. All right? So, do that. Do a little bit of diligence. All right? A little bit of diligence is due from you.

You wouldn’t want to be down one diligence. So, we’re going to look at two different multi-statement table valued functions. We’re going to look at three different queries calling two different multi-statement table valued functions. That sounds about right. So, this is the first one. It’s called score stats. It accepts a user id integer, and it inserts some stuff into a table variable. Well, actually, let’s zoom in here a little bit. What makes a multi-statement table valued function a multi-statement table valued function, and not an inline table valued function, is that we are returning a table variable, right? This is our RETURNS @Out TABLE clause. And I make everything with schemabinding, because I hope and pray that someday I find a situation where schemabinding makes things better or faster.

In the meantime, I do like making anyone trying to change tables or columns experience some pain. Be honest about that. A little poke, a little poke on you. So, that’s the first function. The second function does something rather similar, also with schemabinding. We return a table, it does an insert into the table, and then returns the table down at the bottom.
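The functions from the video aren’t reproduced in the transcript, so here’s a minimal sketch of the same shape. The function name and columns are placeholders; the RETURNS @Out TABLE clause is the part that makes it multi-statement rather than inline:

```sql
CREATE OR ALTER FUNCTION dbo.VoteStatsSketch (@UserId integer)
RETURNS @Out TABLE /* returning a declared table variable = MSTVF */
(
    UserId integer NOT NULL,
    TotalVotes bigint NOT NULL
)
WITH SCHEMABINDING
AS
BEGIN
    /* Populate the table variable... */
    INSERT @Out (UserId, TotalVotes)
    SELECT v.UserId, COUNT_BIG(*)
    FROM dbo.Votes AS v
    WHERE v.UserId = @UserId
    GROUP BY v.UserId;

    /* ...and the final RETURN hands it back to the caller. */
    RETURN;
END;
```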

I think I neglected to show that in the first one: this is the final return statement that returns our table variable. I know, calm down. Calm down. Please. It’s all too much. It’s all too much. So, these are the three queries that we’re going to look at. We have the first query, which calls the first multi-statement table valued function.

That’s score stats. We have another query that calls the other multi-statement table valued function that we looked at, called vote stats. And then we have a third iteration of that query, where I have specifically disabled interleaved execution. The reason I’ve done this is because it used to be that if I ran these two queries back to back, if I ran this one and then re-ran the same query, I would get accurate execution times shown in the query plan on the second run, when it was working with a cached plan.

We’ll talk about why in a second, but I don’t get that anymore. So now I have to use the disable hint to show you what’s going on. So for the first query and the first query plan, we get accurate and honest times. This runs for 6.38 seconds. If we come over here and click on the properties, like good little property clickers, and look at the query time stats, we get the right timing on those.
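The disable hint in question is a documented USE HINT. Assuming a placeholder function like the sketch above, the third query’s shape would be something like:

```sql
/* Turn interleaved execution off for this one query only. */
SELECT s.*
FROM dbo.VoteStatsSketch(22656) AS s
OPTION (USE HINT ('DISABLE_INTERLEAVED_EXECUTION_TVF'));
```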

This function is not, at least I don’t think it was, eligible for interleaved execution. We’ll see in the other functions there’ll be a message up in here that’ll say, oh yeah, you’re cool with that. You can get some functiony goodness. You’re a nice hip happening person.

But this one, we don’t have that. But anyway, this multi-statement table valued function does return correct and accurate query timings. The second one does not. So the second one, the query that doesn’t have the hint, where this thing says it ran for one second, is an absolute lie. And if we scroll down to the second query, where I’ve disabled interleaved execution, it’s run for 45 seconds.

This query did too, and we can verify that this query ran for 45 seconds. Because if we go to the properties here and look at the query time stats, we can see a complete and utter nonsense lie for CPU time, and then 42 and a half seconds of elapsed time, right?

So the CPU time is way screwed on this, but the elapsed time is probably more accurate. Now, coming back to what I was talking about with interleaved execution. If we go in here and we look at this part of the query plan, we’ll see that contains interleaved execution candidates is true.

How nice. We have found a truth in the world. And that is just excellent for us. And then if we look at the properties of the table valued function call up here, notice that this query plan and this query plan are a little different.

This one is just like, you know, blah, blah, nested loops, table valued function. This one is table valued function, then sequence. And then we have a table scan coming from the table variable from within the function here.

But if we look at the properties of the table valued function call from this part of the plan, you will see this handy dandy little thing that says, is interleaved executed: true. True. It is true. It is not false. I think when it’s false, that node just isn’t there, which is crappy.

It’s like, I don’t know, it just makes things confusing for people. So what I don’t want you to think is that disabling interleaved execution is what made this call slow.

These calls both run for just about the same amount of time. There is a slight difference in the parallel versus non-parallel plan. The reason why this thing doesn’t show accurate cardinality is because of what interleaved execution does: during query compilation, it populates the table variable to come up with a cardinality estimate.

So it sort of happens in these two phases, where that happens first, and then the rest of the query runs using that estimate. So this query gets a better estimate and actually uses a parallel plan. That’s why this one is a few seconds faster.

But these things both run like we’ve seen, because when I click on this and go look at the query time stats, this runs for 42 and a half seconds and this runs for about 45 seconds. So the parallelism helps a little bit for this plan, because we got a better cardinality estimate coming from the function.

But in this one, we got a 30 row guess: 1,086,586 rows against an estimate of 30. And this one up here, well, I don’t know. I don’t know if that’s really better.

1,086,586 of 4,378. So, I don’t know. Sure, the parallel plan’s a little bit better. I don’t know if three seconds is really that great of a difference here.

If you, or your boss, or whoever were paying me to tune this query, I wouldn’t be like, oh, just make the multi-statement table valued function parallel, because three seconds is as good as we can do. We would do much better than that.

You and I, we would look at these query plans. We would ponder the majesty of interleaved execution. And then we would quickly throw the whole thing in the garbage and rewrite this in a different way. Because, really, I love when Microsoft does this.

It’s so cute where they’re like, Ooh, we have this thing. That’s going to fix this crappy feature we have. And then it, it’s just so mediocre. It just does so little.

And you’re like, okay, wow. You didn’t really move the needle there. It didn’t work. Didn’t really do anything. But anyway, I don’t know. That’s enough of that.

Thank you. I love you. You’re wonderful, beautiful people. I appreciate you watching my videos. I appreciate everyone who throws me likes and who throws me comments, no matter what the comment is.

And I also appreciate the nearly 4,000 data darlings out there in YouTube land who have subscribed to this channel and find it necessary to be notified when I do these things.

So that’s all there is. I’m going to prepare a different demo and probably record something else. So, yeah, thanks. Thank you for watching, and I’ll see you in another video next time.

Goodbye.


A Little About Intelligent Query Processing Limitations In SQL Server




Thanks for watching!

Video Summary

In this video, I dive into the complexities and limitations of Microsoft’s intelligent query processing features in SQL Server. Erik Darling shares his experiences working with clients where these advanced features often fail to kick in when they should, leading to suboptimal query plans. I illustrate this through examples involving batch mode on rowstore and adaptive joins, highlighting how various factors such as the version of SQL Server, compatibility levels, and even scalar UDFs can impact performance optimization. The video also delves into using extended events to track down these issues, offering a glimpse into the intricate heuristics that determine whether or not these features are utilized.

Full Transcript

Erik Darling here with Darling Data. I have no funny introduction to this video at all. Nothing. I got nothing. I am recording this video shortly before dinner, so two things are going to be true about it. One is I’m going to really want to get to the end. And two is I might be a little distracted, because I’m thinking about some steak here. Once Erik Darling has steak on the brain, it’s tough to get it off the brain, really. Unless you use a lot of butter. I apologize to any vegans in the house. I am not one of you. It’s cool that you are; it’s just not my thing. So I want to talk in this video a little bit about the fickleness and the limitations of some of Microsoft’s intelligent query processing features. You know, Microsoft loves to put its hands on its hips and stick its chest out and act like it has this fully automated performance fixing database system. But in reality, there are a lot of limitations to these things. And a lot of the time, man, they just don’t kick in. These things don’t kick in when they should.

So, we’re going to look at two things. We’re going to look at an example where batch mode on rowstore should have kicked in but didn’t. And then I’m going to show you a short list of the extended events you can use, just to give you a sense of the amount of stuff that has to go on in order for SQL Server to actually engage these intelligent query processing features. So, for the first demo, I have a votes table indexed in two ways: once on vote type id and creation date, and once on creation date and vote type id. And I’ve hinted three queries to use the clustered index and then those two nonclustered indexes up there. The idea of the query, and the index hint is the only thing that’s different about the three iterations here, is to find the min and max creation date grouped by vote type id, right? Right in there. And if we look at these query plans, there’s going to be one, the one in the middle, that’s different. And the one in the middle is different because it did not use batch mode on rowstore. The first one right here, this is in batch mode, right?
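The demo setup isn’t shown in the transcript, but roughly, it sounds like this: the same two columns indexed both ways, and the same aggregate query hinted to each index. Index names here are placeholders:

```sql
/* The two nonclustered indexes described: same columns, opposite key order. */
CREATE INDEX v_vote_type_creation
    ON dbo.Votes (VoteTypeId, CreationDate);

CREATE INDEX v_creation_vote_type
    ON dbo.Votes (CreationDate, VoteTypeId);

/* One of the three hinted iterations; the other two swap the index hint. */
SELECT
    v.VoteTypeId,
    min_creation_date = MIN(v.CreationDate),
    max_creation_date = MAX(v.CreationDate)
FROM dbo.Votes AS v WITH (INDEX = v_vote_type_creation)
GROUP BY v.VoteTypeId;
```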

You can see the batch right there: batch mode on rowstore. Good for us. This hash aggregate also executes in batch mode. And just as a sort of SQL jeopardy note here: up until batch mode came along, the only way that you could do a global aggregate like this was with a stream aggregate. You couldn’t use a hash aggregate for it. So that was written in a whole bunch of SQL Server literature for a very long time, because most of it, or actually all of the good SQL Server books, were written way before batch mode came out. Actually, no, that’s not true.

The last good one was 2012, when batch mode first crept into existence in SQL Server. But batch mode wasn’t very good then. So, yeah, 2012 was about the cutoff for good SQL Server books, with the notable exception of Great Post, Eric, available on Amazon. I forget how much it is. It’s a good book. Buy it. If you buy it, I’ll sign it. How about that? But anyway, if we look at these plans, and this isn’t the end of the world for this query, right? It doesn’t make a huge difference here. The query that doesn’t use batch mode on rowstore takes 2.1 seconds. The queries that do use batch mode on rowstore take about 1.8 and 1.7 seconds.

So it’s not a dramatic end of the world here. It’s just one of those crappy things where you do your job and create a good, useful index, and SQL Server, because of various costing things, does not choose what would overall be a time saving query plan. This is a very small example of this happening. I work with clients all the time where there are much, much bigger examples of these intelligent query processing features not kicking in when they should, and me having to play all sorts of tricks and do all sorts of weird stuff in order to get SQL Server to, you know, intelligent query process something.

Right. Which isn’t a good thing, because I dropped out of high school the day I turned 16. I’m turning 44 at PASS Summit this year. It’s been a long time. This year I’m celebrating my birthday at PASS Summit; when I was 16, I celebrated my birthday by not going to high school anymore. So it’s not a good sign when this guy has to go in and intelligent query process SQL Server, because it shouldn’t be that way. I should be giving SQL Server wedgies and smoking cigarettes in a gas station parking lot. That’s what should be happening.

So this is just kind of a crappy limitation here, or kind of a crappy example, of SQL Server not doing something that it should because of how things got costed out. Right. So for the second query, the one that doesn’t use batch mode on rowstore, we use the even Steven index, right? And the even Steven index, if we look up here, has vote type id first and then creation date. And what having vote type id first in the index does, of course, because you watch all my videos, I don’t want to say religiously, dutifully? No, that sounds gross too. Enthusiastically. Yeah, there we go. Because you’d have a bad day if you didn’t spend it with me, wouldn’t you? Okay. So because this index puts vote type id in order, the leading column of the index, vote type id, is already in order, SQL Server thought it would be really cheap to just stream it right into the stream aggregate, because the stream aggregate expects ordered data.

That’s why things were a little bit choppier on these ones, and we didn’t choose batch mode on rowstore, because SQL Server was just like, well, I can stream aggregate; otherwise I have to sort 53 million rows. Not that SQL Server has been smart enough in the past to not sort 53 million rows. There was that merge join video from a little while back. But anyway, a digression here. So that’s this thing, right? And there are all sorts of extended events related to the intelligent query processing features, to either tell you sort of what they do, or why they weren’t used, or what needs to happen for them to come into play. And you can usually track down these extended events. So this top one here is the batch mode on rowstore heuristics.

And there are a bunch of reasons, a bunch of things that SQL Server looks for. And holy cow. What? This is when every other thing is true, but you still... ah, the Welsh keyboard kicked in. Did not cross the road.

Oh: did not cross the cost threshold. Oh man. You should have stapled it to a chicken. Anyway, so this is the list of stuff that would lead SQL Server down the path to choosing batch mode on rowstore.

The second query result is a list of reasons why, oops, sorry, I touched the wrong thing there. Oh, dear me.

All right. Now, this is a list of reasons why SQL Server might skip using an adaptive join. These are not particularly legible to human beings. I do have a self-answered Q&A on the Database Administrators Stack Exchange where I give examples of what a lot of these things mean.

I have a, uh, Q, self-answered Q and A on Stack Exchange where I do give examples of a lot of the, what these things mean. Uh, because at the time I was quite fascinated by them. Um, and then, uh, this bottom section, uh, just talks a little bit about memory grant feedback stuff and, like, the steps it goes through.

And, oh look, we have deprecated things in here — replaced with current execution count and adjustment count. Okay.

So these things replace those two other things. So don’t pay attention to these things, apparently. These things, no good. Pay attention to the other things. They are not deprecated. Great.

But this talks a little bit about, you know, the process that memory grant feedback goes through as it adjusts query memory. So there’s a lot of stuff in here. And these heuristics and limitations are, of course, hinged on — or predicated on — a bunch of other stuff.

Just, for example, you have to be on SQL Server 2019 or better, or some cloudy equivalent — not managed instance, but some cloudy equivalent where you’re on an enterprise-y thing.

You need to be in a modern compatibility level, like 150 or 160, for a lot of these. Some of them are earlier — adaptive joins are 140, for SQL Server 2017. There might be a few from 2016, but you do have to be in, like, 130, 140, 150, 160.

Something rather new. And for each version of SQL Server that introduces new intelligent query processing features, you have to be in a higher and higher compatibility level in order for those to be unlocked.
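Checking and changing the database compatibility level is straightforward. A quick sketch — the database name here is a stand-in:

```sql
/* See where every database currently sits */
SELECT
    d.name,
    d.compatibility_level
FROM sys.databases AS d;

/* Unlock newer IQP features (150 = 2019, 160 = 2022).
   StackOverflow is a hypothetical database name. */
ALTER DATABASE StackOverflow
SET COMPATIBILITY_LEVEL = 160;
```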

That’s why, for a lot of queries, when I want to differentiate things, you’ll see me use the USE HINT that sets the query compatibility level to a higher compat level, so that I can experiment with the batch mode on rowstore stuff and see the before and after. It’s a lot easier to do it that way than it is to do it in the reverse, at least for me. So, yeah, back to what I was saying.
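The query-level hint being described looks something like this — a sketch with a made-up query, since the hint only changes optimizer behavior for that one statement:

```sql
/* Optimize this one query as if the database were at compat
   level 160, without touching the database-level setting.
   dbo.Posts is an assumed Stack Overflow-style table name. */
SELECT
    c = COUNT_BIG(*)
FROM dbo.Posts AS p
OPTION (USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_160'));
```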

Compatibility, Enterprise Edition, compatibility levels, version of SQL Server. There’s a lot of stuff that has to already be in place before any of these intelligent query processing features will even think about thinking about anything. So if you’re out there in the world and you’re having a hard time with SQL Server, and you’re wondering why all of these awesome intelligent query processing things aren’t just making your life so much easier and better.

It’s probably because either Microsoft doesn’t think you spent enough money on SQL Server, which, you know, if you’re on standard edition, you get what you get. You don’t get upset. You pick the cheap one, right?

Not me. You may not be on a modern enough version of SQL Server. You know, 2019 and 2022 have the most up-to-date intelligent query processing features. And then lastly, your database might not be in a compatibility level that allows for any of these intelligent query processing features to be used.

And all of these things are going to hold you back. And there are various reasons why you might be in that condition, right? It could be a vendor thing where they’re like, no, you can’t use a newer version of SQL Server.

No, you can’t use a higher compatibility level. We haven’t certified, blah, blah, blah. There are just all sorts of crappy reasons in the world for that stuff. You might be able to affect some things by using query hints, like I have on these queries, to use a different compatibility level just at the query level.

That might be useful enough for you. That might get you across the finish line in a few places. But, you know, there are times when I see entire workloads where everything is right.

SQL Server 2022, compat level 160, Enterprise Edition, not managed instance. And still, the IQP features either do not meet the heuristic goals they need to kick in, or there is some limiting thing in the query itself that prevents some of the IQP features from kicking in.

Adaptive joins — we looked at a bunch of reasons there. But probably the biggest one for most people is scalar UDF inlining, where the list and the number of restrictions on it seem to grow with every cumulative update. And that’s one that I think a lot of people in the SQL Server world had very high hopes for, because scalar UDFs really are the damn devil when it comes to performance tuning.

And, you know, having those not be as big a devil as they are would be nice. But, you know, we can’t have nice things. So, anyway, that’s about that.

I do hope that if you need help with these sorts of things, you will think of your humble, young, handsome consultant, Erik Darling. Because I do tend to help pretty well with these things, despite being a high school dropout. Because they’re tough problems, and it seems like every time Microsoft adds a knob, 17 things become far more complicated to track down and pinpoint.

And it gets to be rather cumbersome. So, anyway, I’m going to go eat dinner now, so I can get the steak off my brain and into my belly. And then I can record stuff that probably makes more sense and is more coherent.

And, uh, no, I’m going to go eat dinner now. A little more focused. That’s the word, focused. So, thank you for watching.

I hope you enjoyed yourselves. I hope you learned something. I hope that you will like me. Love me.

I hope that you will like this video and subscribe to my channel, and join the nearly 4,000 other data darlings who have done that. Who have subscribed to my channel.

It’s probably the same, like, five people who like everything. And that one person who dislikes everything. It’s okay. I still like you.

I will still give you a hug. Just for showing up. Just for giving me the view. Even though you gave me a thumbs down, I’ll give you a hug for that.

Nice, big hug. So, anyway. Yeah, it’s steak time. I’m out. Goodbye. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Building Reusable Queue Tables and Procedures In SQL Server




Thanks for watching!

Video Summary

In this video, I delve into the intricacies of building reusable queues in SQL Server, a topic that is both fascinating and crucial for maintaining efficient data processing workflows. I explore various queuing strategies, including different ordering methods like FIFO (first in, first out) and LIFO (last in, first out). Additionally, I discuss the importance of choosing appropriate data types for your queue table, particularly emphasizing the use of `bigint` as an identity column to avoid potential scalability issues. The video also covers how to efficiently manage processing eligibility and track duration using computed columns, along with detailed explanations of locking hints and isolation levels to ensure smooth operation in high-concurrency environments.

Full Transcript

Erik Darling here with Darling Data, and in the interest of fiscal solvency, we’re going to talk about building reusable queues. What does one have to do with the other? Absolutely nothing. I just blurted that out and went with it, and I don’t feel like starting this thing over, so you’re going to just have to cope with that mentally somehow. There are a lot of reasons for building queuing processes in SQL Server, and there are all sorts of different ways to do it. There’s first in, first out; last in, first out — FIFO, LIFO, FAFO. I’m going to skip over the third one. And there are all sorts of different ways you can choose to dequeue something. And by dequeue, I don’t mean Dairy Queen. I mean to remove something from the queue. Eligibility for queuing — for me personally, it really does depend on the intent of the queue. There are times when I would want to just delete something from a table when I have started processing it. That’s a completely legitimate way to do it. In this example, I’m just updating stuff. And sometimes the update thing is nice because, rather than just deleting an item from a queue table, you can have the queue table retain some information about the queuing process, right? You can have some generally useful stuff in there. That’s the way that I kind of like to do it, and you can kind of see that in the setup of my queue table here.

So I am going to do what every good little SQL Server DBA does, and I’m going to have an ID column. Mine’s going to be a bigint, because I believe strongly in using bigints for identity columns. The process of having to fix an identity column that runs out of integers is a real nightmare, and I wish Microsoft would make it less of a nightmare. To me, that’s a pretty major scalability issue that they just ignore. Now, integers are fine for domains — things that you understand can never go past a certain number, like dispositions of things. You know, like the constants of the Stack Overflow database: vote types, or post types. You can never hit over 2 billion types of posts. You might hit over 2 billion posts — that’d be a lot of posts, that’d be a lot of questions — but you’re never going to have more than an integer’s worth of types of posts.

So when I’m making a table with an identity column, I’m going to suck it up and use those four extra bytes. Oh my God, the four extra bytes. I’m going to use a bigint, because the funny thing to me is that when you start talking about data types, the absolute worst people on Earth will start talking to you about bytes.

And especially when you talk about ints and bigints, they’re like, well, integers are four bytes, bigints are eight bytes, disk, disk, disk. Well, okay, fine, it is legitimate. The thing is, right about the point where you’re going to start maybe caring about the four extra bytes that you’re storing is probably going to be right about the time that you’re starting to run out of integers.

And when you have those four extra bytes available to you in bigint form, you’re going to be so much happier that you don’t have to fix that problem. So that’s my thing, right there. And for this queue table, since we’re not doing a delete queue — we’re doing an update queue — we need a column like this to tell queries when a row is being processed.

In this case, the thing that I’m using for the processing point is just the reputation column from the users table. It’s really just completely senseless. And then I have some — this is what I was talking about when I said it’s nice to have some information about how long it took to process a thing.

So what we have is a couple-three columns in total: a start date, an end date, and a computed column that calculates the duration in milliseconds between the start date and end date. I’m using milliseconds here because this particular queue would process very quickly, but in real life you might want to do seconds or minutes or something — get more human-understandable numbers out of it.
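Pieced together, the queue table being described looks roughly like this sketch. The table and column names here are stand-ins, not the actual demo script:

```sql
CREATE TABLE dbo.user_queue
(
    /* bigint identity: four extra bytes now beats running out of ints later */
    id bigint IDENTITY PRIMARY KEY CLUSTERED,
    /* the stand-in "work item" payload */
    reputation integer NOT NULL,
    /* flips to 1 while a worker owns the row */
    in_process bit NOT NULL DEFAULT 0,
    start_date datetime2 NULL,
    end_date datetime2 NULL,
    /* computed column: how long processing took */
    duration_ms AS DATEDIFF(MILLISECOND, start_date, end_date),
    /* find unprocessed rows, in order, efficiently */
    INDEX q (in_process, id)
);
```

The inline index on `(in_process, id)` is the part that matters for the dequeue queries: it lets workers skip rows already being processed and grab the next one in order without scanning.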

And then the two important things for processing this queue table — this one is probably the most important, which will make more sense when you look at the queries that actually hit the table. What we need here is a really efficient way to find things that are not being processed, or skip over things that are being processed, and a way to order the results after that so we can process the queue in order. If we were going to process these things differently — for example, the lovely, talented Canadian Aaron Bertrand (Canadian Bertrand is going to be his name from now on) has a series of posts about snake sorting.

And snake sorting is sort of a way to divvy up work equally amongst threads, so they all finish around the same time. The context of his post was distributing backups to threads. So you wouldn’t have one thread going: oh, I got an eight terabyte backup.

Oh, now I got a 10 terabyte backup. Instead it would be like: I got the eight terabyte backup, you take that 10 terabyte backup, and the other two threads, you take the five and four terabyte backups, and we’ll all chug through these things more equally.

So if you have that kind of knowledge about the size of a process, or the size of a thing that has to be written, you might not want to do something as naive as just saying: give me the next one in sequential ID order. You might want to use some additional logic, like Aaron does, to split things up and make each thread do a more fair amount of work. It depends on how you’re dealing with that stuff.

So here’s the stored procedure — sort of a template stored procedure — that I use to run the queues. It’s just a perpetual loop until we run out of work. And something is pretty important in here.

I’m going to split this up into two parts. The locking hints in here are particularly important, regardless of which isolation level you’re in. Because if you’re using a pessimistic isolation level like read committed, you’re going to have a blocking nightmare if you don’t put stuff like this in.

Right. And if you’re using an optimistic isolation level, you’re going to have a pretty hard time with things if you don’t add in some additional locking, because in some cases you’re going to want read queries to not see version data. So you have to be careful with that.

If you are using an optimistic isolation level, you will probably want to use the READCOMMITTEDLOCK hint — that goes for queuing processes, or if you’re doing anything like building a sequence table. Paul White has an excellent article on building sequence tables where he uses READCOMMITTEDLOCK to great effect, and I think that’s probably something you would want in a queuing process like this.

That is, if you were using an optimistic isolation level. And if you are a good little data darling out there in data land, and you are being real smart, you are using an optimistic isolation level, because pessimistic isolation levels are the turds. They are nothing but problems. So, yeah, there we go.
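The hint being referred to, sketched under the assumption that the database is on RCSI or snapshot isolation (the table name is a stand-in):

```sql
/* Under an optimistic isolation level, a plain read sees row versions.
   READCOMMITTEDLOCK forces a locking read, so the queue query sees
   (and blocks on) the current committed state of the row instead.
   READPAST lets it skip rows other workers have already locked. */
SELECT TOP (1)
    q.id
FROM dbo.user_queue AS q WITH (READCOMMITTEDLOCK, ROWLOCK, UPDLOCK, READPAST)
WHERE q.in_process = 0
ORDER BY
    q.id;
```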

And you know what? I accidentally deleted something in here that I shouldn’t have. I just realized it now, because I saw some red squiggles at the ID and reputation parameters — integer, no, like that. Sorry about that.

So what this code does is it’ll look for things that it needs to process, right?

So while we have at least one row in the table that needs processing, we’re going to enter a loop that looks like this. And this is where the index thing makes a little bit more sense. We’re going to use a CTE here just because it’s a little bit cleaner than using a subquery, really.

I mean, we could use a derived table here and it would be the exact same thing. But for this, it does make the stored procedure a little bit more readable when I use a CTE.

So what we’re going to do is just select the top one row, and we’re going to hold on to that one row. We’re going to say ROWLOCK, UPDLOCK. We’re going to hang on to that one row.

And again, like I said, this is just very naive sorting — just get me the next available thing in sequential order. You might need to do this a little bit differently depending on what your process is. And then we’re going to use kind of a neat thing, where we’re going to use an update to do more than just set a couple columns in the table equal to something.

This update will start by changing the in-process column to one for whatever row we find, and it will also set the start date to the time we started. So that when we’re done, at the end, we can update the end date column.

So we know how long something took. And then we’re going to do something that I think is particularly clever: in the same update, we are going to set our two parameters equal to the columns in the queue table for the one row that we pulled out up here.

We’re going to set those equal to the column values that we pulled out there. Then we’re going to use those as part of our queue processing. My pretend example is just selecting a count from the users table where reputation equals whatever reputation we pulled out there.

And we execute that SQL down there. And then we use the ID column that we fetched. If Zoom will listen to me.

Zoom never listens to me. I’m hitting escape like every three seconds, and Zoom is just like, I don’t care. Come on, Zoomie. There we go.

All right. That took a little more wrestling than I cared to do. And then the final thing we’ll do down here is update the queue table to set the end date to when this thing finishes, where the ID equals the ID of the queue item that we pulled out. And that will be really quick.

There’s also just a little safeguard there, to make sure that we only ever update a row that’s in process, and not a row that is not yet being processed. Even though that’s sort of technically impossible with this, I just like to be extra safe when I’m automating tasks like this, because if you’re not, you can run into some really weird bugs.
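Put together, the dequeue pattern described above looks roughly like this sketch. Table and column names are stand-ins; the real template lives in the blog posts mentioned below:

```sql
DECLARE
    @id bigint,
    @reputation integer;

/* Grab the next unclaimed row, mark it in-process, and copy its
   values into variables, all in one atomic statement. READPAST
   skips rows other workers have locked. */
WITH q AS
(
    SELECT TOP (1)
        uq.*
    FROM dbo.user_queue AS uq WITH (ROWLOCK, UPDLOCK, READPAST)
    WHERE uq.in_process = 0
    ORDER BY
        uq.id /* naive FIFO: next row in identity order */
)
UPDATE q
SET
    q.in_process = 1,
    q.start_date = SYSDATETIME(),
    @id = q.id,
    @reputation = q.reputation;

/* ...do the actual work with @id / @reputation here... */

/* Stamp the finish time; the in_process predicate is the
   belt-and-suspenders safeguard. */
UPDATE uq
SET
    uq.end_date = SYSDATETIME()
FROM dbo.user_queue AS uq
WHERE uq.id = @id
AND   uq.in_process = 1;
```

The trick worth noticing is that a single UPDATE can both modify columns and assign their values to variables, which is what makes claiming a row atomic without an extra SELECT.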

So all of the code, and a somewhat deeper explanation of some of these things, is available on my blog. There are two posts — they’ll be in the show notes, as they say — about building reusable queues. I originally wrote it as a two-part series.

I forget why — probably because I needed an extra day of blogging to cover something. Hard to tell sometimes. But I think it’s a very interesting problem when you have to tackle this sort of thing, and doing it correctly — doing it in a way that is effective for high concurrency and doesn’t have a bunch of awful blocking problems — is especially interesting. That’s where table design is really important: making sure that your table is set up in a way that allows you to find the rows you need to process, in the order you need to process them, as efficiently as possible.

So, like, if you’re doing something where there are going to be four or five worker threads, and they’re going to chug along on huge tasks for a long time and then come back, the locking and blocking thing probably isn’t going to be all that important. But if you need to orchestrate really high-concurrency queuing, and distribute those queues without running into a blocking nightmare, that’s where this kind of stuff becomes much more important. So anyway, thank you — you, especially you, the good-looking one sitting right there — for watching.

I appreciate you more than words could ever possibly describe. I hope you enjoyed yourselves a little bit. I hope you learned something.

If you like this video, thumbs-ups and helpful comments are always nice. I do enjoy those. And if you like this sort of SQL Server content, please subscribe to my channel.

You can join the nearly 4,000 other data darlings out there in YouTube land who have done that, and who make me feel like a special, helpful, useful, good person by allowing little bells to go off in their heads every time I publish a video. So, yeah, you can like, and you can subscribe. You can do them both.

And then you’re extra, extra data darling-y. And I owe you a hug. So, anyway, I got some more of these to record.

I got to bulk up things before I go away for a little bit. So this one’s getting canned right about here. Thank you.

Thank you.


Join Me At Data Saturday Dallas Sept 6-7

Spring Training


This September, I’ll be presenting my full day training session The Foundations Of SQL Server Performance Tuning for Data Saturday Dallas.

All attendees will get free access for life to my SQL Server performance tuning training. That’s about 25 hours of streaming on-demand content.

Get your tickets here for my precon, taking place Friday, September 6th, 2024, at Microsoft Corporation, 7000 State Highway 161, Irving, TX 75039.

Here’s what I’ll be presenting:

The Foundations Of SQL Server Performance Tuning

Session Abstract:

Whether you want to be the next great query tuning wizard, or you just need to learn how to start solving tough business problems at work, you need a solid understanding of not only what makes things fast, but also what makes them slow.

I work with consulting clients worldwide fixing complex SQL Server performance problems. I want to teach you how to do the same thing using the same troubleshooting tools and techniques I do.

I’m going to crack open my bag of tricks and show you exactly how I find which queries to tune, indexes to add, and changes to make. In this day-long session, you’re going to learn about hardware, query rewrites that work, effective index design patterns, and more.

Before you get to the cutting edge, you need to have a good foundation. I’m going to teach you how to find and fix performance problems with confidence.

Event Details:

Get your tickets here for my precon!

Register for Data Saturday, on September 7th here!


Date Math Annoyances In SQL Server Queries




Thanks for watching!

Video Summary

In this video, I delve into common pitfalls and best practices when working with dates in SQL Server queries. I share my insights on avoiding implicit data type conversions by using explicit date values instead of relying on zero as shorthand for 1900-01-01. I also highlight the importance of performing date math operations on parameters or expressions rather than columns, which can significantly improve query performance. Additionally, I discuss the nuances of the `DATEDIFF` function and emphasize the need for precise assumptions when using it to ensure correct results. By sharing these practical tips, I hope viewers gain a deeper understanding of how to handle dates effectively in SQL Server queries.

Full Transcript

Erik Darling here, Darling Data, doing Darling Data stuff with SQL Server. Apparently I have to get on an airplane today. What day is today? I guess we’ll find out, because we’re going to talk about dates. I’m dumb. So, I’ve talked about some similar stuff on the channel before. I hate myself. But I’m going to talk about things in a slightly different way here, mostly because I still see a lot of screw-ups and weirdness and sort of bugginess with date math and date function queries, and I just want to talk through a little of what goes on with them. First and foremost is a giant pet peeve of mine, at least in recent years, because for a long time I kind of had no idea what an issue this sort of thing could cause. But more recent years and experiences have taught me that relying on implicit conversion of data types can really cause strange incorrect-results bugs. And you don’t want to do that. To you or anyone, really. Why would you do that to a person? Be nice. Be kind. So, I see a lot of date math queries where people use zero as shorthand for 1900-01-01. And of course, that works in very simple circumstances. But in other circumstances, you can run into some big issues with this. It would be a bit too much to gin up a demo to show you what those issues are. Just trust me: do yourself the favor of not being lazy and using the shorthand of zero to replace 1900-01-01. And we should probably make a note here and say, just so there’s no confusion, do this instead. Yeah.

There we go. That looks better now. So, these two queries will do about the same thing: adding 45,477 days to 1900-01-01, which gives us the exact same value for both of them. We get the same results back here. But like I said, there are more complicated circumstances where you might hit some weird bugs relying on this implicit conversion from zero to 1900-01-01. So always, please, do this explicitly. Maybe even CONVERT to date or datetime on ‘1900-01-01’ to make extra sure. I think I’ve gone through all of my GitHub repo and replaced any place where I may have made that error in the past. But if you see anything in there, please let me know and yell at me.
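The two queries being compared look something like this sketch:

```sql
/* Lazy: the integer 0 implicitly converts to datetime 1900-01-01 */
SELECT
    lazy = DATEADD(DAY, 45477, 0);

/* Explicit: say what you mean, skip the implicit conversion */
SELECT
    explicit = DATEADD(DAY, 45477, CONVERT(datetime, '19000101'));
```

Both return the same value; the second one just doesn’t lean on a conversion rule you have to remember.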

Another thing I see quite a bit of is when people need to do things like go back a certain number of days or months, and they end up doing some really shoddy date math like this. Now, this won’t cause incorrect results, but in the presence of a half-decent index, performance is not going to be great, because you’re performing the date math on the column and comparing that to just this scalar thing over here.

And your life — and the SQL Server query’s life — turns out a whole lot better when you do something like this, where you compare the column in the table to the result of date math operating only on whatever parameter or variable you pass in. So let’s look at the results of these two queries. I stuck recompile hints on there so we don’t have to deal with the local variable effect. The first query, where I do the date math on the column, takes about two seconds.

We have to scan the whole column. No one is happy. On the second one, where I do the date math on the parameter — right, the column compared to the date math — this does an index seek and runs for about 100 milliseconds.
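A sketch of the two shapes, with Stack Overflow-style table and column names assumed:

```sql
DECLARE @d datetime = '20130101';

/* Non-sargable: the function wraps the column, forcing a scan */
SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
WHERE DATEADD(DAY, 30, u.CreationDate) >= @d
OPTION (RECOMPILE);

/* Sargable: do the math on the parameter, compare the bare column */
SELECT
    c = COUNT_BIG(*)
FROM dbo.Users AS u
WHERE u.CreationDate >= DATEADD(DAY, -30, @d)
OPTION (RECOMPILE);
```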

So that’s a much better way of doing things. Where it gets a little tougher is when you need to compare columns that are in different tables, right? Like saying where the DATEDIFF between p.CreationDate and the comment date is — let’s say this query. Well, this query specifically is looking for posts where someone came along, like, a year later and commented on them, right?

So, someone, like, just wandered in drunk and was just like, this is out of date now. Something like that. And, like, if these were just two columns in a single table, you could, of course, what do you call it, create a computed column on them, index it, and be fine.

But since they’re in different tables, really the only way to do that — without creating a new table and storing both sets of data, or screwing up the design of the posts table even more by somehow adding comment dates in there, which blows my mind how little sense that would make; it gives me a twitch — the only realistic thing you could do would be to create an indexed view that would give you the results of this back all at once.
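A hedged sketch of the kind of indexed view being described — Stack Overflow-style table names assumed, pared down to the bare minimum an indexed view requires:

```sql
/* Indexed views need SCHEMABINDING, two-part names, and COUNT_BIG(*)
   when there's a GROUP BY. DATEDIFF is deterministic, so it's allowed. */
CREATE VIEW dbo.LateComments
WITH SCHEMABINDING
AS
SELECT
    p.Id,
    late_comments = COUNT_BIG(*)
FROM dbo.Posts AS p
JOIN dbo.Comments AS c
  ON c.PostId = p.Id
WHERE DATEDIFF(YEAR, p.CreationDate, c.CreationDate) >= 1
GROUP BY
    p.Id;
GO

/* Materializing the view is what pays for the cross-table date math */
CREATE UNIQUE CLUSTERED INDEX lc
ON dbo.LateComments (Id);
```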

Now, this, I get it. I’m with you. This is not a very useful indexed view, right?

Like, unless you just need the count, we should probably involve some other things in here — probably the ID columns from the posts table and the comments table, so we could look those things up in the base tables, something like that. But this is just one quick way of showing you that an indexed view can be a pretty good way of making the non-sargable a little less painful. Another thing where date math gets sort of weird on people is the DATEDIFF function.

What a lot of people are surprised by with DATEDIFF is that it does not actually look for a duration, necessarily. DATEDIFF in years does not look for a 12-month difference. DATEDIFF in months does not look for a 28 or 29 or 30 or 31-day difference.

DATEDIFF in days does indeed look for a one-day difference — but, you know, let’s say we were looking at December 31st at 11:59 PM, and January 1st, 2020 at 00:00:01. Or even just 00:00:00.

Actually, I don’t even know if that’s going to work. We’re going to find out together. Yeah. So there’s a one-minute difference between these, but SQL Server’s like, no, it’s a full day. And there’s a one-day difference between these, but SQL Server’s like, nope, that’s a full month.

And again, there’s barely any difference between these, but SQL Server’s like, no, it’s a full year. So the DATEDIFF function is a little weird in that way, and you have to be careful about being precise about what it is you’re looking for.
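The behavior is easy to demo: DATEDIFF counts how many datepart boundaries get crossed, not how much time has elapsed. A sketch:

```sql
/* One minute of real time, but every boundary below is crossed once */
SELECT
    days   = DATEDIFF(DAY,   '20191231 23:59', '20200101 00:00'), /* 1 */
    months = DATEDIFF(MONTH, '20191231 23:59', '20200101 00:00'), /* 1 */
    years  = DATEDIFF(YEAR,  '20191231 23:59', '20200101 00:00'); /* 1 */
```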

All right. So look at this query: all you want to do is get that count we were talking about. That’s about 20,000 rows.

If you wanted to write that using different logic — write it in a different way — you would have to do something like this, where you add a year and then flatten the creation date column to the year. It just gets very, very complicated to try to replicate that logic with explicit query syntax.
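One reconstruction of the rewrite being gestured at (variable names and values are made up for illustration): `DATEDIFF(YEAR, @p, @c) >= 1` really means “the second date falls in a later calendar year,” which you can express as a boundary comparison — flatten the first date to January 1st of its year, then add a year:

```sql
DECLARE
    @p datetime = '20191231 23:59',
    @c datetime = '20200101 00:01';

/* Equivalent of DATEDIFF(YEAR, @p, @c) >= 1:
   @c falls on or after January 1st of the year after @p */
SELECT
    datediff_way = CASE WHEN DATEDIFF(YEAR, @p, @c) >= 1
                        THEN 1 ELSE 0 END,
    explicit_way = CASE WHEN @c >= DATEADD(YEAR, 1,
                            CONVERT(datetime, DATEFROMPARTS(YEAR(@p), 1, 1)))
                        THEN 1 ELSE 0 END;
```

Both expressions agree; the explicit one just makes the boundary-crossing assumption visible.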

Just doing something as simple as this doesn’t work, right? If you look at the results, that’s a whole bunch more rows than we got from either one of those. So, depending on what exactly you’re looking to count up and return, you need to be really careful with how you write these queries.

The performance, of course, doesn't matter here, but the results are different, and they do matter. Now, if we look at some of this output, most of it wouldn't look too weird. But if you start scrolling down a little in the results, the interesting stuff is really down at the bottom.

So where SQL Server is telling you that there's a one-year difference between things, it's also telling you that there's a 23-month difference, which is almost a two-year difference, right? And there's the 723 days. I'm not going to figure out how many 30s go into 723, but it might start to creep up above these things.

Now, I realize that these things are documented, and if you're keen on reading the documentation, you might think this video is stupid: "oh, he just demonstrated documented behavior." But a lot of people don't get that involved in it. It's not until someone tells them that stuff gets weird that they start to worry about stuff getting weird.

So at the end of the day, when it comes to doing date math: one, please use explicit dates. Do not use numbers like zero to sub in for dates and rely on an implicit conversion. And please don't do date math on columns, as much as you can avoid it.

Do date math on whatever parameters and expressions you need to, and compare the column to that. If you need to do date math across tables, indexed views can be a good way of making that a little less painful. And if you need to use the DATEDIFF function to figure out what span of time exists between things, just be very careful about your assumption of exactly how much time has to be between two things in order for DATEDIFF to tell you there's a month or a year or a day between them.
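A sketch of "do date math on the parameter, not the column" (again, `dbo.Posts` and `CreationDate` are assumed stand-ins for your own table): wrapping the column in a function makes the predicate non-SARGable, so an index on the column can't be used for a seek. Moving the math to the other side of the comparison leaves the column bare.

```sql
/* Not great: the column is wrapped in DATEADD, so an index
   on CreationDate can't seek to the qualifying rows. */
SELECT COUNT(*)
FROM dbo.Posts AS p
WHERE DATEADD(DAY, 30, p.CreationDate) >= SYSDATETIME();

/* Better: the same logic with the math moved onto the
   expression side, leaving the column untouched. */
SELECT COUNT(*)
FROM dbo.Posts AS p
WHERE p.CreationDate >= DATEADD(DAY, -30, SYSDATETIME());
```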

Because, again, going back to this, there's a whole bunch of stuff in here where... yes, I agree. Go away, GitHub Desktop.

There is: 2020 is a year ahead of 2019. But there's really only a one-day difference between these dates. And the first month of the year is a month different from the last month of the year.

But again, there's only a one-day difference. And I agree that the first of the year is indeed a day ahead of the last day of the year. But there is only a one-minute difference between these.

So just be careful. Be very specific about exactly how much time has to exist between two things before it's considered a month or a year or a day difference. You might need to use seconds to figure it out.

You might have to do DATEDIFF in seconds and then figure out how many seconds are in the span of time you care about, whether that's a day, or three days, or a month (however many days in a month you care about), or the number of seconds in a year.
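A sketch of that seconds-based approach, using the same assumed timestamps as before: instead of trusting DATEDIFF in days, months, or years, measure the span in seconds and compare it to the number of seconds you actually care about.

```sql
DECLARE
    @older datetime2 = '2019-12-31 23:59:00',
    @newer datetime2 = '2020-01-01 00:00:00';

SELECT
    years_says   = DATEDIFF(YEAR,   @older, @newer), /* 1: a boundary crossed */
    seconds_says = DATEDIFF(SECOND, @older, @newer), /* 60: the actual span   */
    at_least_a_full_day =
        CASE WHEN DATEDIFF(SECOND, @older, @newer) >= 86400 /* 24 * 60 * 60 */
             THEN 'yes'
             ELSE 'no'
        END; /* 'no': only a minute really separates these */
```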

You might need to do a lot of that math to be precise about it, because DATEDIFF on its own is not very precise. Anyway, thank you for watching.

Hope you enjoyed yourselves. I hope you learned something. And I hope you read the documentation in the future so you can call me stupid for explaining how things work.

Of course, I thank you very, very deeply for watching. This is my last day on Earth. This is the last thing I record. Gosh, that’d be sad.

So I think I'd better not. Maybe I should record something else real quick after. Anyway, if you like this sort of SQL Server content or you liked this video, you can like and subscribe to my channel to get notifications when I drop new SQL Server content, and you can leave me likes and comments when I do, and then everyone's happier, right?

It’s cool stuff. Anyway, thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.