In this video, I delve into an intriguing and somewhat tricky aspect of SQL Server functions: making them wait for a specific amount of time. I explore how to implement such functionality within user-defined functions (UDFs), which isn’t straightforward due to the limitations SQL Server imposes. However, with some clever workarounds and insights from Alexander Kuznetsov’s blog post, we can achieve this through while loops that essentially create a delay. I walk you through creating these functions and demonstrate their use in practical scenarios, showing how they can be used for various sneaky and interesting purposes. If you’re curious about the full extent of what can be done with such functions, mark your calendars for Seattle and attend PASS Data Summit, where we’ll dive deeper into these and other advanced SQL Server techniques!
Full Transcript
Erik Darling here with Darling Data. In this video, we’re going to talk about something exceedingly tricky that you can do. It has to do with having functions wait for a specific amount of time. We’re going to talk more about this. It’s really hard to do a pithy intro here. But we’re going to talk about it. But before we do, we have some things to talk about. Like this channel. If you would like to sign up for a membership to this channel for as low as $4 a month, you can do that by clicking the Become a Member link in the video description. If you don’t have $4, even for one month, perhaps that just cuts into the ramen budget a little too heavily. You can do all sorts of wonderful free things that let me know you care. You can like my videos. You can comment on my videos. And you can subscribe to the channel. I do like seeing all those things. It brings me a very specific type of joy. If you need help with your SQL Server, probably not anything that we’re going to be talking about today, but I am a consultant and I do some things with SQL Server very well. I don’t set up availability groups. I don’t really sit there and mind your backups. I don’t want to talk about capital R Replication. But I can tell you if your SQL Server is healthy. I can tell you if your SQL Server is as fast as it could be. The answer is no. And I can do all sorts of other things, like make it faster. I don’t know. Some people enjoy that. Some people prefer that.
I can even reduce your cloud bills. How about that for a sales pitch? You want to give less money to Microsoft or Amazon? Call me. We can do that together. If you need some high quality, low cost training, you can get all 24 hours of mine for about $150 USD by going to that link up there and then using the discount code springcleaning. There is, of course, a link to automate all of that wonderfulness in the video description as well. So, probably at this point, this might be past PASS Data Summit. I don’t know. Maybe a little bit before, but you can still catch me there, November 4th and 5th. If it is past November 4th and 5th and you didn’t go to Seattle for PASS Data Summit, you missed it. Sorry. Can’t do anything to help you there. But the runner-up prize is: if there is a SQL Saturday or Data Saturday or whatever Saturday event near you that is in search of a pre-con speaker, let me know. I will do my best to get pre-conned there.
But with all that out of the way, let’s talk about what I want to talk about, which is how you can get a function to wait for you. So, I realize that the logic in this function is not complete. Right. It just says, if delay is greater than zero, do this thing. Otherwise, we’re going to have sort of whatever in there. We could, of course, fix that by putting ‘00:00:00’ in there.
And then it would only change if delay was greater than zero. Otherwise, we would wait for zero seconds. Maybe that is enough. Actually, this function is now Turing complete. We’ve done it. Good job, us. But the problem is that if you try to do this in a scalar UDF, you’ll get an error message saying: Invalid use of a side-effecting operator ‘WAITFOR’ within a function. That’s no good, is it?
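As a sketch of what that failure looks like (the function and parameter names here are hypothetical, not from the demo):

```sql
/*
   A naive scalar UDF that tries to use WAITFOR.
   SQL Server rejects this at CREATE time with:
   Msg 443: Invalid use of a side-effecting operator
   'WAITFOR' within a function.
*/
CREATE OR ALTER FUNCTION dbo.wait_naive
(
    @delay_seconds integer
)
RETURNS integer
AS
BEGIN
    IF @delay_seconds > 0
    BEGIN
        WAITFOR DELAY '00:00:05';
    END;
    RETURN @delay_seconds;
END;
GO
```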
We seem to have hit a wall here. Hmm. What can we do? What can the clever and devious mind do?
Well, a very clever and devious mind, long before I started thinking about this, actually had an example of what you can do. Smart guy named Alexander Kuznetsov. Kuznetsov. Something else. Probably I was pretty close on both of those. Left SQL Server for Postgres around 2013 or 14. Hasn’t been heard from since. Just kidding. He’s doing his thing.
But actually, maybe now that Pass has a Postgres corner, he’ll be back. I would love to give him a very big hug, probably out of nowhere and terrify him. But anyway, a long time ago, he wrote a blog post about scalar UDFs and he actually did the hard work for me.
And all I had to do was find a link to the hard work, because everything is in the Wayback Machine now. But what you can do in a function is a while loop: you declare all this stuff, and you set a delay in the function input. And this is just the default. You can change it, of course, when it actually runs.
But then you say while the current date time is less than that thing, you just, you know, run this stupid loop thing. And all you’re doing is setting a bit to null over and over again. So it’s very, very little work. But we can test that out and we can say, let’s just make sure this function is actually in there and created.
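Here’s a minimal sketch of that while-loop trick, in the spirit of Kuznetsov’s post. The names are mine, and the exact shape of the demo function may differ:

```sql
CREATE OR ALTER FUNCTION dbo.wait_for
(
    @delay_seconds integer
)
RETURNS integer
AS
BEGIN
    DECLARE
        @stop datetime2(7) = DATEADD(SECOND, @delay_seconds, SYSDATETIME()),
        @bit bit;

    /* Spin until the clock catches up, doing as little work
       as possible: setting a bit to NULL over and over again. */
    WHILE SYSDATETIME() < @stop
    BEGIN
        SET @bit = NULL;
    END;

    RETURN @delay_seconds;
END;
GO

/* Waits roughly three seconds, then returns. */
SELECT waited = dbo.wait_for(3);
```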
Sometimes weird things happen. Who knows? Who knows SQL Server? But if we say wait for three seconds here and we keep an eye on the clock that’s sort of next to me over here, you’ll see that that waited for exactly three seconds and then returned our column, right?
That’s pretty cool. And we can also have that, we can also have that act as input from a select list. So if I say select one, union all, select two, union all, select three, this function will wait for one second and then two seconds and then three seconds.
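That select-list behavior can be sketched like so, assuming a scalar wait function shaped like the one the demo describes (dbo.wait_for is a hypothetical name):

```sql
/* One function call per row: 1 + 2 + 3 = about six seconds total. */
SELECT
    waited = dbo.wait_for(x.n)
FROM
(
    SELECT 1 UNION ALL
    SELECT 2 UNION ALL
    SELECT 3
) AS x (n);
```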
And we can, we can actually test that out by running this. Did I run that? No, I didn’t. I just hit R. Good job. Finger was off by one.
But don’t worry, this function will run for exactly, sorry, six seconds, right? To return those three rows. Now what can you do with something like this? All sorts of interesting, sneaky, outrageous things.
But you’re going to have to come to Seattle. You’re going to have to come to PASS Data Summit in order to see all of those sneaky, outrageous things in action. So I suggest you buy your plane tickets now, because it’s getting kind of late in the day.
It’s time to boogie. So anyway, thank you for watching. I hope you learned something.
I hope that you will be titillated to the point of travel by what I’ve discussed here. And you’ll be looking forward to seeing just how many awful things you can do with a function like this in SQL Server. Because trust me, there’s a lot.
Anyway, thank you for watching.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.
How To Use A Numbers Table To Replace WHILE Loops In SQL Server Functions
Thanks for watching!
Video Summary
In this video, I dive into the world of scalar UDFs that contain while loops—something I frequently encounter when working with older applications on SQL Server. These functions, often used in legacy systems from a time when developers didn’t fully grasp set-based thinking, can wreak havoc on performance as databases grow and evolve. To address this issue, I demonstrate how to rewrite these functions using numbers tables or tally CTEs, which allow for more efficient set-based logic that significantly improves query performance. By sharing practical examples and tips, I aim to help you tackle similar challenges in your own projects. Whether you’re a seasoned SQL Server professional or just starting out, this video offers valuable insights into optimizing function performance through modern techniques.
Full Transcript
Erik Darling here with Darling Data. And continuing with this week’s theme, where I will no longer have to apologize for the length of my videos, we’re going to have another short one about how to rewrite functions with while loops in them. Because this is something that I end up having to do a lot for clients who have, you know, these, I don’t know, 10, 15, 20, 25 year old applications that have been on SQL Server since it came on floppy disks. And back then, people just didn’t know any better. I guess data was all small and easy enough that, you know, functions didn’t really do much of anything. And, you know, now that their databases are all growns up, performance is a wreck, and scalar UDFs are often the cause of why. And we’re going to talk about that. So, if you want to join, like, 20 other people who have memberships to this channel, click the link in the video description. It’s like four bucks a month on the low end and like 10 bucks a month on the high end. You could spend that money in much worse ways. But I understand not everyone is charitable. And with the holiday season coming up and inflation being what it is, if you have to choose between buying your kids a new sock and giving me four bucks, well, I understand.
If you don’t feel like doing that: like, comment, subscribe. It’s all fun. Especially the commenting part makes me feel a bit less lonely. Of course, the comments are usually where I have to apologize for the length of my videos. If you need any SQL Server consulting help, this is the stuff that I’m great at. If you need anything else, I can probably do that too. And as always, my rates are reasonable. If you want some very low cost, very high quality training, you can get about 24 hours of it for about 150 USD at that link with that coupon code, both of which are also available in the video description.
It’s your lucky day, I suppose. This November 4th and 5th, I will be doing full day pre-cons with Kendra Little at PASS Data Summit. That is coming up very soon and it’s going to be a lot of fun. And I hope to have lots of interesting content from PASS Data Summit for you. Live! Coming to you live from PASS Data Summit. Anyway, let us get on with our party here.
Let us not accidentally click one more time and fade to black like amateurs. Let’s make sure we do this right. Now, a lot of what I end up having to fix for clients with scalar UDFs is some kind of, I mean, it could be a multi-statement table-valued function too. They both have sort of the same problems when it comes to wrecking query performance.
You know, a lot of the multi-statement table-valued function ones that I have to fix are string splitters, which are, you know, always a fun challenge. Some of them are string concatenators. There’s all sorts of fun things that happen when you start getting into that stuff and the particulars of that.
But, you know, a lot of it too is, you know, stuff like this. So I’m just going to actually just show you because it makes things easier. In this case, you know, like a lot of these functions might be used to like strip characters out.
And if you go to my GitHub repo, which there’s a link to it somewhere, I’m sure, you’ll find some functions from me that can strip letters, strip numbers, or match a pattern and strip that pattern out. So there’s a few good things there.
But this is just kind of an abridged version of that where, you know, there will be some function from the, you know, from the beginning of SQL Server time that has a while loop in it. And it’ll iterate over every, you know, character in a string and, you know, figure out if the character in the string matches some pattern or whatever.
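A sketch of the kind of legacy loop being described, with hypothetical names (this one walks a string one character at a time and keeps only the digits):

```sql
CREATE OR ALTER FUNCTION dbo.strip_letters_loop
(
    @string varchar(8000)
)
RETURNS varchar(8000)
AS
BEGIN
    DECLARE
        @i integer = 1,
        @out varchar(8000) = '';

    /* One iteration per character: check it against a pattern,
       keep it if it matches. Easy to write, rough on performance. */
    WHILE @i <= LEN(@string)
    BEGIN
        IF SUBSTRING(@string, @i, 1) LIKE '[0-9]'
        BEGIN
            SET @out += SUBSTRING(@string, @i, 1);
        END;
        SET @i += 1;
    END;

    RETURN @out;
END;
GO
```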
And, you know, back in the day, a lot of people found that a very easy and approachable way to do things. Because, you know, not a lot of people were hip to the whole, you know, databases, think in sets, don’t write procedural code and functions and then expect it to perform well thing.
You know, there were a lot of people who were missing that part of their brain because it hadn’t been invented yet. It was just an empty space in their head where they were just like, oh, we’ll get a part in here someday. But, you know, here’s the result of the function.
And I just want to point out, if you’re ever, like, rewriting functions, this is the wrong way to benchmark them. Because, you know, if you’re just, you know, fixing one single string, you’re not going to see an appreciable difference. You really want to incorporate the function into the queries that are calling it.
Not just, like, you know, do stuff like this to, like, unit test it. Test it for correctness. But this isn’t a good performance test right here. We’re not going to do a performance test here because I’ve done a million function videos about performance testing.
This is just an example of how you can get out of, like, how you can rewrite functions with while loops in them so that they are less awful. So if we run this, we will get a count of five and five. One for the count of the letters, which is, we’re saying, where everything is not a number zero through nine.
And then one that is a count of the numbers, where everything is a number zero through nine. And, of course, we’re using PATINDEX to do that pattern matching. Because what else would you use to match patterns when SQL Server gives you the PATINDEX function?
I don’t know. I don’t know. You’re crazy. But, like, an easy way of doing it is to get a numbers table involved in your database. It does not have to be a particularly large one.
It does not have to be a bajillion rows. You could have a simple, you know, like, 10,000, 20,000, maybe even 100,000 row numbers table of just the numbers one through 100,000. And you could go a really long way with doing this sort of thing, right?
You can do a lot of, you can get through a lot of characters with a 100,000 row numbers table, specifically about 100,000 characters. If you have strings longer than that in your database, obviously, you’re going to have to compensate for them somehow. But with a larger numbers table, perhaps.
But, you know, for most people, having more than 100,000 characters that you would have to do this sort of thing on is perhaps a bit much. So what we’re going to do is we’re going to use a numbers table to our advantage. And we’re going to use this function to completely replace the while loop that we had before.
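Here’s a sketch of that numbers-table rewrite as an inline table-valued function. This is my abbreviation of the idea, not the exact demo code; it assumes a dbo.Numbers table with a column n covering 1 through at least the longest string you’ll feed it:

```sql
CREATE OR ALTER FUNCTION dbo.count_pattern
(
    @string varchar(8000),
    @pattern varchar(50) /* e.g. '[0-9]' or '[^0-9]' */
)
RETURNS TABLE
AS
RETURN
SELECT
    matches = COUNT_BIG(*)
FROM dbo.Numbers AS n
WHERE n.n <= LEN(@string)
/* One row per character position; PATINDEX does the matching. */
AND   PATINDEX(@pattern, SUBSTRING(@string, n.n, 1)) = 1;
GO

/* Usage in the shape the demo describes: patterns in a VALUES
   clause, CROSS APPLY to the function. Five digits and five
   letters in this test string, so a count of five and five. */
SELECT
    p.pattern,
    c.matches
FROM (VALUES ('[0-9]'), ('[^0-9]')) AS p (pattern)
CROSS APPLY dbo.count_pattern('a1b2c3d4e5', p.pattern) AS c;
```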
And let’s just make sure that this thing is in there. And, of course, I’m going to write the query that calls this a little bit differently, where I’m going to put the patterns that I care about in this values clause and then cross-apply the function with the values of the values clause over here.
And we will get, excuse me, for each of these, we will get the correct answer. So for the ones that are numbers, we had five. The ones that are not numbers, we had five.
And, of course, these things both finished instantly, just working on a single string. Pay no attention to what you saw down there. That was something else that I was working on for a different demo for a different day. Pretend that didn’t happen.
All right. It’s our secret. Doesn’t it make you feel special that now you and me have a secret together? You’re honor-bound until death to keep that secret, you and me.
So no tattling, as they say. Anyway, if you find yourself having to rewrite functions that have while loops in them, numbers tables are very good for that.
There are great examples of creating numbers tables out there in the universe. I believe Aaron Bertrand has a bunch of posts about it. There are a lot of people who use them for a lot of good things. They’re not always an awesome performance boon when used directly in queries, sort of like date tables and date dimension tables or whatever.
They can be very useful for things, but they can also be involved in weird performance issues because joining to those is often kind of awkward. But in this case, where we just have to use that numbers table as a sort of utility, like fake row number type thing, it’s pretty easy to do this here.
If you’re not allowed to create a numbers table in your database, you can, of course, use CTEs and nest them to create a large row set of the numbers, one through whatever, in order to do this instead.
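A sketch of that tally CTE approach: nested cross joins of a ten-row set produce as many numbers as you need, without a permanent table.

```sql
WITH
    ten AS
(
    SELECT t.n
    FROM (VALUES (0), (1), (2), (3), (4),
                 (5), (6), (7), (8), (9)) AS t (n)
),
    tally AS
(
    /* 10 x 10 x 10 x 10 = 10,000 rows;
       add more cross joins for more numbers. */
    SELECT TOP (10000)
        n = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM ten AS a
    CROSS JOIN ten AS b
    CROSS JOIN ten AS c
    CROSS JOIN ten AS d
)
SELECT tally.n
FROM tally;
```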
This just makes the code a lot nicer and a lot more compact. So this is one way that I help people when we need to rewrite functions with while loops in them by using either a numbers table or a tally CTE, as it’s sometimes called, so that we can do what we need to do with set-based logic rather than with procedural looping logic because that usually helps query performance a whole lot.
Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. I hope that this video was a satisfactory length for everyone. Not too big.
Not too small. It’s the Goldilocks length of the video. Because I don’t think Goldilocks ever had to apologize for her length. But, I don’t know. That’s a different sort of fairy tale.
Anyway, thank you for watching.
Finding Bad Density Vector Estimates In SQL Server
Video Summary
In this video, I share a script that I use in the Stack Overflow demo database to identify columns where density vector estimates might be inaccurate due to local variables. As a SQL Server consultant and trainer, I often need to create compelling demos to illustrate common issues like bad cardinality estimates, particularly those arising from local variables in WHERE clauses. This script helps me pinpoint such problematic columns, ensuring that my demonstrations are both accurate and effective. If you’re interested in supporting this channel or getting high-quality, low-cost SQL Server performance tuning content, consider becoming a member for as little as $4 a month—there’s a link in the video description to join now.
Full Transcript
Erik Darling here with Darling Data. I’m going to remember to flip the switch on my thing this time and we’re going to get started. I had a whole funny thing to say, but that really took the wind out of the sails there. So screw it. We’re just going to keep going. In this video, I’m going to show you and share with you a script that I use locally in the Stack Overflow demo database to find columns where the density vector estimate would not be good. And the reason why I did this is because, you know, I am a SQL Server consultant, trainer extraordinaire, and I need to come up with good demos. And sometimes, when I need to either show clients or put together training about things that cause bad cardinality estimates, probably the most common one that I see, aside from like table variables or, you know, non-SARGable predicates, is when people use local variables in their where clauses. I’m not going to go into all that because I’ve got a post about that. If you are just so anxious to see that post, it’s on my website, erikdarling.com. The title of the post is Yet Another Post About Local Variables.
There will be a link to that in the video description, just in case you have some contrary urges to Googling or whatever. And, you know, I have to come up with stuff that proves out my point that, in general, local variables are not a best practice to replace parameters with. You don’t want to fix parameter sniffing that way. You won’t have a good time. And so I wrote this script to do that. But before we go look at the magnificent majesty that is that script, let’s talk about how you and I can get closer. If you would like a membership to this channel for as low as $4 a month, you can click the link that says, like, Join Now or something in the video description, and that should bring you right to where you need to go. As usual, all of this content is free, of course. And if you just want to keep leeching off my hard work, you could at least like, comment, and subscribe so that I feel a little bit less lonely in this crazy mixed up world. I don’t know. That’s good enough there. If you need SQL Server help, if you’re in the market for consulting, I am pretty good at all of these things.
Actually, I’m very good at all of these things. I’m better than pretty good; very good at all these things. And, you know, if you decide to do any of this with me, you don’t have to do the other stuff. If you would like some high quality, low cost SQL Server performance tuning content, well, what do you know? SQL Server consultant, as noted by the last slide, and trainer extraordinaire, as noted by this slide. You can get all of mine for life for 75% off, which means about 150 USD at the end of the day. Discount code there, link up there.
Of course, all of that is in the video description. I don’t know how many more of these videos are actually going to have this information in them, because at this point I sort of forget where I have these scheduled out on the blog and where I have these scheduled out in the month. So: November 4th and 5th, PASS Data Summit, me and Kendra Little, two days of performance tuning pre-cons. Not performance tuning train wrecks.
I mean, you have the train wrecks. We have the performance tuning. So that’ll be fun. Hopefully, by the time you watch this video, there’s still time to buy a ticket. Okay. So with that out of the way, let’s get on with the show and look at this fantastic script that I wrote. I forget when I wrote it, but anyway. So, full caveat here: I run this script specific to the Stack Overflow database.
And because I’m not starting with any statistics (I actually drop all the statistics and indexes before I run this), I have to do some initial stuff in here to create statistics on all of the columns that I care about. So I have some preamble stuff in here that will create statistics on everything. Please review the script carefully. If you’re already working with a real database, you’re not going to want to run DropIndexes to drop indexes and statistics. You probably, hopefully, don’t even have that installed on your production server slash database. And then you’re also going to want to skip the part that does the create statistics stuff, because you don’t need to create a whole bunch of extra statistics in there.
So that’s the first part that kind of goes and does that. And then after the statistics get created (boy, oh boy, we’re having a great tongue day today, aren’t we? I just flip all over this thing), I create a few tables to hold the output of DBCC commands. I know that there are built-in DMVs and DMFs that do some of the stats stuff now, but I like the way that these things work a little bit better, and some of the information that they give a little bit better. So I stick with the old fashioned DBCC commands.
And then I go and I cursor over the statistics that I care about, and I run DBCC SHOW_STATISTICS with STAT_HEADER. I put that in a table, and then I have to do some updates to make sure that I have the right stats names and stuff in there. And then I, oops, I hit the wrong button outside of the VM. And that looked funny. That looked funny locally. You probably didn’t see anything. And then I do the same thing for the density vector part of DBCC SHOW_STATISTICS.
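As a rough sketch of that capture pattern: the density vector result set has a stable three-column shape, which makes it the easy one to show. The table, object, and statistics names here are hypothetical, not from the script.

```sql
/* INSERT ... EXEC requires the table to match the DBCC result
   set exactly: All density, Average Length, Columns. */
CREATE TABLE #density_vector
(
    all_density float NULL,
    average_length float NULL,
    [columns] nvarchar(4000) NULL
);

INSERT #density_vector
EXEC ('DBCC SHOW_STATISTICS (''dbo.Posts'', ''ParentId'')
       WITH DENSITY_VECTOR, NO_INFOMSGS;');

/* The equality-predicate guess from the density vector is
   roughly all_density * table row count; compare that to
   actual row counts to find bad estimates. */
SELECT
    d.[columns],
    d.all_density
FROM #density_vector AS d;
```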
And then I do the same thing with the histogram. So I get three different DBCC SHOW_STATISTICS components separately. And I put those into table variables, because performance does not matter here; I can use table variables. It’s wonderful. But then I take all that stuff and I put all of that into a temp table, right there.
And so that’s the results of the header and the vector and the histogram. And then I run some queries to show me what comes out of that. Now, I’ve already run this, so you don’t have to watch most of it, but there are a few different results in here, and one where I found really the best demos. And if you’ve ever watched my videos, you might recognize some of these.
This first result shows me where the guess that I would get from an equality predicate wildly misses how many rows would actually come back. So this chunk in here, of course, most of it on the Posts table, but this was all really good. This one here on the Votes table was good for, well, I mean, the UserId column in the Votes table is all null.
So this one was questionable at best, but, you know, figuring this sort of stuff out in the script was a little bit more difficult than I would probably want to get into. But this top one up here, on ParentId in the Posts table, where SQL Server guesses 120 rows but we get 6 million rows back, that was a very, very good one.
Some of the ones for nulls are good for showing different stuff. Like if someone compiles a stored procedure with a null parameter, that was good for something different than the local variable stuff. But the local variable guess stuff, especially the ParentId one and the AcceptedAnswerId one, those were excellent.
And those have spawned a lot of great demos. So, I don’t know who is going to be interested in this code. I don’t know. Maybe it’ll be someone who also has to write demos for SQL Server.
Maybe you have a demo database where you want to figure this stuff out. You could do the same thing there. Or maybe your database has code with a lot of local variables and stuff in it, and you want to figure out where your local variables might be causing bad cardinality estimation and performance problems.
You could do that with this if you wanted. Quite frankly, if I were trying to find poorly performing code, I probably wouldn’t start with this. I would probably just start by finding queries that have a high CPU and/or duration.
Then I would try to figure out if local variables or bad cardinality estimates or whatever else are the cause of that. So really, this is probably mostly a tool for presenters who want to find good demos. I wouldn’t recommend, again, running this in production to do anything, because I don’t want to be responsible for whatever happens
in there from doing all this, right? Running those unlicensed DBCC commands. Anyway, just sort of a fun video with a fun script.
Again, this will be on GitHub. There will also be a link in the video description. If you feel like giving it a spin in your demo database or your non-production database, it might be fun for you. Just remember, if you’re running this in an actual database, you’re going to want to skip the create statistics part, because you might spend a very long time creating statistics on a bunch of columns just so you have things to look at.
But demo database wise, this is a lot of fun. Anyway, I don’t know. You know, they can’t all be in-depth SQL Server performance tuning stuff.
Sometimes people like to see how the sausage gets made. And at Darling Data, we make a lot of sausage. All right.
Recently voted by Beer Gut Magazine to be the sausage king of SQL Server. So, got a lot going for us here at Darling Data. Anyway, thank you for watching.
Performance Pains With NOT IN And NULLable Columns In SQL Server
Video Summary
In this video, I delve into the performance pitfalls of using NOT IN with nullable columns in SQL Server queries. I share why you should be cautious when employing NOT IN, especially when dealing with null values, as it can lead to unexpected results and complex execution plans. I explain how using NOT EXISTS instead can simplify your queries and avoid the performance overhead associated with NOT IN, making your SQL code more efficient and reliable.
Full Transcript
Erik Darling here with Darling Data, the most punch drunk SQL Server consultancy on the planet. Anyway, some days we’re just tired. SQL Server is exhausting, isn’t it? It’s like: man who thought he couldn’t possibly get any more tired got way more tired. Anyway, in today’s video, we’re going to talk about a performance peril with NOT IN and nullable columns. Now, if you’ve been working with SQL Server, or databases in general, for like longer than 15 seconds, you’ve probably run into someone on the internet saying if you use NOT IN and you hit a null, your query will bail out and return no results, or return results that you wouldn’t expect. And you should use something else to write your query, or do something else to handle the nulls, right? ISNULL, COALESCE, something, you know, some other crappy idea. In this video, I’m going to talk to you not only about how that works logically, but also, from a performance perspective, how even if you don’t hit that situation, SQL Server has to come up with some really weird execution plans to protect itself when you do run into it. So with that intro complete, I suppose this is where I beg you for your alms for the poor. If you appreciate my channel content, and you would like to donate $4 a month to keeping Erik Darling alive and somewhat well fed, you can do that by clicking the link down in the video description. It says Become a Member. You can become a member. It’ll cost you $4 a month. If you don’t have $4 a month, well, if you’re just some freeloading, weird, I don’t know, college student or something, you can like, you can comment, you can subscribe.
You know, whatever floats your boat, strikes your fancy, blows your hair back, keeps your powder dry, whatever, however you, however it makes you feel, you can do those things. If you need help with SQL Server, if you need consulting, well, golly and gosh, you found a young, handsome consultant with reasonable rates right here on YouTube. You can get in touch with me. I can do all this stuff. I can do more stuff. This is just the stuff that I really like doing. So if you have something else, well, I don’t know, suppose we can talk about that. My rates will remain reasonable. If you would like some high quality, low cost SQL Server training that lasts for as long as you live, you can go to that link and then you can enter that code and then you can get all of mine for about $150. It’s a pretty good deal.
If you want to catch me live and in person, I will be at PASS Data Summit. November 4th and 5th, I will be delivering pre-cons. Of course, the PASS Data Summit itself goes on much longer than that. I even have a regular session on isolation levels. It's really going to twist your melon, bake your noodle, you know, all that stuff. So you should come to that and you should see me live, because it's even better in person. I promise.
So with that mundane nonsense out of the way, let's talk about NOT IN and the problems that it can cause. All right. So I'm going to run this, not that it does anything, and I'm going to run a couple of funny looking queries. The reason why they might strike you as funny looking is not because they're standing next to me, but because there is no FROM clause in these.
All right. It’s just select one, X equals one, where select zero is not in one or two, and then there’s select one, where select zero is not in one, select null. Right. I guess there’s a union all in there, too, for good measure. But when we run these, this is sort of what I was talking about from a logical standpoint, where if you have a null, SQL Server is like, I don’t know, can’t possibly figure that out.
You get a blank result rather than what you would expect, which is this. Right. You would expect to see a one come back because zero is not in one or null. Zero is also not in one or two, but we got a result back from that one, whereas this one we didn’t.
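Reconstructed from the description above (the exact text and formatting of the demo queries are an assumption), the pair looks something like this:

```sql
/* Returns a row: 0 <> 1 AND 0 <> 2 is TRUE. */
SELECT x = 1
WHERE  0 NOT IN (SELECT 1 UNION ALL SELECT 2);

/* Returns nothing: 0 NOT IN (1, NULL) expands to
   (0 <> 1) AND (0 <> NULL), which is TRUE AND UNKNOWN = UNKNOWN,
   and rows that evaluate to UNKNOWN are filtered out. */
SELECT x = 1
WHERE  0 NOT IN (SELECT 1 UNION ALL SELECT NULL);
```

That second predicate never becomes TRUE for any row, which is why the result set comes back empty rather than erroring.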
Right. So that's not a very good time. Now, the point of this video is not that. Right. That is obviously a logical sort of, I mean, I'm not going to call it an inconsistency, but it is sort of strange.
Right. Because with IN, you have no problem there. So if I were to give you a general rule of thumb: if you're going to use IN or NOT IN, you should do that only when you have literal values that you write into your IN or NOT IN.
Right. Whether they are numbers, strings, dates, something like that. If you are typing the actual values you care about into IN or NOT IN, then you're probably all right. For NOT IN, you do have to pay attention to whether the column in the table is nullable or not.
But, you know, that's up to you to figure out. But if you need to supply a subquery or anything like that, then you really should be using EXISTS or NOT EXISTS, because you do not have to deal with the same sort of weirdness that you end up with when you use IN and NOT IN, or rather, specifically, NOT IN.
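As a sketch of that rewrite (the table and column names follow the Stack Overflow schema used later in the video; the exact query text is an assumption):

```sql
/* NOT IN with a subquery: if Posts.OwnerUserId is nullable, SQL Server
   has to defend against a possible NULL in the subquery's results. */
SELECT COUNT_BIG(*) AS records
FROM   dbo.Comments AS c
WHERE  c.UserId NOT IN
       (SELECT p.OwnerUserId FROM dbo.Posts AS p);

/* NOT EXISTS: null-safe. Note the correlation in the inner WHERE clause;
   that's the part that's easy to forget. */
SELECT COUNT_BIG(*) AS records
FROM   dbo.Comments AS c
WHERE  NOT EXISTS
       (SELECT 1
        FROM   dbo.Posts AS p
        WHERE  p.OwnerUserId = c.UserId);
```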
But what I'm going to do here is prep a couple of tables, and I'm going to stop there. Well, I'm going to run that, and then I'm going to come over to the query plan. So this is the NOT IN version of the query plan right here.
Right. So this is the thing that I ran with NOT IN after populating these. The thing that you'll notice is that, for both of these tables, these columns have been declared as nullable. Right. This column is nullable.
Where's that other one? There it is. This column is nullable. But we have specifically put a whole bunch of not null values into both of these tables. Right. We've only put in user IDs from the comment table where user ID is not null.
And we have only put in the owner user IDs from the post table where owner user ID is not null. So we don't have any nulls in there, but the columns are both nullable. And since we defined the tables that way, that can happen under all sorts of other circumstances, too.
If your column is nullable and you do SELECT INTO, or you create a table and you don't specify NULL or NOT NULL, you might be surprised at what the actual table definition is. But this was the query plan that ran for the NOT IN version.
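If you're ever unsure what nullability you actually ended up with, the catalog views will tell you; a quick sketch, with an assumed table name:

```sql
/* Check which columns are nullable. Useful after SELECT ... INTO,
   which inherits nullability from the source expressions. */
SELECT c.name,
       c.is_nullable
FROM   sys.columns AS c
WHERE  c.object_id = OBJECT_ID(N'dbo.Comments');
```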
Right. You can see the not in up here. Select this stuff. That is exactly what I showed you in the other tab. This thing runs for a little over 13 minutes.
For that little over 13 minutes, you're going to notice some extra stuff. Right. So the query itself, of course, is just saying: from this table, where this column, I didn't frame that up well at all, did I, where this column is not in, select this column.
Right. That's all we got there. But when we look at the query plan, we see comment, post, comment, comment.
We have three references to comment and one reference to post. Now, what ends up happening is SQL Server has to do a whole lot of gymnastics to figure out which columns might be nullable or might not be. So SQL Server uses this row count spool over here to start figuring out if any rows in the comment table are null.
And it has to count all of these to figure out if it's going to just bail and give you nonsense. Then we also have SQL Server doing all this crazy stuff where, for 13 minutes, we have a top above a scan, where SQL Server just keeps going in here and doing an anti semi join to the post table to figure out some of the not in. Right. But when we do this, this is where we're also trying to figure out if there are any nulls in there that might mess things up.
And then finally, way over here, we have the result of all this work right anti semi joined to the base table itself. So SQL Server had a real tough time and did a whole lot of extra work trying to figure out if there were any nulls in there. Like what’s going to happen? Should I bail early?
Like, can we even... can we figure this query out? That doesn't happen with NOT EXISTS. So, the NOT EXISTS version of the query. And this is, you know, something that when I first started writing SQL, I messed up. Because, you know, I was like: holy crap, IN and NOT IN. Oh, man. Stuff is weird with them. Like, not fully knowing the whole story.
So I just started taking IN and NOT IN and rewriting them as EXISTS and NOT EXISTS. But I would always forget this part of the EXISTS, the correlating part. So it would just be: where not exists, select something from this table.
And that doesn't work. Right. Because with EXISTS and NOT EXISTS, you don't project out anything from the select list. That's why I can stick one divided by zero in the select list and nothing happens.
I don’t get an error because SQL Server doesn’t evaluate this. Right. You can put whatever you want in there except an aggregate. It doesn’t matter.
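A minimal sketch of that, with assumed table names:

```sql
/* The inner select list is never projected or evaluated,
   so 1/0 here does not raise a divide-by-zero error. */
SELECT c.Id
FROM   dbo.Comments AS c
WHERE  EXISTS
       (SELECT 1/0
        FROM   dbo.Posts AS p
        WHERE  p.OwnerUserId = c.UserId);
```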
A lot of people think that if you put TOP in an EXISTS or NOT EXISTS, it's somehow faster. It's not. There is an implied top in there already. Ah, crazy. Anyway, it's called a row goal.
You should learn about it. When I run this query and say select count from whatever where not exists, this thing, oops, I didn’t turn on query plans. This does not take 13 minutes to run.
This runs pretty quickly. And we just have one simple hash join, a right anti semi join, from the comments table to the post table. And we don't have all that extra work.
We don’t have that row count spool where SQL Server is trying to figure out if there are any nulls in there. We don’t have that weird top above a scan where SQL Server is trying to figure out if it can keep getting rows. If like, should I bother?
Should I bother? Should I bother? What’s this? What’s this? What’s this? Pass it along over here. We don’t have all that. SQL Server just takes care of all this with the not exists in one simple join. And this finishes a lot quicker, right?
It doesn't take 13 minutes. It takes 1.7 seconds. So when you're writing queries, do be very careful with NOT IN. Do make sure that if you are going to use NOT IN, like I said, as a general rule of thumb, use literal values. Make sure the column that you care about is not nullable.
And, you know, if you are unsure, if you just want to be safe in general, exists and not exists are a far easier way to deal with subqueries and other stuff like that. Because you don’t have all these sort of execution plan oddities that you can end up with when you start using not in for these things. So my general preference is exists and not exists.
But, as usual, especially if this is your first run around the block: don't forget that correlation in the EXISTS or NOT EXISTS. Otherwise, you could be very surprised by the results.
They might be shocking. You might find that you have found everything, or you have found nothing. Right?
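Here's what that all-or-nothing failure mode looks like; a sketch with assumed table names:

```sql
/* Uncorrelated NOT EXISTS: the inner query never references the outer row,
   so it is either true for every row or false for every row. */
SELECT COUNT_BIG(*) AS records
FROM   dbo.Comments AS c
WHERE  NOT EXISTS
       (SELECT 1
        FROM   dbo.Posts AS p);
        /* missing: WHERE p.OwnerUserId = c.UserId */

/* If Posts has any rows at all, this counts zero comments;
   if Posts is empty, it counts every comment. */
```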
Go figure there. Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. I hope that you will avoid nullable columns. I hope you will avoid nulls. And especially mixing all of that stuff with not in because the results you get will not be happy ones.
All right. Anyway, thank you for watching. Goodbye.
Have a nice day. I hope it’s the best day of your life. Which is secretly, I guess, a curse because every day after that would just pale in comparison. Huh.
Well, maybe you live to be a thousand in interesting times. Life’s funny.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.
In this video, I revisit the topic of automatic tuning in SQL Server, addressing some of the feedback from a previous discussion. After Brent commented that he felt I wasn’t being fair in my critique, I decided to delve deeper into his example and provide a balanced perspective. The video walks through setting up the environment with Stack Overflow 2013 data, creating necessary functions and indexes, and altering database compatibility levels to observe automatic tuning in action. By using SQL Query Stress for multiple executions, we were able to generate enough data to trigger automatic plan recommendations, highlighting that five executions are indeed insufficient for meaningful results. This exploration demonstrates the importance of thorough testing when relying on automatic tuning features, ensuring that your queries receive optimal performance over time.
Full Transcript
Erik Darling here with Darling Data. And in today's video, we are going to revisit automatic tuning again. Because, obviously, I talked about automatic tuning in, well, now yesterday's video. And, you know, "Brent was having problems with automatic tuning, Jessica." I'm just making fun of the transcript; the way that things get transcribed on YouTube, I'm sure things look dumb in mine, too. But then Brent commented on YouTube that I'm not being fair, and also on LinkedIn that I'm not being fair. And so I had a little chat with Badtz-Maru. And Badtz-Maru said: be fair. So in today's video, we are going to be fair. Because if there's one thing we care about here at Darling Data, it is fairness. So let's walk through Brent's example, which, well, I've just reformatted things a little bit, because it's not specific to Brent. Everyone else's query formatting gives me a headache, so I just reformat and move things around a little bit. But we're using Stack Overflow 2013, not the full Stack Overflow database. Apparently everyone who says disks are cheap has never bought disks from Lenovo. So we're using a slightly smaller version of the database here. But I've created the function and I've added the isValid column to the table.
And that's not my typo. Don't yell at me. I've created an index on reputation, and I've created the GetTopUsers stored procedure. Well, actually, maybe I cut that off when I was moving stuff around. Anyway, that thing's in there. I promise. Otherwise this thing would just throw errors, right?
And so what I'm going to do is follow along from here, where we alter the database compat level to 110, we change a database scoped configuration, and we mess a little bit with Query Store. And in Brent's example, he had only executed the stored procedure five times, which is an inadequate amount of sampling for the automatic tuning feature. Now, I went hunting through the, no, not Query Store, the extended events GUI, because I wanted to figure out if there were any events that would fire around automatic tuning. And there are a whole bunch of them. So I just created a session with all the ones that I thought looked interesting, and that's the live data that we're watching over here. All right, cool. So we've got this thing watching our query, watching our server very carefully. And I've also got the script from yesterday fired up, ready to go.
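The setup steps described here look roughly like this sketch (the database name and the specific options are assumptions; the original demo also changes a database scoped configuration that isn't named in the transcript):

```sql
/* Drop to an old compat level, make sure Query Store is on and empty,
   and enable automatic plan correction. */
ALTER DATABASE StackOverflow2013 SET COMPATIBILITY_LEVEL = 110;

ALTER DATABASE StackOverflow2013 SET QUERY_STORE = ON;
ALTER DATABASE StackOverflow2013 SET QUERY_STORE CLEAR;

ALTER DATABASE StackOverflow2013
SET AUTOMATIC_TUNING (FORCE_LAST_GOOD_PLAN = ON);
```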
So the first thing I'm going to do, rather than just execute the procedure five times, is go a little bit harder than that. And we're going to use SQLQueryStress, if it'll actually show up on the screen. Where are you, SQLQueryStress? There you are. No, that's a new window. Oh, apparently that other window was frozen. Okay, good. Let's load settings. Thanks. Thanks for making me look like an amateur, SQLQueryStress. It's real cool.
Let’s put that in there. And then I’m going to just for this one, I’m going to do 100 and 100 to make it an even a lot of executions. And that all does, I don’t know, pretty quick, right? And if we chop off a zero there and a zero there, we’re going to get ready to do the next one.
Now, right now, we don’t have any data in here, right? Nothing is showing up here, and nothing will be showing up in here. But what’s really funny is that in Compat Level 110, we can’t even run this query to check on things. Even if I add an optimized, like a Compat Level hint to this query for 150 or 160, we can’t run this query with JSON in it because it’s not available under Compat Level 110. Would that I had a big enough hand to smack everyone at Microsoft who makes these decisions? I would gladly do it. It would just be an endless slap for, I don’t know, probably 10 years.
Anyway, you’re just going to have to take my word for it that there’s nothing in here and there’s not, well, obviously nothing in the extended event. So that’s great. So now let’s flip the Compat Level to 160. And we’re going to just do this for 10 by 10. And this is going to be significantly slower. But if we come over here and watch this, eventually we’ll get some data in here. It takes a little bit though.
This thing takes a lot longer to run under Compat Level 160. And of course, you know, the actual feedback for the event takes a somewhat significant number of executions before it starts thinking about regressions and making guesses and figuring out if things need to be changing. But there we go.
Miraculously, around 20 executions in, we have a regression check. And now, since we are in compat level 160, we can run our JSON query. And we have some advice in here. All right. Average query time changed from 7.56 milliseconds to, well, 15.2 seconds. And we have some stuff over here where, just like with my example query, there is some information about which query we should force and which query IDs were involved. So let's take that and put it in there, and let's stop SQLQueryStress so that we don't have another weird crash thing going on. I'm not sure why SQLQueryStress had a problem. But if we look in Query Store, we will see query ID one had two plans. And since this is sorted by average CPU descending, this will be the slow one, and this will be the fast one. We got 50 executions out of that and 10,000 executions out of that. So yeah, there's actually stuff in there. Now, sort of interesting, maybe, I don't know, vaguely interesting: if we flip the compat level back to 110, and we add some zeros back in here.
I don't know exactly how interesting this is. But if we run this a whole bunch of times, the live data view over here will actually show this automatic tuning check abandoned event for our query. Like, see, there's query ID one. That's the one we are looking at. So at some point, SQL Server does sort of give up on this one, because there are a lot of errors in there. Now, that finished. And now you can see that I use a Lenovo, and Lenovo did a software check. Very interesting stuff in my life. And now we flip the compat level back to 160, so I can run the JSON query again. Sometimes this will say that the query is too error prone. It's not happening here, but I ran through this a few times, and there was one time where, for the reason, it said something like: this query is too error prone, we can't handle doing this. It only happened, like I said, every once in a while. Over here, it says it is not error prone. But at least one time, probably actually at least two times, it did say it was error prone.
I'm not sure why SQL Server changed its mind. Maybe I just ran this thing enough that it changed its mind about that. I couldn't tell you. But anyway, the moral of the story here is that five times is not enough, not enough executions to get the automatic tuning stuff to kick in.
But if you run stuff a lot, using, I don't know, you could use ostress if you're feeling command-liney, but SQLQueryStress does a pretty good job of executing stuff enough to trigger the automatic tuning recommendations, at least. And if you have the automatic plan forcing stuff on, it'll force the plan for you.
Anyway, I think that's about that. So we can probably stop this here. Me and Badtz-Maru have achieved peak fairness for the day. So we're going to go celebrate now. I don't know what we're going to do to celebrate. Badtz has some crazy ideas.
Badtz has some crazy stuff to say. But anyway, that's that. Execute the stored procedure more. Something will happen eventually. Anyway, thanks for watching. Hope you enjoyed yourselves. I hope everyone learned something.
And as always, my rates are reasonable.
Video Summary
In this video, I dive into the world of automatic tuning in SQL Server, addressing some concerns that arose when my friend Brent struggled to get it working properly. I walk through a detailed demonstration using SSMS and a sample query provided by Microsoft, showcasing how to set up and test the feature while emphasizing the importance of having sufficient data in Query Store for the system to make meaningful recommendations. By exploring real-world scenarios and potential pitfalls like parameter sniffing, I aim to provide clarity on when and how automatic tuning can be beneficial, as well as offer practical advice for those looking to implement it effectively.
Full Transcript
Erik Darling here with Darling Data. And in today's video, we're going to talk about automatic tuning in SQL Server. And the reason why is because I've had a couple people tell me that my friend Brent ran into some trouble trying to get this feature to work. And I want to make sure that we can alleviate Brent of his worries and sorrows. So Brent released a video about not being able to get this thing to work. And it says: "I struggle with this. I struggle with automatic tuning, Jessica." Well, that's worrisome enough, isn't it? It's terrifying enough on its own. But then I posted a video about cardinality estimation feedback, and Flagstar, seven days ago, said: this reminds me of Brent's recent video, with him being unable to get automatic tuning to work. I have no idea what he was doing wrong. Okay. Fair enough. And then I opened an issue on my GitHub repo for sp_QuickieStore. And I've thought that maybe, you know, I don't really do anything with this dynamic management view, but I thought it might make a good addition to QuickieStore.
Maybe we get some additional query feedback stuff in it. And Reese Goading said: does sys.dm_db_tuning_recommendations even work? Brent recently tried his best to get anything out of it.
As I recall, he failed. Oh, gosh. This sounds like real trouble. This sounds like a job for Darling Data. So let's come over to SSMS.
And I've cleared out Query Store here, because I don't want anything interfering. And I've set the compat level to 150, because the parameter-sensitive plan optimization makes getting this demo to work really confusing.
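Those two setup steps are roughly this (the database name is an assumption):

```sql
/* Wipe Query Store history and drop to compat level 150, so
   parameter-sensitive plan optimization (a 160 feature) stays out
   of the way of the demo. */
ALTER DATABASE StackOverflow SET QUERY_STORE CLEAR;
ALTER DATABASE StackOverflow SET COMPATIBILITY_LEVEL = 150;
```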
You don’t want to be in CompatLevel 160 for this. And I’ve got this query over here that Microsoft provided. As you can see, it returns no rows currently, which is apparently Brent’s experience as well. But don’t worry.
We will get at least one row out of this. I promise. Now, I’ve already got this index created. So everything is good here. We’ve got that index.
And we've got a stored procedure here that will get the total sum of scores based on a parent ID and a post type ID out of the post table. No, it's not much to look at. But it'll get us where we're going.
It’s just enough to make this whole thing work. Now, let me show you why this feature will kick in for this procedure. If we run these two executions of the procedure back to back, the first one returns a result almost instantly, which is great.
Everyone wants instant results. Isn’t that nice? All right.
And the second one takes a little bit longer. All right. So here’s the first one. It takes exactly one millisecond there. That’s pretty good. And this one down here, this takes eight seconds. That’s not so good.
That sounds like parameter sniffing to me. Could be a big problem, couldn’t it? All right. So let’s make sure that we have a reasonable baseline sitting around in our database here for SQL Server to work with. The big idea here is to have enough of a baseline in Query Store for it to be able to do something.
Now, I'm going to kick this one off to execute. And this is going to take around about 45 seconds to a minute. And while this thing executes and does its thing, let's come over here and look at this wonderful free script that I got from Microsoft's page on the sys.dm_db_tuning_recommendations dynamic management view, where, you know, we got all this stuff.
We got a JSON value and we’ve got some ifing in here. That’s quite nice. And then down in here, we select from the, that didn’t frame up too nicely.
So we've got a select from the DMV, and we've got a cross apply to some OPENJSON stuff. Why? I don't know.
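The Microsoft sample being described is approximately this shape (paraphrased from memory of the documentation; treat the JSON paths as assumptions to verify against the docs):

```sql
/* Unpack the plan-forcing advice from the details JSON column. */
SELECT reason,
       score,
       JSON_VALUE(details, '$.implementationDetails.script') AS force_script,
       planForceDetails.*
FROM   sys.dm_db_tuning_recommendations
CROSS APPLY OPENJSON(details, '$.planForceDetails')
       WITH ( query_id            int '$.queryId',
              regressed_plan_id   int '$.regressedPlanId',
              recommended_plan_id int '$.recommendedPlanId' ) AS planForceDetails;
```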
Some real big wrinkly brain over there thought, I have an idea. JSON’s new. Let’s cram a bunch of crap into JSON. Let’s make it, let’s make it as hard as possible for people to get reasonable information out of this.
All right. Yeah. Great idea. Why don't you just use some more XML? Why don't you just follow query plans and extended events and the blocked process report and the deadlock XML report, so I could at least reuse some of my XML knowledge here?
No, just jam it all in JSON. I’m sure it was three nanoseconds faster to do that. So anyway, of course, this thing runs and it finishes. Look at that.
44.8 seconds. I was almost right about 45 seconds. Now, with just those two things having run, we still don’t have a row in here. We don’t have a row in here because we still only have one plan. We’ve got nothing for SQL Server to compare it to.
So we're going to come back over here and we're going to hit sp_recompile here. And then we're going to run these two stored procedures back to back. All right.
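That step is just this (the procedure name is an assumption):

```sql
/* Mark the procedure for recompilation so the next execution
   compiles a fresh plan, without clearing the whole plan cache. */
EXEC sys.sp_recompile @objname = N'dbo.GetTopUsers';
```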
And this time, do they both run faster? Yeah, sort of. I mean, the one that was really slow before runs a lot faster. We went from eight seconds down to about 650 milliseconds. But the one that ran really quick before, in like one millisecond, now takes 600 milliseconds.
Okay. Fair enough. To not interfere with Query Store or the plan cache or anything, let's hit sp_recompile there.
And now I’m just going to run this one 30 times for the one that was kind of slow before. This is going to do some more. This is going to do more work a little bit more quickly than the last time around. It’s not going to take 45 seconds this time.
Excuse me. Because we’re using that clustered index scan plan, this should finish up in around about 20 or so seconds. Oh, 13 seconds. Oh, you know what? I’d rather under promise and over deliver than anything else.
But now with 30 executions of that around, when we run this, magically we have a row. Don’t worry, Brenty. I gotcha.
So what is this telling us? Well, average CPU time changed from 56 milliseconds to 7.689 seconds, right? 7,600 milliseconds.
And if we scroll all the way over here, we’ll have some advice from this DMV. And it will be that, oops, I, I messed that whole thing up. Ah, there we go.
Let’s slide that back. That we should force for query ID 10 plan ID 9. Okay. Well, here’s query ID 10. The regressed plan ID is 11.
The recommended plan ID is 9. Okay. Well, let's see. Let's take query ID 10 right here. Let's copy that. And let's use my free, amazing, immaculate stored procedure. Oh, that didn't... what the hell?
Oh, it must have copied the, must have copied the column name. I forgot that that’s a default thing you can set. So let’s look for a query ID 10. That is the right one, isn’t it? Query ID number 10.
Let's just double check and make sure. Query ID 10. Wonderful. We're going to run that. And now let's refresh our memory of what SQL Server was telling us. The regressed plan ID is 11.
So let’s go look at the regressed plan ID, right? That’s going to be plan ID number 11 right here. And this is the clustered index scan. Okay. Now the, the good plan ID it says is nine, right?
That’s this one, the one that says we should force right here. If we come over here and we look for plan ID nine, that is the clustered index seek with the key lookup. Now, would you want to force this plan?
It’s a good question because for one set of parameters, that’s a pretty good, that’s a pretty good plan to force. For another set of parameters, that’s a pretty bad plan to force. So should you follow this advice?
Probably not. Probably not a great idea. Now, I will grant SQL Server a little bit of grace here: in the query plan that it says was regressed, there is a missing index request that would probably help things out. But it doesn't tell you to create the index.
It tells you to force the plan. And I don't really know that I love that so much. So what should you do here? Well, what I would recommend is maybe not listening to the advice of the tuning recommendations query. Because if you force that plan for that query, you will have some executions that are very quick and some that are very slow.
What I would probably recommend is tuning that query. Maybe that missing index request would have been the thing to do it. But I think what's generally valuable about this view is not necessarily the advice that it gives you about which plan to force, but really that queries are ending up in there with some discrepancy, with some obvious signs of regression that you might want to address.
Right? So this is sort of a typical parameter sniffing scenario where something about the query plans that it generates for one set of parameters and another just don’t get along so well. So what I would recommend you looking at the two different query plans, figuring out what’s different about them, figuring out what you can fix in order to make that plan as shareable, as simpatico as possible across as many different parameter variations as you can think of, and then going from there.
So as usual, Microsoft wanted to do something like AI, ML, really, you know, a self-tuning database that we’ve been reading about since 2000. Nonsense.
It didn't, doesn't quite deliver. But we got kind of a cool new DMV that can help us find queries that could use our help. We still need to do the helping. And I'm sure that there actually might even be cases where forcing a query plan would do you some good.
It might happen. It just didn’t happen for me here. So, uh, it does work. And I hope that, uh, Brent can sleep well now.
I hope that we can all just move on from this, this, this tragic, tragic chain of events in the SQL Server community. I hope that we can begin to heal and process the devastation that has befallen us. Brent couldn’t figure it out.
Well, anyway. Uh, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I hope that you’ll watch other videos of mine. Because who knows what else I’ll figure out.
Alright. Goodbye. Goodbye. Bye.
It brings the total cost on 24 hours of SQL Server performance tuning content down to just about $100 USD even.
Good luck finding that price on those other Black Friday sales.
It’s good until I wake up and remember to turn it off, so hurry along now.
Thanks for reading!
It’s always tricky to time these holiday posts. Time zones and traditions make these things difficult.
Being of the American persuasion and tradition, I tend to be that-centric. If I could invite all of you over for a holiday celebration, I would.
But since that just wouldn’t be practical (and we are practical practitioners of these here database thingies), the best I can offer is my sincere hope that you Have A Great Day.
When Profiling SQL Server Stored Procedures Goes Wrong
Video Summary
In this video, I delve into the intricacies of using extended events to profile stored procedures, specifically focusing on a common issue where multiple quick queries add up to significant overall duration. I introduce SP_HumanEvents, a tool I developed to simplify working with human events in SQL Server, and demonstrate how it can be used to identify performance bottlenecks in scenarios where individual queries are fast but collectively take longer than expected. By sharing this practical example, I aim to provide insights for query tuners faced with complex stored procedures that require fine-grained optimization at the micro-level.
Full Transcript
Erik Darling here. Does any of this surprise you at this point? Darling Data: alive and well, thriving, writhing, grinding away. In today's video, we're going to talk about extended events profiling icks. There's actually really only one ick, but it's happened to me so many times I'm going to pluralize it. So, I've got a stored procedure. You may have heard of it. You may have used it. You may have it installed on your server, where it sits collecting dust while you don't update anything, called sp_HumanEvents, events, events, events. And of course, the point of it is to make working with extended events a lot easier than Microsoft has made it, because, as usual, Erik Darling cares about you where Microsoft does not. Erik Darling wants you to succeed. Erik Darling wants you to figure out your problems. Erik Darling wants you to live a long, happy life. Microsoft cares not, except for maximizing shareholder happiness. So, we're going to look at that today. Not shareholder happiness, but that thing up there. Anyway, before we do that, let's talk about my happiness. If you really like this channel content and you find me worth $4 a month, there's a link in the video description where it says become a member, which is sort of a strange way of putting it, but I guess that's what you become when you get a membership. You become a member. You can do that for $4 a month. If $4 a month is just far beyond your financial grasp, I get it. Not everyone can make these lucrative YouTube videos for money.
I am up to nearly 20 members at this point. So, if you do the math on that per month, I can almost pay one month of a cable bill. Almost. Liking, commenting, subscribing: all free. All open things that you can do to make me a happy man. If you need help with your SQL Server and you are looking around the internet saying, gosh, all these consultants look stupid and goofy, and they write schlocky things about leadership or whatever.
I’m very good at all these things. I’m much better than the rest of them. So, if you need help with that at a reasonable rate, you know how to reach me. I’m very reachable. Reach anywhere and find me. Up, down, around. Pretty much any preposition or proposition. Anyway, if you want some SQL Server training of the low cost, high quality variety, if you go to that link up there and you put in that coupon code, you can get all of mine for the rest of your life for about 150 US dollars. It’s a pretty good deal. It’ll make you happy.
It really, really tickles the dopamine. If you want to catch me live and in person, I will be alongside Kendra Little for two days of performance tuning wonderfulness this November 4th and 5th in Seattle, Washington.
If there’s an event near you that you think Erik Darling would make a good component of, a good member of, you can let me know what that event is and I can talk to the organizers and maybe get a pre-con there so that, you know, I can pay my hotel bill or something, which is always nice when that happens.
So, anyway, with that out of the way, let’s talk about extended events icks, or rather an ick that I have with extended events. Now, I’ve got a stored procedure here called eventually. And the whole point of this stored procedure is that there is a one second delay and then a very fast query and then a one second delay and a very fast query.
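The actual procedure isn’t shown on screen, but based on the description, something shaped like this would reproduce the demo. The table and query here are placeholders, not the real demo code; the point is just the alternating WAITFOR DELAY / fast query pattern:

```sql
/* A minimal sketch of a procedure like "eventually" -- placeholder
   queries, real pattern: fast queries interleaved with delays, so
   total duration is high but no single query is slow. */
CREATE OR ALTER PROCEDURE dbo.eventually
AS
BEGIN
    SET NOCOUNT ON;

    WAITFOR DELAY '00:00:01';
    SELECT TOP (1) u.Id FROM dbo.Users AS u; /* very fast query */

    WAITFOR DELAY '00:00:01';
    SELECT TOP (1) u.Id FROM dbo.Users AS u; /* very fast query */

    /* ...and so on, six delays in total: about six seconds of wall
       clock time, with every individual query finishing in
       milliseconds. */
END;
```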
And what I’m going to do is I’m going to use sp_HumanEvents. Well, I’ve already used it. We’ve already got this thing over here, where we’re watching live data. I’ve set up, I’ve run sp_HumanEvents with this set of parameters to set up this session to look at query performance data.
And then I came over here and I right clicked on this and I hit watch live data. Now, when I do that and I click run here, this is going to run for about six seconds because there are about, one, two, three, four, five, six, WAITFOR DELAYs.
Now, the way that I set up sp_HumanEvents to run is I put a query duration filter on here of five seconds, which means anything that runs over five seconds, we’re going to capture information about. Usually a pretty good starting place when you’re profiling an entire stored procedure because, you know, something that runs over five seconds, you can usually do something about that.
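The exact parameters aren’t shown in the transcript, but the session described here would be set up roughly like this. Parameter names follow sp_HumanEvents’ documented interface; check the docs for the version you have installed:

```sql
/* Roughly how the session in the video might have been created:
   capture query performance events, but only for queries that run
   longer than five seconds, and keep the session around so you can
   right click it and watch live data. */
EXECUTE dbo.sp_HumanEvents
    @event_type = N'query',      /* query performance profiling */
    @query_duration_ms = 5000,   /* filter: only queries over 5s */
    @keep_alive = 1;             /* persist the session to watch */
```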
If you have stuff that runs for way longer than five seconds, like if the whole thing runs for an hour and there’s three queries, you might want to set that a little higher so that you can figure things out. You know, maybe one of the queries takes a second and the other one takes like three seconds and the other one takes many minutes.
Well, you don’t really need to see query plans and stuff for the other ones. That’s kind of boring. But sometimes when profiling stored procedures, there’s a lot of stuff going on. There’s lots of tiny little queries.
There might even be dynamic SQL. There might even be other stored procedures getting executed. There might be all sorts of stuff that happens. I can’t possibly predict all of the insane things you people will put into your stored procedures. So I usually start off by doing something like this so I can just capture the sort of higher value stuff and figure out what of that stuff I need to tune.
The thing is, all of the queries in here ran pretty quick, right? So we had the WAITFOR DELAYs that bumped up the total duration, the wall clock time, but nothing in here took a whole lot of CPU time or took a lot of wall clock time individually.
It happened as a group. Now, so when we come over here and we look at the output in extended events, we’re only going to see a couple lines.
We’re going to see module end and statement completed. So we see where the stored procedure finished. That’s module end. And we see SQL statement completed. That was the query that I ran in SSMS saying we’re all done, right?
And they each had a duration of just about six seconds of wall clock time. There’s a little bit of difference in microseconds there, I guess, for whatever reason. Not really terribly important.
Not something we have to worry about. It was probably just the difference between the stored procedure ending and SQL Server signaling back to SSMS, hey, this is done. This can happen when you have code where, sort of like what I was talking about before, you have a whole bunch of queries that run very quickly individually, but they add up to a lot of time in the aggregate, a high duration in the aggregate.
You might have loops. You might have, you know, cursors. You might do all sorts of weird stuff in a stored procedure that makes the stored procedure run for a long time.
But you don’t have an actual single individual query that you can go in tune very easily. It’s not a very easy situation when you’re dealing with that because now you have to figure out, you know, like at a very, very small scale, very small improvements you can make so that, you know, each individual step finishes faster than it did before so that in the aggregate, things are faster.
You know, like with a big query that takes a long time to run, you might be able to make, you know, let’s just say it takes 30 seconds to run, you might be able to make a whole bunch of changes to that one query to bring that one query down to like, I don’t know, two, three seconds or something.
But when you have a query that runs, say, you know, a million times and you need to get that query to produce results in the aggregate faster, you’re looking at really micromanaging a lot of different individual things.
That’s when you have to start getting query plans for much smaller bits of SQL and say, taking something that takes 300 milliseconds and getting that down to even fewer milliseconds, like three or two or five milliseconds or something crazy.
And going from 300 down to that smaller number is where you start, you know, seeing the bigger results again in the aggregate. That’s a little bit beyond what I can do in this short video, but it is something that it is an eventuality that a lot of query tuners need to prepare for where, you know, you can even compare this to like, if you have a big, gigantic, massive query, and let’s say it finishes in 500 milliseconds, but there are 500 operators in there and they all take very few milliseconds.
You know, sometimes a query like that is just the sum of its parts, right? Like you, like the only way for you to make something faster is either to take parts out of that or really start breaking down where like any amount of time is spent and trying to improve that.
Not always the easiest thing to do, especially for big queries like that; you might even have significant compile time on the query itself. But anyway, just something that you should be aware of when you’re profiling. I mean, if you’re not using sp_HumanEvents, you’re screwing up. But something to be aware of when you’re profiling stuff is that you might have a stored procedure that takes five or six seconds to run, but none of the individual parts really contribute heavily to that.
So that’s when you would have to start taking the query duration filter and putting this down to a much smaller number, like, you know, like either one second or 500 milliseconds or something else in order to find query tuning opportunities. So that for each individual execution of that thing, you can speed that individual execution up and bring down the entire wall clock time of the whole shebang. So anyway, thank you for watching.
I hope you enjoyed yourselves. I hope you learned something. I hope that you will continue to view this channel, comment, subscribe, like, maybe even become a member. Anyway, that’s good for me here.
Thank you for watching.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.
In this video, I dive into the intricacies of rewriting scalar UDFs in SQL Server, a task that can quickly turn from routine to headache-inducing. Specifically, I explore an interesting edge case where a scalar UDF that was guaranteed to return a result ended up as an inline table valued function (TVF), which altered its behavior unexpectedly. This change not only affected the results but also introduced new challenges in how the function needed to be called and handled. I walk you through the process of rewriting this function, highlighting the pitfalls and solutions encountered along the way, such as dealing with null values and ensuring that the rewritten function returns the expected results. By sharing these insights, my goal is to help you navigate similar scenarios smoothly and avoid the frustration that comes with unexpected changes in SQL Server functions.
Full Transcript
Erik Darling here, looking, trying to look tough, trying to not look crooked, trying to hopefully work out that scoliosis, with Darling Data. And in today’s video, because I feel like I’ve had to record a lot of these videos lately, as I go back through my notes and things, I realize just how annoying rewriting scalar UDFs can get. Today we’re going to talk about a really interesting edge case where a scalar UDF that was always guaranteed to return a result ended up as an inline table valued function, which, even though it would always return a result, would not always return exactly what the scalar UDF did. And if you want to be the person who looks heroic doing this sort of thing, you need to make sure that you return the correct, or the expected, results when these things run.
If you like my material and you have four extra dollars a month, you can subscribe to my channel with a membership using the link down in the video description. I’m trying not to, it’s probably over that, somewhere in there. If four dollars a month is too rich for your blood, if you’ve, you know, got better things to do with your money, then I understand.
Liking, commenting, subscribing, all perfectly wonderful free things you can do to keep me motivated to keep creating these videos. Because we’re hanging on for dear life here. If you need help with SQL Server, these are all things that I am better than everyone else at.
So if you need help with any of this stuff, you can hire me for a reasonable rate and get better help than you can get from anyone. Anywhere. On the planet.
Beer Gut Magazine said so. If you would like some high quality, low cost content for about 150 US dollars for the rest of your life, you can get that from me. By going to that link and using that coupon code.
And guess what? There’s a link for that too. Sorry. Link for that too. Down over that way, somewhere. Not in my area, in that area.
Of course, for as long as humanly possible, I will be in Seattle for PASS Data Summit, November 4th and 5th, with Kendra Little, tag teaming two days of performance tuning, titillation, and top notch shoe things. Anyway, let’s get on with the show.
So, let’s look at function rewrite annoyances. And what I’m going to do is show you the initial function or a close enough approximation of the function that I was dealing with. And, you know, just because I have a newer SQL Server version, and I don’t want scalar UDF inlining to kick in because that will break the demo.
I’m just declaring this thing in here so we have a date time. Right? So, it’s a scalar UDF.
What this thing does is it declares B, which is a bit, which is false, which if you ever run this, it will be zero. Excuse me. And what we do is we select B equals case when all this stuff, blah, blah, blah, blah, blah.
And then we return B. The thing is, if we select this here and no row gets returned, B doesn’t change from false to null. It stays false.
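The original function isn’t printed in the transcript, so here’s a close approximation of the shape being described. Table and column names are placeholders; the behavior that matters is that @b is initialized to 0, so when the SELECT matches no rows the assignment never happens and the function still returns 0, never NULL:

```sql
/* Approximation of the scalar UDF from the video -- placeholder
   names. Key behavior: @b defaults to 0, and an assignment from a
   query that returns no rows leaves it at 0. */
CREATE OR ALTER FUNCTION dbo.whatever_scalar
(
    @id integer
)
RETURNS bit
AS
BEGIN
    DECLARE @b bit = 0; /* false by default */

    SELECT
        @b = CASE WHEN s.some_date >= GETDATE() THEN 1 ELSE 0 END
    FROM dbo.some_table AS s
    WHERE s.id = @id;

    /* No matching row? @b was never reassigned: still 0, not NULL. */
    RETURN @b;
END;
```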
So, when we run this query, what we’re going to get back is a whole bunch of zeros from our scalar UDF. All right? So, you look at this query here. Thing zero equals the scalar UDF that I just showed you.
Okay? Simple enough. This is also a simple enough UDF to rewrite. Or so it seems. Bah, bah, bah, bah. It’s terrifying.
Now, if we run this, which rewrites, or rather writes, a new function. It doesn’t rewrite that function. This is the rewrite, not a rewrite.
See? It’s different. This will return a table, which is the type of function that does not cause performance to throw up. And if you look at what this does, it is nearly the same thing, but without declaring a variable or anything else like that.
So, this is where things change, right? Because now we’re not setting, we don’t have a variable called B. We can’t declare a variable called B in an inline table valued function.
So, now we’re just setting B to this. And if B doesn’t turn out, if B is null in here, then B is null in the results.
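Continuing the placeholder sketch from above, the naive inline rewrite looks like this. With no variable to carry a default value, the no-match case behaves differently: called as a correlated subquery in the select list (the way the video calls it), a query that matches no rows produces NULL where the scalar UDF produced 0:

```sql
/* Naive inline TVF rewrite of the scalar UDF sketch -- same CASE
   expression, but no variable, so no default value for the
   no-rows case. Placeholder names. */
CREATE OR ALTER FUNCTION dbo.whatever_inline
(
    @id integer
)
RETURNS table
AS
RETURN
    SELECT
        b = CASE WHEN s.some_date >= GETDATE() THEN 1 ELSE 0 END
    FROM dbo.some_table AS s
    WHERE s.id = @id;

/* Called like this in a select list, a no-match @id yields NULL,
   not the 0 users are expecting:
   SELECT b = (SELECT w.b FROM dbo.whatever_inline(t.id) AS w)
   FROM dbo.some_other_table AS t;
*/
```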
All right? So, I think I already did this, but, you know, just to make sure. We do this. And then, since this is an inline table valued function, we have to call the results a little bit differently. All right?
So, this is, we’re going to select the B column from our rewrite function. And when we do this, the results will be null. And this is not what users expect. And this is what will make users freak out. Now, I know what you’re thinking.
You could just put an ISNULL on that. And then, if it returns null, it’ll get replaced with zero. But you’d be wrong.
That does not work out. Is ISNULL broken? Hmm. No. No.
We just have to wrap ISNULL one step further out. But this is, don’t worry. This isn’t the final fixed result. This is just me showing you how annoying this can get when you’re trying to figure out what’s wrong. You could put an ISNULL around the entire function column.
And this would finally get you a zero where you expected a zero. Right? So, just wrapping the return column from the function in ISNULL doesn’t get you anything.
This does. But that’s not good. Right? That’s not, that’s not what we’re after. We don’t want people to have to remember to wrap an entire.
They already have to remember that this isn’t a scalar UDF. And they have to call it in like the select list like this if they want it to get used. Painful enough.
Right? Microsoft couldn’t budge on that. Right? It couldn’t, couldn’t do any better. This is what we get. Thanks. Real pal. Can’t just rewrite a function and have it get called the same way.
You got to do all this crap. So, anyway. Annoying. What I found the easiest way to get this to work correctly is, is to wrap this inside the function. We have our normal function call.
And then we have a union all to just selecting B equals zero. Right? So, this is sort of the equivalent of setting that local variable in the scalar UDF to zero. And then we get the max out here from that internal, that B column in our derived select.
Right? So, if the max is zero from the union all. Because zero is going to be bigger.
The max, zero is the max of null. Right? You have a zero and a null. Zero is the max. If we have a one because it returns a true. Or if we actually get a false back. Then we’ll get a zero back here.
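Here’s a sketch of the UNION ALL plus MAX fix being described, on the same placeholder names as above. The UNION ALL of a constant 0 row plays the role of the scalar UDF’s variable default; MAX ignores NULL, so a no-match case comes back as 0, and a real 1 or 0 result wins or ties as expected. (One small difference worth noting: MAX doesn’t accept the bit type, so the column here is an integer; convert it back to bit if callers depend on the exact type.)

```sql
/* Fixed inline TVF: UNION ALL in a default row of 0, then MAX over
   the derived table, mimicking the scalar UDF's variable default.
   Placeholder names. */
CREATE OR ALTER FUNCTION dbo.whatever_inline_fixed
(
    @id integer
)
RETURNS table
AS
RETURN
    SELECT
        b = MAX(x.b) /* MAX ignores NULL; 1 > 0, so results match */
    FROM
    (
        SELECT
            b = CASE WHEN s.some_date >= GETDATE() THEN 1 ELSE 0 END
        FROM dbo.some_table AS s
        WHERE s.id = @id

        UNION ALL

        SELECT b = 0 /* stand-in for the scalar UDF's default value */
    ) AS x;
```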
But if we use this version of the function, we can just call this normally. And we will get back our expected zeros in this column. So, when you’re rewriting functions, you know, apart from the fact that, you know, making sure that performance is better.
Right? With an inline table valued function versus scalar UDF. Or even a multi-statement table valued function.
That’s really important. Right? Performance needs to get better. But you also need to make sure that the results you return match what users were getting back before. So that they don’t have this, like, new surprise result back.
Because you might have QA or unit testers or unit testing or, I don’t know, people who care about this sort of thing. And they might look at your results and say, that’s all null now. It’s not zero.
And it used to be zero or one. Now it’s null. Now it messes up this other thing. Like, maybe they were inserting this into a table. And that table doesn’t allow nulls in this column. What’s going on?
Why is it broken now? And they’re going to look at you and they’re going to think you’re an idiot. But since you’re smart, you stick with darling data, that won’t happen to you. Because now you know how to rewrite scalar UDFs into inline table valued functions and not break everything.
So good for you. It’s amazing. Isn’t it? Aren’t you glad you spent this time with me?
Aren’t you glad you invested this time in your SQL Server learning journey? I sure am. Anyway, thank you for watching. I hope you enjoyed yourselves.
I hope you learned something. And I hope to see you again in the next video, which I will be recording, I don’t know, any minute now. I suppose we’ll find out, won’t we? Maybe God will finally strike me dead.
I never can tell what’s going to happen while I’m uploading these things. Anyway. All right. Let’s end on a cheery note. I think you’re pretty. 3, 2, 1.
3, 1.
In this video, I delve into advanced query profiling techniques using SQL Server’s built-in dynamic management views and actual execution plans to gain insights into query performance in real-time. By examining two different versions of a query—one serial and one parallel—I demonstrate how to use these tools to identify potential issues such as skewed parallelism and inefficient resource usage. This approach allows you to pinpoint where queries might be getting stuck or experiencing delays, providing a deeper understanding of query execution that can lead to more effective troubleshooting and optimization strategies.
Full Transcript
Erik Darling here. Yeah, we are going to do a Darling Data today, buddy, you and me. So in today’s video, we’re going to talk about advanced query profiling. Now, I realize that this might bring up painful memories for people of Profiler, or trying to get, what’s the word, extended events to work, or maybe even just running sp_WhoIsActive. All very valid ways to profile queries. But this one’s a little bit different. So newer versions of SQL Server have these built-in views or dynamic management thingamabobs that back what you see when you look at actual execution plans. You see all the per operator stuff like times and threads and all this other cool stuff. And there are dynamic management views in the background that back that stuff up. So you can actually look at a query while it’s running. This is part of why, in my other video about what to do if a query executes for too long to get a query plan, part of what I said was, if you click to turn on actual execution plans, and you run sp_WhoIsActive with @get_plans = 1, you can actually see the in-flight actual execution plan, with rows and time accumulating, so you can start to figure out where things get stuck. We’re going to use some of the dynamic management views that fuel that stuff to look at a couple different queries while they execute. So with that out of the way, let’s talk about money, like four bucks a month worth, if you want to become a member. There’s a link in the video description that says join. And if you do that, you can give me $4 a month for producing all of the SQL Server content. You can also give me more if you’re feeling like Daddy Warbucks over there. If money is no object, well, you can always like, comment, or subscribe. And then money definitely won’t be an object. If you need consulting help with SQL Server, I am very good at all of these things. And my rates are very reasonable. You can hire me.
If you would like some training on SQL Server, I offer a lot of it for a very low price, about $150 USD with the 75% off coupon code. You could spend your money on far worse things, like other people’s Black Friday sales. If you would like to see me live and in person, well, you can see all of me and some of Kendra Little too, or half me, half Kendra. We will be at PASS Data Summit in Seattle for two days doing pre-cons about SQL Server performance tuning, November 4th and 5th. I think I already said in Seattle, but there you go. Anyway, let’s talk about this advanced query profiling stuff. Woohoo, it’s fun, right? That’s what everyone always wished for. So I’ve got a couple of queries here. And let’s just make sure everything is cleared out in the database. I recorded a video where I had to do some other stuff, so you know, things might get weird. But anyway, let’s get an estimated plan for this. And this is a demo query that I’ve shown in other videos. And this query in this video is going to run just about as crappily as it did in the other video. But we’re going to look at two different versions of it. One of them is a serial version. And the other version is a parallel version, or a non-serial version. So we have a non-parallel and a non-serial version. Or a parallel and a serial version. However you want to phrase that, however you feel comfortable with that.
I’m cool with that. But this is the way the parallel plan, the non-serial plan, looks. So what I’m going to do is talk about this query a little bit. So this query hits some different DMVs, like sys.dm_os_tasks and workers and worker threads. And then the one that shows us the sort of in-flight query progress stuff is this sys.dm_exec_query_profiles thing. This is the one where we’re going to see the different operators and their progress and stuff. And then we’re going to look at sys.dm_os_waiting_tasks. And we are also going to look at sys.dm_exec_session_wait_stats to get the top waits for the query while it runs. This is going to show, if there were any waits going on here, the top waits for the session. So if I run this, there’s nothing going on. And if I come back up, I should have highlighted this query before I did anything else. And we start running this. This will start returning results. Now since this is a serial plan, we only get one result back for every operator. Remember, if we think about the plan that we just saw, all these operators were the ones that we saw in the plan.
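The full script goes up on GitHub (per the end of the video), but a minimal version of the sys.dm_exec_query_profiles piece looks something like this. Run it from a second session while the target query executes; the DMV only returns rows for sessions that are collecting runtime plan statistics (for example, with an actual execution plan requested, or lightweight profiling enabled). The session_id filter value here is just an example:

```sql
/* In-flight query progress, per plan operator and per thread.
   Serial plans show one row per operator (thread_id 0); parallel
   plans show one row per operator per worker thread. */
SELECT
    deqp.session_id,
    deqp.node_id,                 /* matches the node in the plan */
    deqp.physical_operator_name,
    deqp.thread_id,               /* 0 = serial / coordinator */
    deqp.row_count,               /* rows processed so far */
    deqp.estimate_row_count,
    deqp.elapsed_time_ms,
    deqp.cpu_time_ms
FROM sys.dm_exec_query_profiles AS deqp
WHERE deqp.session_id = 62        /* the session you're watching */
ORDER BY
    deqp.node_id,
    deqp.thread_id;
```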
What we’re going to see is this this thing is sort of finished the rows on this don’t change. But the rows on this this clustered index scan, those are those keep going up. And these are going to go up to about 17 million. And so right now things actually look a little weird in this column because I’m reusing this window. So maybe this this demo, you know, could you could use a little bit of a fresh window start in here. But that’s really the sort of unimportant thing. Just being able to get this stuff while a query is running is pretty awesome. So that query finished and we got to see the clustered index scan make progress here. So now let’s take a look at this query while it runs because this is where the cool CPU query thing gets very interesting. So now we have a whole bunch of we have a whole bunch of results for each query operator. If we scroll over here and look, we’re going to have this parallel resource description thing. And we’re going to have a whole bunch of things for each operator, right? We have a whole bunch of entries for each operator because we have parallel threads working on them.
Now, I’m going to scroll down here and look a little bit because it’s going to what I want to figure out is when this thing gets close to the end, which is about 17 million. So I’m going to give this one more run. And that should get us close to 16 million, right? So that query is about to finish. But what I want to show you here is where something like this can be useful. Now, I’ve talked before about parallel thread skew and how when you build an eager index spool, even in a parallel plan, only one CPU thread gets used to build that spool. That sort of situation with skewed parallelism can happen just about anywhere.
Now, if we look up, I should probably frame this a little bit better. If we look over here, we’re going to have this clustered index scan. This clustered index scan where we have, we can see the different durations and stuff. But look at the row count here, right? If we come over and we look at this query plan, we will see from SQL Server that we spent 59 milliseconds scanning the clustered index of the user’s table, right?
And that happened pretty evenly, right? About 300,000 or so rows ended up across all of the threads. If you look at the thread ID and the node ID, right? So for node 4, that’s that clustered index scan. We had our parallel threads, 0, 1, 2, 3, 4, 5, well, 6, 7, 8, sort of a little bit backwards in there.
But that’s okay. But we had all eight threads accounted for with rows on them. Now we’re going to scroll down to the other clustered index scan. And what we’re going to look at in here is the same thing, right?
If we scroll, like this is all the threads for that clustered index scan. But look what happened in here. This is way different, right? Up here, for this clustered index scan, when we look at the thread and, oh, sorry, that was up there.
We look at the thread and row count stuff. These were all spread evenly. Down here, right, if we look at the thread ID and then the node ID for node ID 7, right, that’s the clustered index scan over here. And we look at this. All of the rows are ending up on a single thread.
And that’s exactly what I was talking about with the eager index spool being built single threaded. So, like, really one thread was responsible for all of that. Now, if we go over and look at the final, like, actual execution plan, we can verify that when we look at the actual number of rows.
Where are you? There you are. We found you. So all 17 million rows ended up on thread 6 right there. All right. And if we come back and we look at the DMV query, that’s thread 6 for node ID 7.
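To make the eyeballing easier, you can aggregate the same DMV per operator and thread and sort the busiest threads to the top. Skew like the single-threaded eager index spool described here shows up as one thread holding nearly all the rows for a node. This is a sketch on the same example session_id as above:

```sql
/* Spot lopsided parallelism: group in-flight row counts per plan
   node and thread, and look for one thread doing all the work. */
SELECT
    deqp.node_id,
    deqp.physical_operator_name,
    deqp.thread_id,
    rows_so_far = SUM(deqp.row_count)
FROM sys.dm_exec_query_profiles AS deqp
WHERE deqp.session_id = 62 /* session under observation */
GROUP BY
    deqp.node_id,
    deqp.physical_operator_name,
    deqp.thread_id
ORDER BY
    deqp.node_id,
    rows_so_far DESC; /* busiest thread per node floats up */
```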
And that’s where all of the rows were. So sometimes when I’m really troubleshooting a very difficult query, you know, I’ve talked about other ways where I might, you know, get the in-flight execution plan and see where stuff goes. If I just want to do something, like, quick and dirty to kind of get a sense of where things are stuck, what we’re waiting on in different places, this is a very good way to do it.
Now, the wait types and stuff for this query can get a little weird because multiple operators will say they’re waiting on something when really we’re only waiting on doing one thing. For example, in this query, waiting on building the eager index spool, that’s the EXECSYNC wait. In a parallel execution plan, building an eager index spool like this will result in that EXECSYNC wait.
That’ll really pile up. We ended up with almost 40 seconds of it. 30, 30, oh, I should zoom in on that a little bit. Almost 38 seconds, well, 38 seconds minus 5 seconds for the clustered index scan.
But you can kind of start to get a sense of where things are stuck, what you’re waiting on. You can see if, you know, you’re having problems with lopsided parallelism. You can see if threads are doing more work than others.
You know, you can see kind of the spread of work in there. And it’s just sort of a different way of looking at a query that’s actively executing to figure out where things are stuck, where things are, you know, how long you’ve been waiting. You know, you have the full wait duration for all of this stuff in here.
And so you can just kind of start to get a good sense of, like, where a query might be stuck at various points in the plan. So I will, of course, put this script on GitHub because I don’t expect you to remember it or to, you know, follow me scrolling around in the video and copy word for word and everything because that can get dangerous. So I’ll put this one on GitHub.
But anyway, thank you for watching. I hope you learned something. I hope you enjoyed yourselves. And I hope you got a tiny little insight into just how awful it is to sometimes be troubleshooting SQL Server problems. So, hmm.
And apparently I have a BIOS update from Lenovo. Hopefully this fixes some of my Intel issues. We’ll find out, won’t we? Maybe this will be my last video because my laptop will brick after this.
I don’t know. Knock on wood. It doesn’t. All right. Cool. Thank you.