Finding Bad Density Vector Estimates In SQL Server

Finding Bad Density Vector Estimates In SQL Server



Thanks for watching!

Video Summary

In this video, I share a script that I use in the Stack Overflow demo database to identify columns where density vector estimates might be inaccurate due to local variables. As a SQL Server consultant and trainer, I often need to create compelling demos to illustrate common issues like bad cardinality estimates, particularly those arising from local variables in WHERE clauses. This script helps me pinpoint such problematic columns, ensuring that my demonstrations are both accurate and effective. If you’re interested in supporting this channel or getting high-quality, low-cost SQL Server performance tuning content, consider becoming a member for as little as $4 a month—there’s a link in the video description to join now.

Full Transcript

Erik Darling here with Darling Data. I’m going to remember to flip the switch on my thing this time and we’re going to get started. I had a whole funny thing to say, but that took the, took really took the wind out of the sails there. So screw it. We’re just going to keep going. In this video, I’m going to show you and share with you a script that that I use locally in the Stack Overflow demo database to find columns where the density vector estimate would not be good. And the reason why I, I had to, I did this is because, you know, I, I am a SQL Server consultant, trainer extraordinaire, and I need to come up with good demos. And sometimes when, you know, I need to either show clients or I need to put a link in the description, put together training about things that cause bad cardinality estimates, probably the most common one that I see, you know, aside from like table variables or, you know, non-strangable predicates is when people use local variables in their where clauses. I’m not going to go into all that because I’ve got a post about that. If, if you are just so anxious to see that post, it’s on my website, erikdarling.com. The title of the post is yet another post about local variables. There will be a link to that in the description.

video description, just in case you have some contrary urges to Googling or whatever that. And, you know, I have to come up with stuff that proves out my point that, you know, for the gen in general, local variables are not a best practice to replace parameters with. You don’t want to fix parameter sniffing that way. It’s not, you won’t have a good time. And so I wrote this script to do that. But before we go look at the magnificent majesty that is that script, let’s talk about how you and I can get closer. If you would like a membership to this channel for as low as $4 a month, you can click the link that says like join now or something in the video description. And that should bring you right to where you need to go. So, uh, as usual, all of this content is free of course. Uh, and if you just want it to, you know, keep leeching off my hard work, uh, you could at least like comment and subscribe so that, uh, I feel a little bit less lonely in this crazy mixed up world. Uh, uh, I don’t know. That’s good enough there. Uh, if you need SQL Server help, if you’re in the market for consulting, I am pretty good at all of these things.

Actually, I’m very good at all of these things. I’m better than pretty good, very good at all these things. And, uh, you know, if you, if you, if you decide to do any of this with me, you don’t have to do the other stuff. Uh, if you would like some high quality, low cost SQL Server performance tuning content, well, what do you know? As a SQL Server consultant is noted by the last slide. And trainer extraordinaire is noted by this slide. You can get all of mine for life for a 75% off, which means about 150 USD at the end of the day. Uh, discount code there link up there.

Of course, all of that in the video description. Uh, I don’t know how many more of these videos I’m going to actually have this information in. Cause at this point I sort of forget where I have these scheduled out on the blog and where I have these scheduled out in the month. So, um, November 4th and 5th past day to summit me and Kendra little two days of performance tuning, train, performance tuning, pre-cons, not performance tuning train wrecks.

Uh, I mean, you, you have the train wrecks. We have the performance tuning. Uh, so that’s, that’ll be fun. Um, uh, hopefully by the time you watch this video, there’s still time to buy a ticket. Okay. So with that out of the way, let’s get on with the show and look at this fantastic script that I wrote. Um, I forget when I wrote it, but anyway. Um, so full caveat here, I, I run this script specific to the stack overflow database.

And because I’m not starting with any statistics, uh, I actually drop all the statistics and indexes to, before I run this, I have to do some initial stuff in here to create, uh, statistics on all of the columns that I care about. So I have some preamble stuff in here that will create statistics on everything. Please review the script carefully. If you already, if you already working with a database, you’re not going to want to, uh, run drop indexes to drop indexes and statistics. You probably hopefully don’t even have that installed on your production server slash database. Uh, and then you’re also going to want to skip the part that does the create statistics stuff because you don’t need to create a whole bunch of extra statistics in there.

Um, so that’s the first part that kind of goes and does that. And then after the statistics, statistics get created, boy, oh boy, we’re having a great tongue day today. Aren’t we? I just flip all over this thing. Uh, I create a few tables to hold the output of DBCC commands. Um, I know that there are built in DMVs and DMFs that do some of the stats stuff now, but, uh, I like the way that these things work a little bit better. And the way that they, you know, some of the information that they give a little bit better. So I stick with the old fashioned DBCC commands.

And, uh, then I go and I cursor over the statistics that I care about and I run, um, the DBCC show statistics with stat header. And I put that in a table and then I have to do some updates to make sure that I have the right stats names and stuff in there. Uh, and then I, oops, uh, I hit the wrong button outside of the VM. And that looked funny. That looked funny locally. You probably didn’t see anything. Uh, and then I do the same thing, uh, for the density vector part of DBC show statistics.

And then I do the same thing with the histogram. So I get three different DBCC show statistics components separately. And I put those into table variables because performance does not matter here. I can use table variables. It’s wonderful. But then I take all that stuff and I put all of that into a temp table and then, um, right there.

And so that’s the results of the header and the vector and the histogram. And then I run some queries to show me what comes out of that. Now I’ve already run this, so you don’t have to do most of it, but, um, there’s, there’s a few different results in here. And the one where I found really the best, um, the best demos from, and if you’ve ever watched my videos, you might recognize some of these.

This first result shows me where the, um, the guess that I would get from an equality predicate wildly messes up how many rows would actually come back from that. So, uh, this chunk in here, uh, of course, most of it on the post table, but this was all really good. Um, this one here on the votes table, uh, was good for, well, I mean, the user ID column in the votes table is all null.

So, uh, this one was questionable at best, but, uh, you know, figuring this sort of stuff out, uh, in the script was a little bit, uh, you know, a little bit more difficult than I would probably want to get into. But, uh, this top one up here on parent ID, uh, in the post table where, um, SQL Server guesses 120 rows, um, but we get 6 million rows back. That was a very, very good one.

Some of the ones for nulls are good for showing different stuff. Like if someone compiles a store procedure with a null parameter, that was good for something different than the local variable stuff. But, um, the, the local variable guest stuff for especially the parent ID one, uh, and the accepted answer ID one, those were excellent.

Uh, and those have spawned a lot of great demos. So, um, I don’t, I don’t know who is going to be interested in this code. I don’t know if it’s going to be maybe someone who also has to write demos for SQL Server.

Um, maybe you have a demo database where you want to figure this stuff out. Um, you could do the same thing there, or maybe you, in your database, you know, maybe you have the code with a lot of local variables and stuff in it. Then you want to figure out maybe where, um, you know, your local variables might be causing bad cardinality estimation and performance problems.

You could do that with this if you wanted. Um, quite frankly, if I were trying to find poorly performing code, I probably wouldn’t start with this. I would probably just start to find queries that have a high CPU and or duration.

Then I would try to figure out if local variables or bad cardinality estimates or whatever else are the cause of that. So, um, really this is probably mostly a tool for presenters who want to find good demos. Um, I wouldn’t recommend, again, running this in production to do anything because, um, I don’t want to be responsible for whatever happens.

In there from doing all this, right? Running those unlicensed DBCC commands. Anyway, uh, just sort of a fun video with a fun script.

Again, this will be on GitHub. This will also be a link in the, in the video description. Um, if you feel like giving it a spin in your demo database or your non-production database, uh, it might be fun for you. Just remember, if you’re running this in an actual database, you’re going to want to skip the create statistics part because, um, you’ll, you might spend a very long time creating statistics on a bunch of columns so that you, you have things to look at.

Uh, but demo database wise, this is a lot of fun. Anyway, um, I don’t, I don’t know. You know, they, they, they, they can all be in-depth SQL Server performance tuning stuff.

Sometimes people like to see how the sausage gets made. And at Darling Data, we make a lot of sausage. All right.

Recently voted by Beer Gut Magazine to be the, the sausage king of SQL Server. So, got a lot going for us here at Darling Data. Anyway, thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.