Learn T-SQL With Erik: Indexed View Serializable Locking

Learn T-SQL With Erik: Indexed View Serializable Locking


Chapters

  • 00:00:00 – Introduction to Index Views and Blocking Issues
  • 00:03:45 – Isolation Level Promotion to Serializable
  • 00:06:51 – Blocking Analysis with SP_WHOISACTIVE
  • 00:09:28 – Trace Flags for Detailed Locking Information
  • 00:11:56 – Summary and Next Video Preview

Full Transcript

Apologies for the low audio in this video — I was testing a new microphone setup that didn’t work out. Should be back to normal volume in future videos.

Erik Darling here with Darling Data, now with a fully turned on microphone. In today’s video, we are going to go through a little more snippity-tidbitty tiny chunks of material from my larger Learn T-SQL with Erik course. You can buy that in a very cool way by looking down in the video description.

If you look on down yonder, what you’ll find are all sorts of helpful links. Helpful to both you and I. Helpful to you because you can get help, or you can get training.

Or you can support this channel so that I don’t turn it into an Amiga repair channel. And that gives me money and it gives you SQL Server stuff. Fair trade.

You can also ask me office hours questions. And of course, if you enjoy this content, please do like, subscribe, tell a friend, family, foe… I don’t know, tell your foes too. At least, maybe they’ll get so into this channel they’ll just become completely distracted.

You’ll no longer have to worry about them attacking you in some way. But the training in question does have a coupon code linked down in the video description if you are the type of person who needs, wants, requires, or enjoys high quality SQL Server performance training.

For free, also linked down in the video description, my SQL Server performance monitoring tool. It’s almost like if you hired me to come work for you.

And you said, hey Eric, we got these slow SQL Servers. How can we figure out what’s slow and how can we make them faster? And this is the stuff that I would look at. I might even bring my own monitoring tool in and be like, hey, we’re going to start looking at this stuff.

And if you need help with looking at your monitoring data, I have enabled you to opt in to turning on some MCP tooling servers. And so you can have your robot friends look at your, collected performance data and only your collected performance data in a read-only way so that you can figure out what’s going on with your SQL servers.

We are down to two. Because this is publishing while I’m in Poland in Wroclaw for SQL Day. I’m sure I’m having a great time.

I’m sure that I haven’t been arrested and I’m sure that I am very healthy and not hungover. All things that I’m sure are true at the time of this thing. Going live. But I still have some other things that I’m doing out in the world.

Data Saturday Croatia, June 12th and 13th. I have an advanced tickling pre-con. If you…

I mean, maybe I should. There’s probably better money in a tickling pre-con than a T-SQL pre-con. But I have an advanced T-SQL pre-con at Data Saturday Croatia. And I will be attending PaaS Data Community Summit in Seattle from November 9th to 11th.

And some manifestation. We’ll see what happens. But for now, it is May.

And I hate Star Wars. So we’re just going to keep going here. All right. Because that day happened. I am much more of a Cinco de Mayo person.

National Margarita Day, also a personal favorite in the Darling Data household. 100% of SQL Server monitoring tool moguls prefer… National Margarita Day to dumb Star Wars days.

Anyway, let’s go look at IndexView stuff. Now, IndexView maintenance is a very, very complicated topic. It requires a good bit of caution when discussing all of the different aspects and behaviors that get involved here.

For a single table IndexView, locking is pretty normal. You take some xlocks. And the transaction isolation level is left alone.

The query plans can get interesting. But, you know, that’s just IndexView maintenance for you. Paul White, friend of the chain Paul White, has an excellent post called IndexView Maintenance in SQL Server Execution Plans.

Written back in year of our Paul, year of our Kiwi 2015. But things get much more complicated. And something that I would probably caution you against doing is putting multiple tables into an IndexView.

Because when you do that, that’s when things start to get funky. Both from the perspective of, well, I mean, the complexity of maintenance for that. The performance of maintenance when it comes to that.

And also, what happens to locking and isolation levels behind that. So, let’s look at an example. I’ve already pre-created this because it takes a little bit to get set up.

But I’ve got this IndexView called UserPostScore. And it is joining the user’s table to the post table in the StackOverflow 2013 database, StackOverlord. And I have created myself a unique clustered index on it.

So, if you want to, oh, yeah, that’s the thing. If you want to reproduce this code, this little function that I’ve got here. What’s up, locks?

This is in my GitHub repo, the darling data one. You’ll find it under, like, helper, use, functions, things in there. But let’s start this running. And what I want to show you first is when we update the user’s table, right, we’re going to look at locks for the user’s table, the post table, and our IndexView, right?

So, if we run this, this takes about five seconds because I haven’t done any tuning work on it. It’s sort of intentionally bad. We’ll see that we have regular xlocks on users.

We even have some regular xlocks on the IndexView UserPostScore. We do not have any locks on the posts table, right? But notice this third line down here.

We have these range xxlocks taken on the IndexView. These are serializable locks, right? So, our isolation level has been silently promoted up to serializable. And that means, of course, that, like, you know, under the default index view, it’s going to be serializable.

It’s not read committed isolation level for most SQL Server people who are, you know, just refuse to listen to me about using read committed snapshot isolation. That means that if we try to run a select from the IndexView, we’ll get some locking that is sort of expected, right? We’re updating rows in the user’s table that affect rows and columns that are in the IndexView.

So, getting blocked here is, like, pretty acceptable. What’s weird. Well, rather, it’s acceptable under the unacceptable garbage isolation level read committed.

But if we also run this and we try to update the post table, the score column in the post table, this gets blocked, too, right? So, let’s come over here and let’s look at SP who is active. We’re going to use my favorite personal way of, why does that say execute?

There are more letters in that word. One of my personal favorite combinations. Of parameters for SP who is active.

Get additional info and get task info. This is a much more lightweight way of seeing blocking. There is the get locks parameter as well. So, if we say, let’s get some locks here.

Get some else socks. If we say that, why are you trying to fight in here, Management Studio? So, if we say get locks equals one, we will, of course, get this locks column.

And this locks column will tell us quite a story. For the query that, oh, that’s the next video. Sorry about that. Now we’re all spoiled.

For the query that’s doing the blocking or doing the locking, you know, there’s not a, maybe, there’s some interesting stuff in here, right? Some of the intent exclusive in the xlocks. And we see the range locks in here.

But this can often turn into sort of a lock for SP who is active to enumerate when there’s a lot of locking going on. So, that’s why I prefer get additional info. And get task info.

Because if you scroll over to this additional info column, for the query that’s doing the locking, it’s not going to show you anything all that interesting. But for the queries getting blocked, you will see what type of lock is being taken, the object that it’s getting hung up on. And so, the select query is, of course, blocked trying to read from the index view.

And the update to posts is now getting blocked on the user’s table. Well, that’s weird. All right.

Why does updating the post table get blocked on the user’s table? Well, it’s all about that index view, right? So, let’s get rid of these things. Let’s make sure we are good citizens and roll back our transaction.

Remember, kids, if you begin a transaction and you just hit cancel, it doesn’t roll back or commit the transaction, right? You have to actually tell SQL Server what to do. And let’s also kill this one here.

We don’t need to do that. We don’t need to keep running that. So, let’s also. We should roll this back as well because we don’t need that thing sitting out there in the breeze. So, the way that you can see what happens is by using a bunch of trace flags.

I believe that if these aren’t in Paul’s post about index view maintenance, they’re in another post of his. So, of course, you know, I take no credit for the discovery of this, but I do want to make sure that it continues to be discovered. So, it’s going in here.

So, if you turn on these trace flags, 8606, 8607. And you run this update, you will get in the messages tab a whole bunch of stuff, right? There is a lot going on in the messages tab.

I’m not going to try to scroll to find all this stuff here. But there are a few points in here that you will see if you run this and scroll through everything. Most importantly, these two lines.

So, on the post table and on the user post score index view, SQL Server internally applies these hints to the query. You can’t see them. They’re not in any of the block process or deadlock report XML.

They’re not even in who is active because, again, the optimizer, the engine, let’s call it, adds this stuff in when the queries execute. But you’ll see the serializable, internal. Detect snapshot conflict in case you have snapshot isolation on.

And then the one on user post score is updelock serializable detect snapshot conflict. So, SQL Server, this is the case where it’s silently upgrading your isolation level to serializable from whatever in order to make sure that the index view maintenance maintains referential integrity. So, very cautionary thing against using multiple tables in an index view.

I believe I’ve said it in other videos. Index views should be on a single table at a time and should be very generic in their aggregates. Typically, you don’t want to gear them towards just one single query because it’s far too specific.

They won’t get used a lot. And the impact that they may have on the entire database ecosystem can be sort of outsized for the amount of benefit that you’re getting from them. In other words, the cost of maintaining that index view is not amortizing or helping many queries.

Run fast. Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something.

And in tomorrow’s video, we will be looking at something a little bit more query performance indexy. So, we’ve got that going for us. Anyway, goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SQL Server Performance Office Hours Episode 64

SQL Server Performance Office Hours Episode 64



To ask your questions, head over here.

Chapters

Full Transcript

Erik Darling here with Darling Data, and today’s video, we are going to do a whole bunch of office hours. Won’t that be fun for everybody? Because, you know, I like answering questions, and sometimes I can’t think of all the good questions, so I leave some of the questions asking up to you.

And so you, the fine people of the SQL Server, the greater SQL Server community, the greatest SQL Server community, can send them to me for free. Let’s talk about how you do that. Down in the video description, there’s all sorts of helpful links for you to do things.

So, one of those things is asking me office hours questions. There’s a link down there where you submit a question. It’s wonderful. And there are other links in there, too, that do require a little bit more, let’s just call it, money. Like, you can hire me for consulting, you can purchase my training, and you can even become a supporting member of the channel if you decide that, you know, you’re into, you know, my channel that much.

Like, four bucks a month or something. It’s very, very inexpensive. You won’t even notice it. It might even be tax deductible, because I am a charity. The darling data home for little wanderers.

Anyway, also down in the video description, another free thing you can do is download my open source SQL Server monitoring tool. It does all the performance monitoring that I would do if I worked at your company, and I was like, hey, we need to fix this. We need to fix this.

We need to figure out what’s wrong with these SQL servers. This is what I would do. And you can have it there for free. Again, like, you know, if you have a big expensive monitoring tool right now that isn’t helping you figure out your performance problems, try this one. Again, I make the same amount of money either way.

Zero. But you, you might actually get a good monitoring to a lot of it, which is a pretty good deal. Again, the link for that is down below as well. But, hey, come on.

Why didn’t you work? Anyway, so, like, actually, no, actually, I will be home from Chicago and in Poland when this publishes. So, forget those.

You know, time travel is weird. But coming up, I will still be at Data Saturday Croatia, and I will still be, as far as I know, you know, all things in the world considered, be at Past Data Summit in Seattle, Washington. So we’ll have, well, I mean, let’s just say I’m definitely going to have an advanced T-SQL pre-con here.

Here? I don’t know. Might happen, might not. We’ll just have to see what happens in the world.

Anyway, it is May, and ChatGPT thought May was a real gothy database rainy party, I don’t know, singing hymns to skulls. That’s… GPT got dark. I don’t know.

Anyway, let’s answer some questions. We got questions to answer here. Let’s see, is ZoomIt going to cooperate? How many layers of ZoomIt being uncooperative will we have to deal with? All right.

First question. How risky is plan forcing and QueryStore realistically? Have you seen it go badly? Oh, boy.

Doing what I do, I see everything go badly. So, like, I think the biggest risk that you run with plan forcing and QueryStore is, like, let’s say you’re in the GUI, right?

Because that’s the way most people approach QueryStore. I don’t know why, because I have, you know, SPQuickieStore, which does things a lot better than the GUI. But anyway, let’s just say you’re in the GUI, right?

Because one thing working with SQL Server is it’s a GUI mess. But you’re in there and you’re looking at, like, regressed queries or whatever. And you’re like, ah, when this query ID uses this plan, it’s always really fast and uses very few resources.

And when this query ID uses this plan, it really uses a lot more resources. And then you force the query that usually uses very few resources. And then you come to realize that the query plan that used very few resources did so because it was doing very little work.

And the query plan that was tied to that query ID that used more resources was, of course, doing a lot more work. And now you’ve just got, like, this awful parameter sensitivity problem. So, you know, what I say is always test carefully in those cases.

If you’re going to force a query plan, make sure that you understand what plan you’re forcing it for. Perhaps as far as you can get the compile time values for each plan. If you’re using, like, optimize for unknown or local variables or something like that, you’re not going to see them.

Other things might mess you up, like temp tables, table variables. You’re not going to know what data was in those. You know, if you use option recompile, you get the literal values.

You don’t need to know what data is in there, which you can figure out. But you can also use my free store procedure, SP Query Repro Builder, to help you figure that out. It’s another free tool in my darling data repo.

So if you go there and you get that, you can give it, like, a query ID or a plan ID or just about any other plan identifier in Query Store. And it will try to build a reproducible, executable query for you to run and look at the query plan for and all that good stuff. So, I don’t know.

That was more than I thought it would be there. Anyway. All right. All right. Zoom it. All right. Zoom it. You’re going to be like that today. Wow.

Okay. Is parameter sniffing actually a bug or just misunderstood behavior? I mean, look. I was going to say, Microsoft has had some pretty long-standing bugs. But I’m pretty sure if this were a bug, they would have done something about it by now.

In fact, they kind of did with the parameter-sensitive plan optimization. I don’t have my beefs and grievances with that thing. But, you know, we don’t need to get into that here.

At this point, you know, if anyone misunderstands the behavior, I think that they are just being willfully ignorant. There are 10 billion years of blog posts and informations out there. And, you know, especially now in the modern day of the robot, if you, like, robots love pointing at, I think, well-known problems, right?

Like parameter-sensitive. Ah, classic bug. Ah, classic parameter sniffing issue, right?

They love talking about that stuff. So, you know, I think it depends on how you look at it a little bit. There are other well-known databases.

Let’s say, like, Postgres where there is not really plan sharing. I know there’s, like, a reusable session thing. But, like, if you have, like, 50 different sessions all running the same query, they’re not going to, like, cache and reuse a plan.

There’s not really, like, the concept of that. Right? There’s not really a plan cache the way there is in SQL Server. So other databases do approach this differently.

You could consider that good or bad depending on, you know, what problem you’re dealing with. I generally consider parameter sniffing a good thing. I consider parameter sensitivity a thing that I like to fix.

But, you know, you can make the case for either one being a good approach depending on the workload, depending on the characteristics. I do have one friend of the channel. Who I will not name.

Be decent about this. Who was experimenting with getting his application migrated from SQL Server to Postgres. It’s called Extreme OLTP.

Make him happy when he hears that. But the cost of the connection and compilation of these tiny little queries on every connection for Postgres is a very big penalty for them. Where in SQL Server, it is, you know, like, almost instantaneous because we have the plan cache and we have plan reuse and stuff like that.

Anyway. What kinds of performance problems only show up under heavy concurrency and never in test? Geez.

So many. You know, I think probably the first one that I can think of is, like, deadlocks. Right? Because, like, not every deadlock is just a predictable, like, table A, table B, table C. Right?

Table B, table A. We got in a fight. Like, some of them really are just, like, bad luck and bad timing. And they only show up under, like, you know, actual heavy, like, you know, real world concurrency. You know, ditto some blocking problems.

Though I think blocking problems are somewhat easier to immediately forecast because you’re like, well, this modification query takes forever when I test it. What could go wrong? So there’s that.

But there are all sorts of other ones, too. You know, primarily amongst sort of, like, resource exhaustion. Resource semaphore would be one of them. Thread pool would be another one. Resource semaphore query compile when you have all the queries in the world trying to get through those compile gateways.

So there are a lot of things that show up. And even, you know, before you get to anything that’s, like, monstrously catastrophic, like, you know, like resource exhaustion, big blocking, deadlocking stuff. You know, you’ll run into little things, too.

Like maybe even queries. Maybe even queries will just be a bit slower because there is just resource contention for things, right? Whether it be objects in the buffer pool, whether it be access to CPUs, things like that. You know, something I talk about a lot in my training is, you know, the difference between tuning one query to be as fast as humanly possible or as fast as SQL Serverly possible.

And, you know, sort of tuning a query so that it is a good citizen in a workload, right? Because, like, you know, let’s put it into two pieces. Let’s say you have a very big query, like something reporty, data warehousey, right?

But you’re reporting off OLTP because you’re that kind of crazy person. And you might, you know, look at that query and that query plan. And you might say, oh, well, I have, like, you know, maxed up is set to four.

All right. Well, I have four threads working in this query. Then each processing, like, I don’t know, let’s say two billion rows apiece, right? And you’re like, yeah, that’s not good.

If I run this query at maxed out. All right. I’m going to stop eight. Then I can cut that in half because, you know, we’ll have twice as many threads for the queries to sort of divide the rows to divide up on. So that’s one way of looking at it.

But then the other way of looking at it is sort of like, well, like, maybe not every query can get, like, all the maxed up and all the memory it wants, right? Maybe some of these queries are going to have to run a little bit imperfectly so that when we have our larger workload running, the green screen over there is getting real annoying. And I don’t understand why.

It’s wiggly. I’m just standing here. Like, you might have to, like, sort of have them sort of run in a more cooperative way. So, like, you know, some queries may run imperfectly so that the larger group of queries can all run together. But, yeah, you know, there’s lots of stuff that will only that only shows up if you have that only shows up under, like, real user workload. I think accurate load testing is one of the hardest things in the world right now. I mean, I don’t know. I don’t know. I don’t know. I don’t know.

I don’t know. I don’t know. I don’t know. I don’t know. I don’t know. I don’t know. I don’t know. I don’t know. I don’t know. I don’t know.

I don’t know. I don’t know. I don’t know. But, you know, a lot of that just just simplyب shows up if you have that only shows up under like real user work load. I think accurate load testing is one of the hardest things in the database. How do you approach tuning when you have zero control over the application code.

Hardware, Hardware hardwa. Now let’s get sometimes it comes sometimesit is Hardware, but you know that’s that’s more of a generalized thing. Most of the time it would be indexing.

You know. and various other things that you can do in SQL Server that can have queries do a little bit better. Let’s say indexing is the first one that would include index views, that would include filtered indexes, that would even include indexed computed columns.

So if I have zero control over application code, but I have carte blanche for the schema and anything else in the database or that the database is sitting on, then that sort of becomes the realm that you have to work in, right?

Is it ever safe to fully ignore missing index DMV suggestions? Sure, I do that all the time. The thing with the missing index DMVs is like, there’s a lot of gaps and holes in them.

They are very opportunistic. They are part of query optimization. So like SQL Server isn’t spending a long time thinking really hard about the index that it’s asking for. It’s very where clause, centric, will often ask for very, very wide indexes, like the sort of normal spate of complaints about the missing index suggestions.

So yeah, I do ignore them a lot. Where I don’t ignore them is in query plans that I’m tuning.

So the chances are, if I’m looking at a query that is not running as quickly as I want it to, then there’s probably something that I want to do to fix that. One of those things might be adding an index or modifying an existing index.

So what I’ll usually do is I will take the query, I will get an actual execution plan for it, and I will see if the missing index suggestion or any of the missing index suggestions line up with a slow part of the query plan.

And if I think that adding that index will indeed positively affect that slow part of the query plan. But from there, sometimes I will go and look back at the missing index DMVs and say, okay, how often would a query theoretically use this?

I can see how many times a query is executed in various ways, whether it’s the plan cache or query store, but I still need to deal with figuring out, is this index good generally for the workload?

Not just once a day if this query runs, but are there many other things that would use this? Anyway, this is really annoying me, so I’m gonna go try to fix it.

Thank you for watching. I hope you enjoyed yourselves. I hope you learned something, and I will see you in tomorrow’s video where we will talk about something T-SQL-y, I promise. God, green screen is a real.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: Indexed View Matching

Learn T-SQL With Erik: Indexed View Matching


Chapters

  • 00:00:00 – Introduction
  • 00:02:45 – Creating the Table with a Computed Column
  • 00:06:39 – Index View and Filter Definition
  • 00:08:17 – Non-Parallel Plan Reason
  • 00:10:34 – Scalar UDF Not Parallelizable Warning
  • 00:12:07 – No Expand Hint Effect
  • 00:12:57 – Summary and Next Video

Full Transcript

Erik Darling here with Darling Data, and in today’s video we’re going to continue on the Learn T-SQL voyage that we have started, and I’m going to talk today a little bit about indexed view matching, because SQL Server is, let’s just call it a mature, or an experienced database engine, and is quite capable, at least in Enterprise Edition, Standard Edition, you do not pay the Microsoft Friendship Tax, so you will be taxed performance-wise, but is quite capable of matching base queries to an indexed view where the syntax matches in some way between them. So, usually exactly between them, not in some way, usually pretty close to just about what you would ask for.

But that’s why, you know, it’s important when you’re, if you’re going to go down the indexed view path, and I do, I do want to say that if you’re going to start creating indexed views, you know, you should be very careful with them, you know, usually you want to limit them to just involving one table, and you want to write indexed views that are sort of a general purpose variety. And that’simo Scrum, and the more obvious, like the number of times, you want to stay in one table because that’s very common.

And you aren’t really going to send a lot of pub, or a ton of files, or not a ton of files back into the indexed. You’re going to have to deal with a diverse set of Mmmm. Repetitive Data or Web connect devices and all sorts of things, in terms of the three sections.

But a lot of this being ordered is flip side up a lot more often than previous, and I work with native comic books, of course. And I would like to turn it up a little bit now with a full-on summary of some 할게요 and that kind of, cause Iersh entrepreneurial app. CAR N stick?

store still has quite a bit of hobbling on standard edition and so on standard edition you just might need to go down the index view path. There’s also something I think to be said for having the data already pre-aggregated for you in some cases because look as powerful as columnstore is you know you still have to do the work to get the result if you have that data pre-aggregated you’re skipping a lot of work in a lot of cases but you know anyway I’m going on too long without showing you anything interesting rambling babbling like a brook let’s let’s talk about some stuff and then look at some management studio goodies down in the video description you can hire me for all sorts of helpful things for you mostly well I mean it helps both of us right like you get expert SQL Server help and or training and I get money so that’s that’s that’s that’s that’s that’s that’s that’s that’s that’s that’s that’s that’s that is a really fair deal I think you can also become a supporting member of the channel if you’re like wow Eric thanks for doing all this YouTube and for us you sure sure put in some hard work you can support this channel monetarily again links are all down down down below that’s why the fingers are pointing this way and of course if you’re just I don’t know if you’re you’re too poor for words or I don’t know maybe you just don’t like me that much you can always say well maybe someone else will like them better and you can you can like subscribe and tell a friend who might like me better right because that’s what friends are for even if I mean your friends not where I don’t know about you and me maybe someday what we have to meet and sort some things out I think if you would like absolutely free no strings attached SQL server performance monitoring I have my free open source monitoring tool available on github that’s a link but there’s a link down in the video description below if you want to check that out as well that’ll take you there you know just average ordinary everyday T-SQL collectors brings all the important performance data in you know the stuff that I would care about and look at if I were performance tuning a SQL Server on my own makes them up puts them in all pretty charts and graphs and whatnot and one thing that’s pretty cool is if you want if you optionally would like to use your robot companions to help you analyze summarize or analyze about your SQL Server performance problems you can do that there are a whole bunch of read-only built-in MCP tools that allow you to interact with your performance data and just your performance data for weeding out your problems I think when this gets released I will be in Chicago so it might even be May 7th and 8th this comes up no I think I think it’s probably maybe just the seven uh let’s see um actually maybe not I don’t know whatever uh so uh you will have missed your opportunity for pass on tour uh but you would still be able to get to go to SQL day Poland and see me so that would be cool if you did that I have an advanced T-SQL pre-con there uh I’ll be hopefully wowing my my friends from Poland and the surrounding surrounding climbs or areas or whatever uh with with my T-SQL wit and wisdom um after that uh home for a little bit uh then I will get data Saturday Croatia June 12th and 13th uh I also have an advanced T-SQL pre-con there uh while I’ll be uh wowing my Croatian friends and friends from the surrounding areas uh I hope some Hungarians show up so I can I can talk about being part Hungarian and then saying yeah okay you’re American uh I’m gonna say yeah yeah you’re right but I had a grandma and they’ll say great anyway uh after that uh presumably home for even longer and then it passed data Community Summit in Seattle Washington November 9th through 11. uh where uh undecided events will unfold uh as far as my my speaking goes but for now uh we are still Maying about being real rainy cold May people um bringing out the skull I guess uh ghosts of Springs past haunting us reminding remember remember when this used to be a beautiful time of year not this year damn it anyway um so the the thing that I want to talk about uh with index View matching is uh how like the optimizer even even if it matches your indexed view uh will always expand the view as part of query optimization it just does that um even if very early on it’s like yo index view why it matched to you it still expands that and like looks at stuff and that can have some weird impacts not only on your query plans but also on performance. I want to show you a very simple example of that with some stuff here. So like, you know, like in a lot of videos, I’ve talked about, you know, like, ah, you make a computed column, the filter definition of a filtered index can’t go on it. So you can sort of like work your way around that a little bit with index views. And I want to show you that. But I also want to sort of like prove out a little bit in the full video material. This is, of course, you know, this is I have a terrible salesman. This is, of course, part of the Learn T-SQL with Eric course. So the full like module material, I go way in depth on this with like trace flags and stuff to show you what’s going on. But in this one, I just want to show you a shortened version to hopefully whet your appetite for buying the full version. You know, it’s what you what I made it for so you can buy it, right? So let’s drop a view and a table if they exist.

And what I want to show you in this one is a simple table with something that I would normally find absolutely abhorrent and scream, holler and curse at people about. And that is this table has a computed column in it. And that computed column has a scalar user defined function in it, right?

One of the absolute worst things you can do to a table in SQL Server is this, don’t do this, do not follow this. But sometimes you have very, very big tables, that have these things in them. And sometimes you need to figure out a way to get your queries to not get beat up by them. And you put a scalar UDF in a computed column, or a check constraint, or a default constraint I learned recently from a brave and bold YouTube buddy.

It messes up parallelism for any query that touches that table. So if you have what even even a small table, and you join that small table routinely off to larger tables, where parallelism would be of some unknown benefit, you will not get it, right? Even scalar UDF inlining does not help you with computed column, check constraint or default constraint scalar UDF.

So we don’t we do not like this, we do not enjoy this. But we’ve got this situation here. And I think I created the table, I think instead of just pondering about it, that’s good. So of course, you know, we can’t create a computed column where the filter definition hits that, right? So fine.

But we can create a computed column where that’s the filter definition in our table. Because, you know, we can create a view on our table. And this can make a lot of sense and we can maybe want to try to get around that, you know, we can create in a view on our, on our on our table.

And we could even index that view. But what, you know, further annoying things in our lives is that we cannot create a filtered index on an indexed view, right? So this just extends even further into the things that don’t cooperate with each other in SQL Server.

into the index view, or rather into the view, then create the index. The thing is, you will still find the same problems unless you use a no expand hint.

And this is because SQL Server’s query optimizer, like I said before, really does expand those index view definitions. So let’s start off by looking at a select from the base table, right?

We run this, we actually, we end up using the index view, which, you know, great for us, I guess. I don’t know what that proves. Maybe that, you know, T-SQL’s, or rather SQL Server’s cost-based query optimizer really does do index view matching, but we have this warning, this non-parallel plan reason.

In earlier versions of SQL Server, you’ll just see a generic, could not generate valid parallel plan, but newer versions of SQL Server, we get like a reason, right? T-SQL user-defined functions not parallelizable. Oh boy, right?

Nuts, as they say. Even if we select directly from the view, right? And we say, SQL Server, I bet you’re not gonna do anything stupid now. SQL Server says, oh my God, you’re gonna hate me.

And we get the same thing here, right? So whether or not SQL Server does all that stuff, we still get the bad effect of the scalar UDF. However, if we add a no expand hint to this query, right?

We’re still not getting any results back. Notice, actually, one thing I should do here is I should run all three of these at the same time. Notice in the execution plans for the first two, we have a compute scalar, right?

And that compute scalar, if we’re having a very lucky day, right? Let’s see, is it gonna give it, let’s see. Well, no, it just tells us the column name. It doesn’t give us the full thing.

But the important part here, of course, is that the third query with the no expand hint does not have that compute scalar operator. There is nothing betwixt or between the stream aggregate and the clustered end.

Dot, dot, dot. What are you? I don’t know. But this query, I mean, obviously, there are no rows in the table or in the view. So it doesn’t help here.

But notice the lack of a non-parallel plan reason. It’s always difficult to prove a negative like this. But you’ll notice that we do not have anything in here that says non-parallel plan reason, the T-SQL scalar UDFs, yada, yada, right?

Whereas with these ones, of course, that thing, it does show back up. If we just tinkle back and forth, and then it’s gone again.

So the no expand hint can be quite useful beyond just getting SQL Server to directly match to an indexed view. When you query it, it can also keep SQL Server from doing some of the expansion activities that it is wont to do when you get involved, when you jump in bed with indexed views.

So anyway, that’s about good here. I hope you enjoyed yourselves. I hope you learned something. And I will see you in tomorrow’s video. Oh yeah, I did talk about that there. All right, let’s get rid of that index view.

Actually, no, there is no tomorrow’s video. I will see you in Tuesday’s office hours video. Forget. I’m taking long weekends now. It’s great. I get to rest, relax, do my hair, tidy up my face a little bit.

It’s beautiful. Drink more. It’s a good time. Anyway, goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Dynamic SQL vs OR Clauses in SQL Server

Dynamic SQL vs OR Clauses in SQL Server


Chapters

Full Transcript

Erik Darling here, Darling Data, and today’s video, we’re going to talk about a little technique that I sort of, I don’t know, I don’t want to say invented, because saying you invented something is, well, it’s a bit much, right? It’s quite presumptive. But it’s just something that I sort of started, I did, and I want to share with you. And it’s about using dynamic SQL to sort of trim down or clause queries. Like, you know, you get those queries that are like, where something blah, blah, or blah, blah, blah, blah, blah. Anyway, so like this, this is sort of another take on like the unpredictable search thing, but it’s not like the kitchen sink catch all one. So stick with me. Because if you’re expecting that you’re wrong. And I’m going to show you in a minute. Before we do that, down in the video description, if you would like me to invent query tuning techniques on your servers, you can hire me for consulting. Isn’t it your lucky day? You can also buy my training if you would like to learn how to do these things on your servers. Wouldn’t that be great? The Learn T-SQL with Erik course down in the video description below. Wonderful, wonderful repository of information and knowledge that I have put together that the robots have not quite stolen yet. They’re still pretty dumb about a lot of this stuff. If you want to become a supporting member of the channel, you can also do that. You can buy me one half of some kind of coffee drink.

A month, four bucks. Not too bad. You can also ask me office hours questions that remains free. For some reason, I don’t know, I feel like maybe I ought to start charging a nickel on that. Make sure they put down a deposit. I don’t know. Anyway, if you if you are, I don’t know, either really smart, or really broke, or you don’t feel like talking to me or something like that. You can always just like subscribe and tell a friend because maybe maybe a friend of yours will feel differently and perhaps want to interact or engage with me in some form of wacky hijinks or misadventure. You can get my free open source SQL Server performance monitoring tool from GitHub. There’s a link here. There’s that link is also cleverly woven into the video description. Totally free, totally open source, no email, no phone home. It’s better than all the third party tools out there that charge you an arm and a leg and a foot and an eyeball and maybe a couple unmentionable parts. And, you know, it’s, it’s all me, you know, kind of just the stuff that I care about monitoring performance wise and SQL Server, putting it out there for everyone to use a, you know, beautiful charts and graphs. And if you are a robot friendly person, it has built in optional opt in MCP servers that you can flip on.

And have your own personal robot companion, look at your performance data and just your performance data. And maybe help you figure out some of the problems you’re having. Coming up. Well, I mean, that is this week, isn’t it? I will be at Pass on Tour Chicago. We’re going to set a little bit of a theme going here, where I’ll be teaching an advanced T-SQL pre-con. That’s May 7th and 8th. After that, I will be dragging my behind to Poland to this wonderful SQL Day conference. May 11th through 13th, where, lo and behold, I will also be doing an advanced T-SQL pre-con. After that, I will be recovering from what will most likely be a very heavy drinking, what do you call it, trip to Poland. Because I think that’s what you do there, as is customary. And then I will be going to Data Saturday, Croatia, June 12th and 13th, where, shockingly, I have an advanced T-SQL pre-con. After that, maybe, just maybe, I will have an advanced T-SQL pre-con at Pass Data Community Summit in Seattle, Washington, November 9th through 11th. We’ll just have to see. All right. With that out of the way, May continues. The weather in New York has barely improved. Today was actually quite a nice one, but that is a rarity in my neck of the woods lately. Mostly it is just too cold and very windy, and I think we should bomb Canada. Anyway, let’s go to SQL Server Management Studio. And what I want to start showing you here is the sort of beginning pattern of the query that I ended up tuning. I’ve sort of fitted things to the Stack Overflow database as best I could. It’s not quite as great a demo as the one that I actually tuned was, but I think you’ll get the point, and you’ll appreciate it. So, you know, there was sort of like a static part of the WHERE clause, you know, filtering on some stuff. And then there were a bunch of sort of case expressions in the WHERE clause. And it was like, well, if the minimum score equal, where the score is greater than, well, if the minimum score is zero, then we just say p.score, which, you know, since it’s greater than or equal to, it always finds itself. Otherwise, we will go to the min post score in, you know, some table thing up here. I think in the original one, there was not, it was just, it was like, there was like two tables. There wasn’t a temp table involved. I had to cheat a little bit because I couldn’t quite get it right otherwise. But it was sort of like, every time this ran, it kind of did the same thing. Now, I’ve created for this, you know, a bunch of single key column indexes, because that was, you know, part of what was going on in the, the, the real situation. And then one wider index that encompasses all of the columns that we would be or causing on down here. So if you run this for a bunch of different permutations of searching different things, so like putting different numbers into all this stuff, you kind of get the same execution plan over and over again, like performance on this one isn’t terrible. But I do want to show you that you can make a difference, even in queries that already look like they are pretty fast. But in real life, you might find this query pattern and be like, oh, this is dreadfully slow. Why? Why would anyone do this to a server? So every single one of these has basically the same plan, right? It doesn’t matter if you search for almost nothing, or you search for everything, this all does roughly the same thing, right? You know, you grab the data that’s in the temp table, you nested loops joined to the post table, and you spend two 300 milliseconds in here, seeking on various things, depending on what you might find. And but like one of the real annoying things is that like a lot of those predicates, especially from the case expression end up at the join condition, right? So like, you’re not fully filtering out all of the rows that you could, when you’re touching the base table. And that’s, I find that personally offensive. Right. And so that’s what all those query plans look like. Your first instinct might be to rewrite the query to something like this, and say, well, you know, I think that like, or rather, I know, because I watch Erik Darling’s YouTube channel about SQL Server performance tuning, you’re a smart cookie, you might say, I get a bad feeling about case expressions and join clauses and where clauses, and I feel like I should take that case expression out, and perhaps write the query in a way that is maybe SQL Server will smile upon, right? So you might take out the case expressions and rewrite them as sort of or expressions, right? So you might say, and the min score is zero, or the score column in the post table is greater than or equal to the min post score that is in some other table, right? In this case, it’s a temp table, it doesn’t have to be an attempt table. But this all ends up in roughly the same condition, right? In the, you know, you join off to the post table, you are able to seek into an index here for like the at least the begin and end dates. But, you know, you still got in the nested loops, all this stuff, right? It’s all the or you can see the ors, and the and all that stuff sprinkled in throughout. So, you know, that doesn’t quite get us where we need to go. So what I what I what I did, and what you could do to if you felt felt so inclined, is to write some dynamic SQL, to figure out what data actually exists in, you know, your starting place, and figure out if based on that, you need to tack on any of those where clause elements to expand that and ors.

right? Because you could take out the and or sort of like, is it this or is it that? I don’t know what’s going on, you can take some of that element of surprise to SQL Server out, and you can get better queries and plans, right? So what we’re going to do here is say, well, like, we don’t really need to search on this, if you know, we don’t have, if we don’t have a minimum post score that’s greater than zero, there’s no point in asking if the score column is greater than equal to itself, right? And, you know, same thing with the view count column, the same thing with the profile, the min answer count column, and same thing with the min comment count column, right? So if we run all those, we get a much tidier execution plan, right? Because we don’t even need to bring in bring that temp table in. And we can just seek to if we’re just certain if like, we don’t have any of the other data available in there, we can just search on the creation date column. But as we get other other columns involved, we can we can start filtering out the data from the base object as we’re touching it, rather than like get only filtering out some of it, bringing a bunch of stuff into the join and having to do all that filtering there. What I think is particularly interesting about this scenario is if we come up to the indexes, and we drop the wide index, right, so that this this index here called everyone, we drop that because that was across all of the columns, and we rerun these, like, like the plan itself, like it changes a lot.

a little bit for these, right? And just instead of using that one wide index, we hit the clustered index, scan that this is about twice as slow. You know, again, it’s the same problem where like a bunch of stuff gets evaluated at the join, rather than when we touch the table. At this point, we’re consistently just apply those creation date predicates there. And it’s the same deal with the or clauses, right? If we come over here, and we rerun all of these with that dropped, these all get the same query plan too, right? So they’re all doing just about the same thing. I’m not going to belabor that point too. hard. But what I think is neat is if you use the dynamic SQL version of this, SQL Server can actually do like some index intersection stuff, right? Like for this one, well, all we do is seek into that creation date index. And as long as the demo gods are smiling upon me, this will all work. And this one, we use two of our nonclustered indexes, right? We seek into the one one index on creation date. And then we seek into another index on score, right? Because we can we can do that SQL Server is a smart cookie, it can it can it can use multiple nonclustered indexes to apply multiple seek predicates. And sometimes that’s a good thing, right? Sometimes that cost based optimization works out. And if we look at this, where we applied three predicates, now we have creation date score. And I believe that’s comment count, if I’m, if I’m remembering correctly. And for the final plan, we touch all four nonclustered indexes and seek into all of those. Actually, I think that’s five now, right? Yeah, one, two, one, I can count one, two, three, four, five. Yeah, there we go. So we do all this. And I think this is kind of a neat thing, too. We’re like, as as you start writing queries that give SQL Server less to guess about, you can have some very interesting effects, not only on query performance, but on the query plan to where SQL Server can do a little bit more of a little smarter cost based optimization stuff. I’m not saying that this is always the plan you want. I’m just saying it’s an it’s another sort of interesting artifact of writing smarter, cleaner, queries. Anyway, I hope you enjoyed yourselves. I hope you learned something. Both of those things, right? And I will see you over in tomorrow’s video where I think we will we will talk a little bit more about indexed views. Because we started talking about those, I don’t know, 3000 years ago. And then I think I have some more to say on them. So we will do that. Anyway, thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SQL Server Performance Office Hours Episode 63

SQL Server Performance Office Hours Episode 63



To ask your questions, head over here.

Chapters

  • 00:00:00 – Introduction
  • 00:02:34 – What Metrics Do Monitoring Tools Tend to Mislead People the Most With?
  • 00:07:46 – Why Increasing MAXDOP Can Help with Spills
  • 00:11:58 – Conclusion and Next Video Topic

Full Transcript

Erik Darling here, your old friend, your old monitoring tool mogul, ErikDarling here with Darling Data, and it is officially a Tuesday now, we are on the Tuesday office hour schedule, and so this is where I go ahead and, well, I do my best to answer five, not one, not two, not any number less than five community submitted questions. For free, because I still have better answers than AI, I think, most of the time. Down in the video description, if you would like some answers that are better than AI about your specific, I don’t know, database, query, whatever, you can hire me to do that sort of consulting.

It’s a wonderful process. I highly recommend it. And you can also buy my training, which is, again, well, not to keep beating that horse, thank that one. No way that one’s winning the Kentucky Derby.

You can also become a supporting member of the channel, and you can find a link to ask me office hours questions, just like the nice folks who have asked me five questions this week have. And of course, if you don’t have a question, or you don’t have any money, you can do something else that’s nice.

You can like, subscribe, and even if you’ve got one or two, you can tell a friend about the channel. And maybe they will also like and subscribe and tell a friend. And you see how that cascades into something just absolutely wonderful for everyone.

If you need free SQL Server performance monitoring, I’ve got it. Believe it or not, that link is also baked into the video description, so crazy times, right? Totally free, totally open.

Open source. Just runs a bunch of good T-SQL to capture a bunch of good data. Puts it into pretty charts and graphs for you. It’s got a whole dashboard and cool stuff.

And if you’re, you know, really into the robots, and no one can talk you out of talking to robots, there are built-in MCP tools in the server so that you can begin to have the robot friends look at your performance data and maybe tell you what to do.

What’s going on. I can’t vouch for whether they will be right or wrong. All I can tell you is that they will have been pointed at the data that I have collected. And, you know, whatever they surmise from that is on them.

I have no control over that part. So, anyway. Stuff coming up. I actually get to take a couple of these off the chart soon.

Pass on Tour Chicago. That is soon. It’s like the end of this week. Holy crap. I have to go do something.

And then I will be in Poland May 11th through 13th. Flying directly there from Chicago from little Poland to actual Poland. Then I will be home for a little bit prior to Data Saturday Croatia.

And then home for a bit longer until past Data Community Summit in Seattle, Washington November 9th through 11th. At least, you know, as far as what I can tell you. I have advanced T-SQL pre-cons at Pass on Tour Chicago, SQL Day Poland, and Data Saturday Croatia.

We’ll just have to see what awaits us for Past Summit in Seattle, Washington. And, yeah. So this is still ChatGPT’s take on May in New York.

I don’t quite get it. But, you know, the robots, huh? Anyway.

Let’s answer you some questions here. Let’s see. Is SOS scheduler yield always CPU pressure or can it indicate other problems? So I don’t want to say that SOS scheduler yield always means CPU pressure.

But it does indicate CPU stuff. Because it is queries voluntarily yielding time on a scheduler. So other queries can get some time on there.

Be fair to everyone. Feel the warm embrace of the CPU collectivism, I suppose. And then, so, like, if this gets up, you know, real high compared to server uptime or, you know, you know, it often, like, when you say that, it quite often does accompany incorrect or at least ineffective parallelism settings.

So high CX weights and that stuff. Because, of course, parallel queries use multiple threads. CPUs harder than single threaded queries.

You know, you could also have a vaguely or, I don’t know, perhaps monstrously untuned workload that puts a lot of CPU pressure on the system. But, yes, if you’re concerned, if you’re seeing SOS scheduler yield as a very prominent weight on your server, then certainly I would investigate CPU intensive queries.

You know, prominent being, you know, either for sampled periods of time. It is quite high. Or perhaps, you know, if you’re looking at the server since startup.

And you’re seeing, like, you know, SOS scheduler yield is, like, near or equal to or higher than, like, server uptime. Then that’s probably where I would start. Let’s see here.

How do you tell? When blocking is actually the root problem versus just slow queries piling up. Well, suppose we could come back to weight stats on this one. If you’re experiencing high LCK weights.

Either from the perspective of total amount of time that queries are spending on those weights. Or if the average milliseconds per weight on those LCK weights are on the high end. Then that would be a good indicator.

If I were you, I would most likely be looking at it. To start, you know, assuming that you don’t want to, like, get emotionally invested in anything. You could start by turning on and analyzing the block process report.

I’ve got a free store procedure that does that. It’s called SP human events block viewer. You could also look at the system health deadlock extended event. Or you could, or sorry, the system health extended event deadlock stuff.

SP blitz lock will do that. And you could also spin up a dedicated deadlock extended event. Which I do prefer because at least when you are able to set one up in a way that creates and stores data in a file target.

Then you have a bit more fidelity in the deadlock data you are collecting. So first stop, look for high LCK weights. Either total hours of uptime or high average milliseconds per weight.

First stop, you know, turning on the block process and XML deadlock stuff. Looking through those. But of course, you know, you could also just maybe, I don’t know, do something crazy.

Like point a completely free open source SQL Server performance monitoring tool at your server. And have all of that stuff get collected for you. Ta-da! Look at that.

Wouldn’t that be nice? All right. Is there ever a case where having fewer indexes actually helps performance overall? Yes, I would say that there are many cases.

As long as you still have good indexes that fit your queries. I’ve got a store procedure called SP index cleanup that will, again, totally free, totally open source. That will help you identify unused and duplicative indexes.

And get those all merged in together so you can clean those up. There are three primary things that, let’s just give it the blanket term over indexing. I’m not saying this, like, because I think there’s a certain number of indexes that will, like, cause a problem.

I don’t want to put any magic numbers in your head. But what I would say, like, under the blanket term over indexing, the primary things you’re going to see are, you know, buffer pool competition. Because you have all these different objects competing for space in the buffer pool.

Most SQL Servers that I see. Do not have an adequate amount of memory to deal with the amount of data that they are lugging around. You would also see an effect on the transaction log.

Every index that gets modified as data in the base table gets modified has to write their changes to the transaction log. So the more of that you have to do, the busier your transaction logs become. And, of course, from the point of view of lock escalation.

Because the more indexes you have, the more locks you’ll have to take. And the more, well, I mean, only attempts at lock escalation. But, certainly, you know, having to lock all those different objects, even if it doesn’t result in a successful lock escalation attempt, can be unpleasant.

So, sure. So there’s all those things. But also, like, you know, sort of consider, like, the workload patterns.

You know, like, anything that’s remotely data warehousey or staging tabley is almost always going to be slowed down to some manifestly, you know. And that’s a thing. dismal degree by having more indexes to insert data into on let’s just call it a bulk load that’s always that you know that’s one thing to consider if your workload is not just standard stock and standard OLTP you know I guess there’s there’s maybe some other like let’s just call it optimizer stuff you know the how it might take its time or perhaps you know run out of its budget for steps to optimize a query maybe not pick the best like a better available index you know the optimizer just like you and me the more choices more chances you give it to screw up the more choices you give it the more likely it is to have an issue there all right what metrics do monitoring tools tend to mislead people the most with that’s an interesting one I don’t I don’t have a a firm answer on this I think my main thing with monitoring tools is that they they tend to sort of like just sort of saturate you with information and expect you to be able to parse it apart I wouldn’t call it necessarily misleading I would I would maybe call it more like misdirecting so like if you’re if you’re looking at a performance issue and your monitoring tool is just like well 1300 people logged in and 1500 people logged out watch out for that you know it’s just like it’s not really getting you to where you’re going you know like you know monitoring is monitoring though right monitoring is just collecting a bunch of metrics storing them and letting you decide monitoring is not analysis monitoring is not you know getting you helping you get to a root cause necessarily it’s just sort of collecting data for you to do something with that’s why when I was building my performance monitoring tool there were some things that I wanted to do with it that would make your life easier you know like putting the MCP tool server stuff in there was one of them because it lets you have you know one of your robot friends look at the performance data and you know help you find stuff in there you can ask whatever questions you want of the data that gets collected so like you know like maybe it’ll look at it and maybe it won’t find the particular thing that you’re interested in but you can always point it to that right prompt engineering I believe they call it right something like that uh and you know that’s also why you know like all the you know over time I built the tool up to really help you like when you see a spike on a chart or a graph be able to get to the queries that were most likely involved with that spike in the chart or the graph so that you have an easier time of just you know like figuring out what was going wrong in the server at times where you might have been concerned about it so again I don’t think it’s necessarily that they mislead you I think that they they tend to misdirect you with over collecting and um stuff like that all right oh let’s see uh finally something this is an interesting one we increased Mac dot max stop not Mac dot Mac duple uh and spills went away this seems backwards why would that help well there’s actually uh I think a fairly simple explanation for this let’s say you have a query that runs at max.4 and I don’t know let’s say one gig of memory right because all plans in SQL Server start as serial plans uh that’s when a memory Grant gets assigned to them and if SQL Server later chooses a parallel plan that memory Grant is divided amongst your dop threads that are executing in the plan um it it doesn’t even necessarily have to like it could be one of two things right it could be like that four gig memory Grant uh was not enough for uh split into the one gig grants across your four threads that that could have been too small uh to begin with right for any one of them but then also you know not all parallelism is perfect not all parallelism is created equal so if one of your threads gets a lot more rows it still gets that same amount of memory SQL Server doesn’t say oh well uh you know you got 10 million rows and everyone else got one million rows we better give you more of that memory Grant you know let’s let’s be fair here uh so uh you know you could either have like all your threads spilling or just one of your threads sort of getting an unfair number of rows and spilling and if you increase max stop you spread those rows out further so that perhaps the same memory Grant um might uh might alleviate some of those spills uh that would be that would be like the first place I go with it um sometimes increasing dot does sort of get you out of parallel skew problems so that’s where I was going with that but anyway uh that would be my guess um you know all of those threads uh having fewer rows on them so the memory Grant goes a little bit further that would be that would be my sort of thing there anyway that is five questions right two three four five six yeah there we go uh thank you for watching I hope you enjoyed yourselves I hope you learned something and I will see you in tomorrow’s video where we’re going to talk about a little kind of a fun little thing that I did with dynamic SQL recently um and so there’s that anyway thank you for watching

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Moving Indexes To A New Filegroup: Microsoft Still Hates You

Moving Indexes To A New Filegroup: Microsoft Still Hates You


At some point you’re going to want to move some indexes to a new filegroup. Maybe you’re separating data across storage, maybe you’re cleaning up after someone who put everything on PRIMARY and walked away, maybe you’ve got your reasons and they’re none of my business.

Whatever the cause, you’d think this would be a solved problem in a database that’s been around since the Clinton administration.

It is not.

How bad it gets depends on what you’re moving. Let’s go from least painful to most painful, because the pain here is instructive.

Normal Indexes


We’ll define normal as an index that isn’t carrying gobs of LOB data around with it. If that’s what you’ve got, life is easier. Not simple, but easier.

The part that surprises people: you cannot just rebuild the index onto the new filegroup. There is no ALTER INDEX REBUILD WITH (MOVE_THIS_SOMEWHERE_USEFUL = ON). That would be too goddamned easy, and we don’t do easy here (unless it’s a RECOMPILE hint).

What you have to do is fully script the index out. Keys, includes, uniqueness, filters, and any particular settings it was created with.

Then you recreate it on the new filegroup with DROP_EXISTING turned on.

CREATE UNIQUE NONCLUSTERED INDEX 
    whatever
ON dbo.SomeTable
(
    column_one,
    column_two
)
INCLUDE
(
    column_three
)
WHERE column_one > 0
WITH
(
    DROP_EXISTING = ON,
    ONLINE = ON, 
    (and all the other stuff you can or might want do, like PAGE compression)
)
ON [NewFileGroup];

Yes, you can make the new FG the default so that you don’t have to worry much about including it on every script, but who knows?

Maybe you created more than one new FG. You’re weird out there. I know you.

Miss a column in the include list, fumble the filter predicate, forget it was unique, and you’ve now changed the index instead of just moving it.

The work isn’t hard, exactly. Plenty of stored procedures and code examples exist to script out all your indexes.

It’s just tedious and unforgiving, which is its own kind of hard. Like a Cormac McCarthy book.

Heaps


If you’ve got heaps, your life is about to get worse.

What sucks is that I typed that and then realized it sounds like something an LLM would say.

Ah, screw it.

You can’t rebuild a heap onto a new filegroup, because there’s no index to rebuild. The data is just sitting there in a pile.

To move it, you have to put a clustered index on the table, which physically relocates the rows to wherever that clustered index lives.

If you’ve been meaning to fix those heaps anyway, congratulations, you get a small hit of satisfaction here. Build the clustered index, leave it, move on with a slightly better schema than you started with.

But if the table is supposed to be a heap, you’ve now got to script out dropping the clustered index you just created. Which turns it back into a heap on the new filegroup. So the move costs you a create and a drop for something that was never supposed to have an index in the first place.

LOB Data


Now we get to the part where I want someone at Microsoft to do this process.

Once. Just once.

There are products that should be experienced by the people who make them, and this is one of them. I get the sense that it often isn’t.

This applies to clustered tables with LOB columns, and it applies to your heaps with LOB data too, because LOB makes everything worse uniformly. Oh, and if you’ve got nonclustered indexes with LOB data in them, well… you, too.

When you do the create-with-DROP_EXISTING dance to move a table, the in-row data moves. The LOB data does not. It just stays where it was, staring at you, refusing to relocate. You can verify this yourself by checking allocation units before and after and watching the LOB_DATA unit sit exactly where it started.

The fix comes from a Kimberly Tripp post that has saved a lot of people a lot of grief over the years (Understanding LOB data). The trick relies on a quirk of how SQL Server handles partitioning: LOB data physically moves when the object transitions from non-partitioned to partitioned, or from one partition scheme to another. So you make the table partitioned, which forces the LOB data to move, even if you have no actual interest in partitioning anything.

The sequence goes like this:

  1. Create a partition function and a partition scheme.
  2. Apply the scheme to the table by creating the index on it with DROP_EXISTING. That moves the data onto the scheme.
  3. Then create the index AGAIN, this time onto the plain filegroup, with DROP_EXISTING once more, which makes the table non-partitioned again and moves everything, LOB included, on your target filegroup.

You read that correctly. It takes two index creates with DROP_EXISTING to move LOB data. The table briefly becomes partitioned for no reason other than to trick the engine into picking up the LOB allocation unit and carrying it along.

CREATE PARTITION FUNCTION pf_temp_move (bigint)
    AS RANGE RIGHT
    FOR VALUES (9223372036854775807);

CREATE PARTITION SCHEME ps_temp_move
    AS PARTITION pf_temp_move
    ALL TO ([NewFileGroup]);

/* Move onto the scheme. LOB comes with it. */
CREATE UNIQUE CLUSTERED INDEX 
    whatever
ON dbo.SomeTable 
    (some_bigint_column)
WITH
(
    DROP_EXISTING = ON
)
ON ps_temp_move (some_bigint_column);

/* Move back onto a plain filegroup. Table is no longer partitioned. */
CREATE UNIQUE CLUSTERED INDEX 
    whatever
ON dbo.SomeTable 
    (some_bigint_column)
WITH
(
    DROP_EXISTING = ON
)
ON [NewFileGroup];

And here’s the kicker: If the table has nonclustered indexes on it, both of those moves rebuild every one of them. Onto the scheme, then off the scheme.

You are reading that correctly too. Every nonclustered index gets rebuilt twice.

Say, have you been meaning to clean up some indexes for a while?

Picking A Boundary Value


The partition function needs a boundary point. You want a single boundary that sits higher than any value that exists or will plausibly ever exist in your clustering key, so that everything lands in one partition and nothing actually gets split up. You’re not partitioning for real. You just need the engine to think you are.

If you’re clustered on something with a sane data type, this is easy. Use the maximum value for the type:

  • int: use the int max, 2147483647
  • bigint: use the bigint max, 9223372036854775807
  • uniqueidentifier: use FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF, but read the note below before you trust that
  • date, datetime, datetime2: 99991231 is the standard pick

On the GUID one, be careful. SQL Server does not sort uniqueidentifier values by reading the bytes left to right the way you read them on screen. It sorts on the last group of six bytes first, then works backward through the groups. It’s a genuinely strange ordering and it trips people up constantly. The good news is that the all-F’s GUID still sorts highest no matter how you slice it, because every byte is already maxed out, so FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF is a safe upper boundary. Just don’t go assuming the rest of GUID ordering matches what your eyes tell you.

If you came in from a heap, you’re free to pick whatever column has the most reasonable data type to build your temporary clustered index on. You’re dropping it afterward anyway (maybe), so choose the one that makes the boundary value easy.

And Then There Are The String Clusterers


I see you out there. Clustered on a string because at the time it seemed like a simple thing to do and nobody was around to stop you.

Now you’ve got extra thinking ahead of you. You need a boundary value that sorts higher than every string already in the column, which means working out how many bytes the column holds and then building a value out of enough z’s, or whatever sorts highest under your collation, to clear the top of your data. REPLICATE is your friend here, padding a character out to the column’s length so your boundary outranks everything.

/* A boundary higher than any value in a varchar(50) clustering key */
DECLARE @boundary varchar(50) = REPLICATE('z', 50);

And even that depends on your collation deciding that ‘z’ sorts above whatever garbage is actually in there. Mixed case, accented characters, and case sensitivity all get a vote. So you don’t just get to pick a max value off a chart like the rest of us. You get to go think about collation sort order.

Go to hell, all of you.

Your Online Operation Isn’t As Online As You Think


You set ONLINE = ON, you tell yourself you’re being a responsible adult, and you expect the move to glide along without anybody noticing. Then you watch shoving all that LOB data around generate a shitload of tempdb contention, and suddenly your nice online operation is causing blocking anyway, just through a side door.

So your “online” rebuild is online in the narrow sense that it isn’t holding a long schema lock on the table itself, but it’s lighting up tempdb badly enough that everything else fighting for tempdb pages gets to wait in line behind you. The blocking didn’t go away. It just moved somewhere you weren’t looking. Watch your tempdb allocation page contention while this runs, because that’s where the pain shows up, not on the table you’re moving.

ONLINE Is A Suggestion, Not A Promise


It gets better, by which I mean worse. ONLINE = ON only loosely guarantees that your operation won’t block anything. It is not the iron contract people treat it as.

Kendra Little wrote up a great example of an online rebuild that ran offline and took exclusive locks the whole way through, with no warning and no error (Ugly Bug: SQL Server Online Index Rebuild Sometimes Happens Offline Without Warning).

Her repro used ALTER INDEX REBUILD WITH (ONLINE = ON) on a table that had previously had a LOB column dropped, which leaves the table in a state where the engine falls back to an offline operation and holds X locks the whole way through.

Joe Obbish also wrote a post about something similar on my site, here. Joe’s post is about column store indexes, though. That may not apply to you.

WAIT_AT_LOW_PRIORITY, the thing that’s supposed to be your lord and savior from the schema lock at the end, offered her no protection against those locks. This is the same LOB ghost haunting you from a different room. Maybe under your bed. Maybe in your closet. Maybe in your fridge.

The broader point is the one that matters even when you’re not stepping on that specific bug. Your online index build still has to take its locks, minimal as they’re supposed to be. And if something is already in the way when it goes to take them, your online operation gets blocked. Now it’s sitting there waiting, and everything that shows up behind it gets blocked too, because it’s holding its place in the lock queue while it waits for the lock it needs. One stuck online rebuild turns into a blocking chain, and that chain can sit there for a long, long time while you wonder why a “no downtime” operation took your application down. This also makes you look like an asshole for saying that “I can do this fully online and not cause any blocking, boss”.

There’s also some version voodoo waiting for you here. Because moving an index to a new filegroup forces you into CREATE INDEX with DROP_EXISTING rather than ALTER INDEX REBUILD, your access to WAIT_AT_LOW_PRIORITY depends on your version.

For ALTER INDEX, that option has been around since SQL Server 2014. For CREATE INDEX, the WAIT_AT_LOW_PRIORITY syntax only showed up in SQL Server 2022, along with Azure SQL Database and Managed Instance.

If you’re on 2019 or earlier and doing a filegroup move, the one saving grace you’d reach for to manage the Sch-M lock at the switch-in isn’t available to you, even though the people doing plain in-place rebuilds have had it for years.

This is exactly the situation I wrote ProtectSession for. If you’re going to kick off one of these moves on a busy server, set yourself up to watch for and deal with the blocking it can cause, rather than finding out from the people whose queries are piling up behind it.

The Short Version


Normal indexes cost you a careful script and one create. Heaps cost you a create and a drop (maybe). LOB data costs you a partition function, a partition scheme, two creates, and a rebuild of every nonclustered index twice. And if you clustered on a string, you also get to do collation homework before you can even write the boundary value down.

None of this needed to be this way. But here we are, and the data isn’t going to move itself.

An additional thing to consider: SQL Server has many different build strategies for indexes. It may choose to build indexes on giant tables single-threaded. It may even choose to build all 23 nonclustered indexes on a huge table single threaded while you’re partitioning on/partitioning off to move LOB data.

The story gets even more tawdry and sordid if you’re using an Availability Group in Synchronous mode. You might see a lot of really nasty pile ups on HADR_SYNC_COMMIT. You do have the option of switching to manual failover and asynchronous commit for a bit, but that’s between you and your RPO goals. If you’re moving a significant amount of data, it may be a long wait.

Get into sports, dummy, as a wise man once wrote on a bathroom wall.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: Indexed View Maintenance

Learn T-SQL With Erik: Indexed View Maintenance


Chapters

  • 00:00:00 – Introduction
  • 00:02:16 – Index Views Considered Harmful
  • 00:04:03 – Common Misconceptions About Index Views
  • 00:05:33 – Disallowed Features in Index Views
  • 00:08:44 – Performance Comparison Before and After

Full Transcript

Erik Darling here with Darling Data, and today’s video should be a fun one. We’re going to talk about making sure that indexed views don’t ruin your modifications, because they sure can under certain circumstances.

Down in the video description, you will see all sorts of helpful links, including a link to purchase the full course material. Remember, these are just tiny little crumbs of the full course material, which you can go get down in the video description below.

Or if you attend one of my upcoming advanced T-SQL pre-cons, you will also get the full course material for free 99. It’s amazing how that works.

You can also hire me for consulting. You can become a supporting member of the channel. You can buy me half of a New York City cappuccino every month if you’d like. Is it $4?

It’s not an incredible amount of money. And if you would like to ask me office hours questions, you can do that. And of course, if you are not feeling monetarily obligated towards me, which I understand many people just are very happy to just take things for free, you can like, subscribe, and tell a friend.

Fill a hole, fill a void in someone’s life that this channel would obviously, it would just complete them in ways that you can’t imagine. Or maybe you can imagine.

Maybe it just completed you. Maybe it just completed you in those same ways. Over on my GitHub repo, I have a free open source SQL Server performance monitoring tool. And it doesn’t suck.

It is all the stuff that I care about monitoring performance-wise on the SQL Server packaged up, given to you. You can point it at your SQL Servers. You can start getting great information about what’s going wrong with them performance-wise. Excuse me.

Spring is springing here, and now I’m starting to get a little allergic to things, so you’ll have to forgive my throat clearing on that one. There’s also a built-in MCP server.

So if you like the robots and you want to have the robots talk to your performance data and give you summaries or some analysis on them, maybe even give you some feedback on what you should do to fix it, you can do that. And they’ll do it, I mean, depending on how you call the robots, it might be free.

It might not be. I don’t know. All right. Tokens ain’t free, I guess. But anyway, this will be the last video that I record before going to… Jacksonville, Florida, May 1st for an advanced T-SQL pre-con.

You can still buy tickets. At least you better be able to still buy tickets, because it’s not for a couple days after this, so…

Or maybe… Actually, this one’s Thursday. The next day, Friday. Hey, you better hurry up before Jacksonville is gone. After I get back from Jacksonville, I will be on my way to Chicago, Illinois, May 7th and 8th for a pass-on tour east.

I will also be doing an advanced T-SQL pre-con. I will be doing an advanced T-SQL pre-con. I will be doing an advanced T-SQL pre-con there. Mmm. Many chances. After that, I will be at the lovely SQL Day in Poland, May 11th through 13th. And I’m flying right from Chicago to Poland.

It’s gonna be crazy times. Boy, I hope the weather’s nice. And in Poland, I know it might be hard for you to believe, but I will also have an advanced T-SQL pre-con there.

After that, I will be at Data Saturday Croatia with, believe it or not, an advanced T-SQL pre-con. And after that, I will be at PassData Community Society. I will be there. PassData Community Summit, the westest of all the summits in Seattle, Washington.

And there, well, I’m just gonna have to surprise you with what I’ll be doing there. It’s gonna be out of this world. Anyway, it is still April-ing outside.

Next week’s video will… Next week I will be debuting the May graphics. It’s gonna be wonderful and fantastic. You’re gonna be just as terrified of it as I am, I think.

Anyway. T-SQL Server Management Studio. When most people think about index views, they rightfully think about all the stuff they can’t do with them.

And I sympathize with that because, man, so many times they’ve been like, oh, if only you could do this, if only you could do that, it sure would be nice. And I realize that all the air has gone out of the room as far as making index views more powerful because everyone’s like, well, you could just use batch mode.

And that’s true in a lot of cases. But it’s also not true in a lot of cases. um batch mode is not always better than a completely pre-aggregated set of data so getting things like a min and a max in an index view you can’t do it and that sucks right this is lazy um but you know like really having those aggregations maintained somewhere uh can make read queries a lot faster especially when you consider that on standard edition um batch mode is still terribly hobbled everything maxes out at a dopp of two so if you still need like a dop above two like say four or eight or even six all right uh you don’t get it right it’s it just sucks and it’s annoying because you didn’t pay the friendship tax to microsoft but uh index use uh you know they have they have many challenges of their own uh we’re going to talk about those a little bit more i have another file queued up for next week’s stuff that we’ll talk a little bit index use in a slightly different way but um you know you know beside from the things that are disallowed in them uh you may you may still want and need no expand hints uh when you query them to keep uh sql server from expanding them into the underlying queries and ruining all the hard work you did to index that view just like filtered indexes and computed columns you need to have some anti-set options lined up correctly so that you do not experience terrible errors or queries not matching to your to your uh your your rocket science query tuning efforts um if you so you got to get this stuff lined up if you want them to work correctly but um we’re just to make things just like nice and compact here we’ve got this view which is not indexed yet but it is set up to be indexed by having a schema binding thingy here and a count big thingy here and of course we have the correct grouping that we need to do here this view already exists the problem is this view still takes 15 entire seconds to run right we are not having a good time with this view look at that well 14.2 seconds close enough uh you know sql server is like uh merge join that’s a good idea when is a merge joint ever a good idea right make that a hash join make that a parallel hash join and this thing would probably be about five seconds all right let’s say let’s try that let’s see what happens let’s do this and let’s let’s come on i’m clicking on you listen to me why don’t you ever listen to me let’s do a option i don’t know if you can hear the sirens outside but that’s another lovely side effect of spring the weather gets nice and i open my windows and new york’s like screw you here’s some sirens in your youtube video all right so let’s see let’s uh estimated plan what do we get look at that parallel hash join isn’t that a thing of beauty oh my word oh it’s gorgeous it’s wonderful let’s see what happens remember the last one was 15 well 14.2 seconds man wow parallel hash join 2.2 seconds actually let’s let’s go let’s go the tape i’ve been lied to by uh ssms before but yeah okay fine 2.2 seconds elapsed that is beautiful why would sql server pick a serial merge join plan when it could have had a beautiful parallel hash join plan i don’t know sql server sometimes i i i want to migrate to postgres when i see what you do but um when when we’re talking about you know uh trade-offs and query tuning and should i do this or should i do that i mean it’s our job to test these things right it’s our job to make sure that the the changes that we’re affecting have positive effects on the workload as a whole not we if we make one query a little bit faster but we completely ruin a whole bunch of other queries we didn’t do a good job right so we can’t have that but let’s let’s take a look right now uh at what an update to the post table currently looks like all right so i’m gonna do a begin transaction and a rollback and in the middle we’re gonna hit this little helper uh in a table valued function here called what’s up locks if you’re not familiar with this it’s available on my github repo with all my other grand stuff um so you can go get it there if you really want it but if we run this and we look at what happens when we update 100 rows uh we get you know we get a few x locks it’s not that big a deal uh the execution plan uh is pretty simple well for the for the update it’s pretty simple for for what’s up locks it’s clearly a complete disaster it’s well not a disaster but it is kind of a nightmare but uh here we have this thing and this you know we seek into an index and we do our update everything is just fine and dandy and even if we update 28 000 rows right if we hit john skeet and we say well it’s like 27 900 and something like you know we this thing does you know uh sort of lock the entire table but there’s no other competing locks so the lock escalation there not really uh any like all that unexpected for updating that many rows the execution plan does change a bit right but it’s it’s still a it’s still a pretty you know um you know still a pretty efficient plan for updating 27 900 and something rows but it can’t be too angry at that right that’s not not so not so bad but now let’s come back to our view here right and let’s let’s create a unique clustered index on our index view right and well this this takes a second to create and it’s not it’s like you know creating any other index it takes it takes a moment but but wait but while you’re sitting here and you’re sitting there waiting for that index to create you get to do all sorts of other things like run sp who is active maniacally and more manically rather and and like you know sit there and stare at your availability group and you’re blocking and whatever else and wait for it to finish but it’s finished now so that’s great and now with this done right with with this with this view indexed this all happens relatively quickly now we do one tiny little seek into the clustered index view uh sql server even suggests another index on here which we’re not we’re not going to add because we can just pretend 500 milliseconds is fast enough but now the the query plan for our update is going to change a bit right it doesn’t really get meaningfully slower for a couple reasons um i mean it gets a little bit slower but not like terrible and we have a lot more complexity in here now all right so because now we have to maintain the indexed view and so now we seek into the post table and we do all the updating we need to have the post table and then we have this sequence operator and the sequence operator says after you happen you happen i’m going to sequence you right and now down here we have to maintain the indexed view or rather the clustered index on the view which requires touching both tables in our case though i have added good indexes to support my index view so this is not a complete disaster right sometimes you do need good indexes in place to support the query underneath your index view to make reassembling the indexed view faster right it’s it’s a crazy world like like like when people say it’s like turtles all the way down it’s indexes all the way down it’s just oh i need an index to tune this and now i need an index to tune this and i’m going to make an index view but now i need indexes to make updating maintain maintaining my index view faster it does take some effort right and it does take some testing and stuff and uh updating 28 000 rows you know again this is half a second last time it it this does slow down a bit right it’s two points per second right again this does slow down a bit right it’s 2.6 seconds now so this is not the perfect world but now you can’t have everything all the time but uh you know we and we have a much more still a like this this plan was already a little bit more complex on top all right with all the sorting and splitting and filtering and whatnot but now maintaining the index view is still well it’s still like the bulk of our effort right that take that’s like you know you know there’s like 600 ah screw you there’s like 600 milliseconds here but two two two full seconds down here in the index view maintenance phase um you know most of it is not um assembling the index view right because we’re only at 325 milliseconds here that’s like 1.7 seconds total just updating the the values in the index view perhaps if we put that nonclustered index on the index view it would be it would be faster but you know let’s not let’s not get ahead of ourselves so if you want index views to work well for you uh you need to consider read queries and modification queries in your workload uh like with anything else modification query wise the more rows you get involved the longer something’s going to take a lot of the times um you know when i’m looking at trying to tune modification queries uh there’s like almost nothing from the read portion to tune everything you need to tune is in the right portion which is when things like batching become so much more valuable and useful and interesting to to get in because uh it turns out updating smaller chunks of work is going to take a lot longer than you think it’s going to take a lot longer than you think it’s going to take a lot longer than you think it’s going to take a lot longer than you think it’s to time well you may take the same amount of time to iterate over the table and update all 100 million rows or something uh you’re much kinder to your server in the process and things get a lot less like nuclear meltdowny when you’re doing that anyway thank you for watching uh yes this if you run a creator alter on an index view it drops all the indexes um but thank you for watching i hope you enjoyed yourselves i hope you learned something i hope that you will use indexes and index views responsibly in your SQL Server.

Thank you for watching. I hope you learned something. I think I may have already said that. Anyway, I’m good now. Goodbye.

Thank you. You are very kind people in the world. All right. Adios.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: Computed Columns

Learn T-SQL With Erik: Computed Columns


Chapters

Full Transcript

Erik Darling here, Darling Data. Feeling my Darling Data dandiest today. Beautiful day outside, I think. I haven’t been outside yet, but I hear, and I’m lying, I walked back and forth to the gym. It was pretty nice. I can’t complain. In today’s video, I’m going to show you a little snippet, a brief, a little whiff of some of the content that I cover in my course Learn T-SQL with Erik. We’re going to talk about computed columns today, and I’m going to talk about, well, we’re going to do two things.

One, we’re going to see if a bug still exists in SQL Server, and then I’m going to show you some stuff around determinism with computed columns, because that struck me as interesting when I was writing it. I was one of those, like, I like this. You’re cute. Anyway, done with the video.

In the video description, you will find all sorts of helpful links where you can hire me for consulting, as that’s my job. You can buy my training, including the Learn T-SQL with Erik course. There is a video. Would you believe, I’m just so well thought out, I put a link in the video for the course that I’m selling.

It’s crazy how on top of things I am. You can become a supporting member of the channel for as little, few, minimally as $4 a month. You can say, hey, Erik, thanks for creating all this video content.

For free, you idiot. You’re feeding the robots. Aside from that, you can also ask me office hours questions. And, of course, if you’re feeling just really kind and generous, but perhaps not monetarily, perhaps you are pinching pennies.

You’re saving up for that big house in the hills or something. You can do stuff for free, like subscribe and tell a friend about my channel so that people keep becoming subscribers. And the little number gets bigger.

And that’s what I like to see. If you are out there in the world and you’re like, gosh, I sure would like some high-quality SQL Server performance monitoring that’s absolutely free. Boy, howdy, have you come to the right place.

Let me tell you, I’ve got that. It’s on GitHub. It is totally free. It is totally open source. There are absolutely no requirements on your part as far as giving me information goes. It is a bunch of really good T-SQL collectors.

Run against your server. Get important information. And distribute them into appropriate beautiful charts, graphs, and other. There’s a heat map in there.

There’s all sorts of good stuff. And if you are maybe not a fan of feeding the robots, but at least enslaving the robots, making them do work for you, there are built-in optional MCP servers so that you can have the robots do analysis on your robots. You can have them do the performance data for you and write up a bunch of stuff about it.

Again, they’re pretty good at looking at the data and telling you stuff that happened, but I’m still a little wary of the advice on how to fix some of these things. I can’t fix that part, but I can do the other stuff. Anyway, that’s good enough there.

Again, these are the last chance to buy tickets for my pre-con in Jacksonville. Day of data. That’s the one in Florida. I know that there are several Jacksonvilles.

At least I would imagine there are several Jacksonvilles. Jackson is a fairly popular last name here in America. So there’s probably many Jacksonvilles. But this one is in Florida.

And I’ll be there May 1st and 2nd. Or at least May 1st. I don’t know. The 2nd, we’ll see. We’ll see what happens. I get really tired sometimes. But I will be there May 1st with the advanced T-SQL pre-con.

So buy those tickets now. Because that’s… I may never come back to Jacksonville again. I don’t know anything about it.

After that, I will be in Chicago, Illinois. May 7th and 8th for Pass On Tour. After that, I will be at SQL Day Poland. May 11th through 13th.

I will be teaching advanced T-SQL pre-cons at both of those. Shockingly, I will then be at Data Saturday Croatia. June 12th and 13th. Where I will also be presenting an advanced T-SQL pre-con.

You might sense a pattern forming here. I really enjoy teaching about advanced T-SQL. It’s kind of fun.

And then, I don’t know. Barring extreme world events. I will be at Pass Data Community Summit in Seattle, Washington. November 9th through 11th.

So you can come… You can go to there. And we can hug in the rain. And I don’t know. There’s that bridge that everyone puts gum on. It’s gross. Yeah.

I don’t know. It’s… Yeah. Anyway. It is the end of April. We have… But… This today and tomorrow’s video, really. And before I debut my image for May.

And I’m going to tell you. It’s a weird one. All right. It’s going to get weird. But for now, let’s go into SQL Server Management.

See, I got a little lost there. I was like, wait a minute. What am I doing? Oh, yeah. Computed columns. Computed columns.

I enjoy them very much. They have very good… They do very good things for very bad queries, I find. People often overlook them.

And even more often, they misunderstand them. Just like every other feature, there are tradeoffs and a bunch of limitations and, you know, some general difficulties that are rather annoying.

One of the biggest things that I get annoyed with when it comes to computed columns is not being able to team them with filtered indexes in the way that I want, where you can’t have a computed column in the filter part of a filtered index.

You can put it in the key or the includes, but you can’t put it in the filter definition, which, you know, I mean, to an extent, I understand. Because that index would just constantly be morphing in strange ways, depending on when you update things.

But, you know, a fella can dream. Maybe Microsoft can have Copilot fix that, right? Where’s all the code? I don’t know. Where’s all that AI development, right?

I don’t know. Anyway. You can run into some other problems. Those are the problems we’re going to discuss today around determinism, often requiring convert to be used with a style specified or a culture specified, depending on how you like to refer to that optional third argument to the convert function.

And, of course, one of the biggest misunderstandings is people seem to think that if you don’t persist a computed column, you can’t do anything useful with it. And that is just a bunch of hogwash.

Even a non-persisted computed column can generate statistics, can be indexed, and is often a path of least resistance to getting that column added, at least in the near term. There are some upsides to persisting computed columns, but we’re not talking about those today.

I talk about those in the full course content. So, you know, you’ll just have to buy the course and do that. Or you can come to one of my pre-cons and get the course for free. I’ll put the link in the description. It’s crazy how that works out, isn’t it? But much like any other column in a database, computed columns do generally tend to become much more useful when you index them, because that’s when SQL Server can, like, have that data in a useful order for seeking and searching and all that other stuff that it does, right?

Even putting things in order has amazing benefits in a database. Who would have guessed? But probably the most obvious use case for computed columns is generally to make some expression sargable, or provide better cardinality estimates for search predicates that would normally cause problems for us.

You can think about one of the most common ones. It’s like, you know, is null column zero or empty string or something like that. And you’re like, I can’t change the code. Crap.

But I can add a computed column to this table, and I can index it. And then SQL Server can say, oh, look, there’s a computed column on the expression is null some column something else.

And I have an index on it. I can use that index. Isn’t it wonderful? So that’s neat and life-granting sometimes. Anyway, just like with filtered indexes, there are some ANSI set options.

There are some rules you must follow in order to not hit errors when things are when queries attempt to modify the table that has computed column or a filtered index applied to it. If you don’t do this, you’ll be unhappy.

Things will start going bad for you. But non-persistent computed columns, of course, the value is calculated on the fly when the column is accessed.

There is almost no locking or overhead when you add them. The values are not stored on disk unless or until you index them. The expression may be well, you know, the expression not maybe the expression does get expanded into the query during compilation and expansion will happen even if the column is indexed.

So if you want to find out what all that means, again, that I cover that in the full course material. Persistent computed columns are a bit different. The value does get stored on disk just like regular columns or irregular column or irregular columns.

The space usage is equal to the data type size. It is updated when the base columns that make up the expression change, assuming that you have assuming that the computed column takes other columns into consideration in its expression because I guess I suppose you could you could not do that at all.

And the expression is still there. It is still expanded unless you use trace flag 176. So if you want to see that in action, again, you can buy the full course material. It’s not not rocket science, not like trace flag 176.

So the first thing we’re going to do is we’re going to see if this bug still exists in SQL Server because I do enjoy showing people this bug. So we have a table here called Ono and Ono has a persisted computed column in it that is just ID times two.

And if we try to create a filtered index on that table, we will get an error and that error will say filtered index C cannot be created on table dbo.ono because the column CC and the filter expression is a computed column.

Rewrite the filter expression so that it does not include this column. That is a very straightforward, very good error message. But this is the part.

This is the fun testing part. Did Microsoft fix this bug yet? If we create a table like this with that computed column and with a filtered index in line in the table definition, can we still do this or will this throw an error?

It still works. Microsoft has still not fixed this bug. I’ve been telling about this for years and they just refuse to do anything about it. And what’s funny is now if you try to query that table, you just get a bunch of indexes, a bunch of errors, not indexes.

You have an index. You have an index that causes the error. And look at this one. Cannot. Oh, there’s more. Sorry about that. Look at that.

It keeps going. It just keeps red texting me. Cannot retrieve table data for the query operation because the table ono schema is being altered too frequently because the table dbo.ono contains a filtered index or filtered statistics. Changes to the table schema require a refresh of all table data.

Retry the query operation. And keep in mind, there are no rows in this table. I didn’t put any rows in here. All I did was try to query.

Even if we just try to get a count from this table, SQL Server will say, invalid column name, cc. You’re like, what? What are you talking about? There’s a column? Cc?

What? You’re crazy, SQL Server. What’s going on with you? But anyway, back to computed columns. Again, probably the most accessible use case for them is to like when you see a query that’s just like function column or column plus column or something like that.

You can read the green text up there. When you see that stuff in a query, you’re like, man, that sucks. And especially like the column plus column stuff or the column plus value or something like that.

You’re like, it’s like substring, left, right, replace, upper, lower, things like that. You’re like, man, that’s annoying. How am I going to index for that?

You can’t. Unless you compute that expression and then you index it. But when it comes to creating computed columns, especially ones with dates involved. Then you have to be very careful and you have to make sure that you create them in a way that is deterministic.

So I’m going to teach you about that today. So we have this index in place on the post table on creation date and last activity date. And we have query plans turned on because we are smart professional presenters.

We don’t screw these things up. We’re not amateurs anymore. We’ve graduated. We got our presenter six sigma delta phi black belt thing. And the first thing we’re going to do.

Is alter the table posts. And we’re going to add this computed column. The computed column is going to say, hey, if last edit date is null, replace it with 1900 0101. And then we’re going to look and see if this column is indexable.

And we can use this kind of neat. Let me actually do this zoomie thing. And once again, thank you, Aaron Stellato and company for making SSMS zoomie wheel things for the results. So I don’t have to do zoom it on all this stuff.

And we can use this. Column property function. And we can feed it the table name and the column name. And we can ask this question of the function is indexable. Well, are you?

And of course, this comes back with a zero because it is not indexable as written. So this is this is not going to work out for us. We do not. We cannot use this column because it is not indexable. And we want our column to be indexable so that it gets all of the attendant superpowers that an index column gets in SQL Server.

Of course, cast. Will not help us here. Cast is the crappiest function in all of SQL Server.

Convert is way better. And this is still not indexable. So we bid you adieu cast. You stink.

And I hate people who use you. Even past past the past versions of myself that have used that used cast. I hate them too. So now we’re going to try this with convert. But notice we only it’s convert with only two arguments.

I mentioned earlier that there is a third. Convert argument that is off that is required for this that will make things work for us. So even just using convert alone does not solve this problem.

It is still not indexable. And if we try to create an index SQL Server will say no. We cannot do that because it is non deterministic. Damn the gods.

All right. So let’s try this again. But let’s do this the right way. So now we are going to say. We’re going if this is null. If last edit date is null.

Please replace it with this string converted to a date time using this culture one one two. And I think I already did this. So this is going to throw an error probably.

Oh no I didn’t do it. Now it’ll throw an error. There we go. But now if we say are you indexable or is indexable. We come back with the one. So the big the big win here was not only using the convert function.

But also feeding it this. This this this culture or style third argument. So now that it is indexable we can create this index. And and all of all of a sudden things things get better for all our queries ever.

That were you saying where is no last edit date greater than nineteen hundred or one or one or something. All right. So whatever you were doing before that stunk and made life terrible.

Now SQL Server can have an easy time with that. Right. Now again one thing that SQL Server is relatively good at. Is expression matching.

Right. So if we say something like this. Right now we can. Now we’re going to actually have a query plan. I don’t know. Did I turn those off. I don’t know what I did.

I lost track. But now we say hey SQL Server. Can you use that. Can you use that index and SQL Server says well yes I can. I can seek directly into that index. Thank you for asking. Is indexable.

Good stuff. We are indexable. But we do not want to be slack with our data types. Right. We do not want. If we if we write queries like this. We deserve all the hardship trials and tribulations that arise in our lives.

We should do everything we can to make sure that when we are converting strings to dates times date times date time twos or whatever other temporal element we care about in here that not only do we use convert but we use convert with an appropriate style so that in a non ambiguous date format so that we do not hit any weird ambiguity issues. What. What we are doing.

What writing the query in this way does is remove ambiguity. Right. And it makes it indexable. We cannot index and ambiguity can we. And now this query will as well pay dividends and remove as we remove slackness and we will seek into the index that we created on that column.

So with that out of the way. Thank you for watching. I hope you enjoyed yourselves.

I hope you learned something. And I will see you in tomorrow’s video. Bye. Where we will talk about something else. Probably T SQL related cuz that’s that’s what we’re that’s what we’re pushing out there lately isn’t T SQL all right. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

SQL Server Performance Office Hours Episode 62

SQL Server Performance Office Hours Episode 62



To ask your questions, head over here.

Chapters

Full Transcript

Oh, hello, Erik Darling here with Darling Data, your friendly host for this week’s episode of Office Hours, in which I answer five questions that you, the greater SQL Server community, and what community is greater than the SQL Server community, have asked me since the last time, so we’ve got that going for us, don’t we? Down in the video description, you will find all sorts of helpful things, you will find ways to interact with me emotionally, intellectually, but most importantly, financially. You can hire me for consulting, you can buy my training, you can become a supporting member of this very channel, and of course, you can also find links to continue to ask me Office Hours questions, and if you like this content, if you feel like this content would make a meaningful addition to someone’s life, you can like, subscribe, tell a friend, help the channel grow in other profoundly meaningful ways that are not an emotional, intellectual, or financial connection with me.

I mean, you could perhaps imagine your own set of those connections through watching these videos, but one generally will not exist. I will not acknowledge it publicly. If you need SQL Server monitoring that doesn’t suck, for free, open source, no email sign-up, no phoning home, no weird telemetry data, just all the T-SQL collectors that I would care to run on a server to monitor its performance, cool stuff, weight stats, blocking, deadlocks, top queries, and all of this goes into a very pleasantly colored dashboard for your consumption.

And if you are a fan of our new robot friends that have sprung up from vast data centers around the world, there are optional built-in MCP server tools so that you can just ask questions of your performance data and not have to worry about the robots running ramshackle or repshow, whatever that word is. I forget that word. Sometimes they suck.

Sometimes they just jump out of my brain while I’m talking. It happens. The robots trampling all over your SQL Server, you can do that. It’s pretty okay. This will probably be your last chance to buy tickets for my pre-con down in Jacksonville.

That will be on May 1st. When this gets published, it should be April 28th, so you should hurry up and buy your tickets if you care to see me talk about advanced T-SQL live and in person. All attendees of the pre-con will get access to the Learn T-SQL with Eric material.

That’s a whole lot of hours of recorded material, and it’s way more than gets covered just in the one-day class. Other places in the world will be coming up very soon. Wow, it’s just creeping right up on you.

May 7th and 8th, I will be in Chicago for Pass on Tour Summit East in the Midwest. Not the actual East. The actual East would be like New York.

Or Boston or something like that. We’re in Chicago, though, so it’s the East of the Midwest or something like that. I’ll also be at T-SQL Day in Poland, May 11th and 13th. I have advanced T-SQL pre-cons at both of those events, so you should get your butts in seats for those.

Because who knows how much longer we’ll have to talk about advanced T-SQL for. You never can tell. After that, I will be at Data Saturday Croatia, June 12th and 13th.

As well as Pass Data Community Summit taking place in Seattle, Washington, November 9th through 11th. I don’t think that full tickets are on sale for that one yet. But as soon as they are, you should line right up and go to Seattle.

Just go live there. But with that out of the way, it is still April, and we are still baseballing. The Mets and the Red Sox are not off to a very good start.

They are having some trouble. But that’s okay. Because we still get to watch and sort of enjoy baseball. And, you know, win, lose, or draw, you still drink the same amount.

So that’s the kind of wonderful thing about sportsaholism. You can be a sportsaholic and everything is just all the same. Anyway, I believe we need to go over to this Excel file.

And we need to answer some questions here. Hey, Eric. Hey, you.

How are you doing? I send this blurry-eyed after a painful evening in the SQL mines. Well, that sounds terrible. Are you a dwarf? How did you end up in the SQL mines? Without index tuning.

Okay. Do you know any ways to encourage an update to a table that participates in an indexed view to only seek the members of the view? Batching works.

But if I get the batching even slightly wrong. Then SQL will scan 70 million rows rather than seek for like 500. I would probably try doing a little. So I’ve covered this technique in a few different videos.

I can’t precisely recall the titles of them. But it might be like around like query transformations and stuff like that. But what I would probably do is.

You hit the index view with a sub query and a no expand hint. So that when you are updating the table. It is aware of the indexed view.

And it will hit only members of that. That would probably be the way that I would encourage it the most directly. But, you know, this is one of those things where there are many ways that you could have asked this question.

That would have involved sharing a little bit more information. So that I could give you a more intelligent answer. Because perhaps there are many questions that I have as a follow up for this.

But, you know, you could have pasted the plan. You could have gone to plans.erikdarling.com. And you could have shared the plan via that.

There are many ways with which you could have presented me with sufficient information to give you a better answer. But I would probably like combine. So like as far as getting the batching right.

Like I would probably want to do like where the, you know, the key. So like whatever ends up being the unique clustered index in the index view. I would say like, you know, update table select top.

You know, however many batch rows you want to do. From the index view with a no expand hint. So that SQL Server only.

The SQL Server is better able to understand what your goal is. That would be the way that I would first attempt it. But it’s a little hard to give you much more there without seeing some plans or some T-SQL.

Do you and Brent Ozar argue about logical reads being important, not important for query tuning? Nope. I’ve never had an argument with him about that.

I just disagree completely. There’s nothing to argue about. I’m familiar with cross apply and select top one. Good for you.

And I’m familiar with row number filtered with RN equals one. Well, that all depends on what you alias row number is, isn’t it? Alias it is. All sorts of things, right?

In your optimizer rules, I wish SQL Server had video. You show off cross apply into row number that you later filter to row number equals one. How do you decide when to prefer cross apply into row number?

Well, I look at the query plan. If I am unhappy with the performance of top one and I am unhappy with the performance of row number, I may sometimes combine them. And then you often get the best of both worlds where you get a top above the row number or whatever is generating the row number.

It could be a segment in the sequence projected row mode. It could be a window aggregate in batch mode. It all depends on the query plan and which one runs the best.

That’s why we have these alternatives to explore. It should be fairly obvious right now that I care most about what makes the query run the fastest. So if one of those seems a little too slow for me, then I’ll try the different variations there.

There’s not really a rule that I have beyond that. Which weight stats do people freak out about? That usually don’t matter much.

You know, honestly, it’s kind of like you have two camps. You have people who have been like mentally conditioned to freak out over like, you know, like CX weights. And then you have people who have no idea at all what any weight stat really means or when to freak out about it.

So, I mean, probably not. I mean, probably this, I think the CX weights are probably the most obvious answer there. People don’t seem to freak out too much about any other weights.

You know, I can’t really think of anything that, you know, like don’t matter much. I mean, any weight can matter in sufficient quantity. But usually it’s the parallelism related weights that people are like, there’s too much of them.

I’m like, okay, well, compared to what? I don’t know. It’s always fun to sort of get into their heads and start asking like, well, why do you think there’s so many of them?

Could it be because, you know, your queries suck and you have no good indexes? Things like that. Like perhaps you have not changed your parallelism settings at all.

Who knows? There’s all sorts of funny things that can happen on your way to, in a friendly, reasonable rated consultant fashion, get to the bottom of these mysteries with people.

Why do you feel the weight stats are too much? Are the weight stats in the room with us right now? There are many ways you can go with that. But usually the CX weights are the ones that people lean on the hardest as being a problem.

You know, there’s a lot of dumb blog content out there about them. You know, a lot of the scripts that people use to measure weight stats kind of only give you like, what percentage of weights are the various weights?

They don’t really give you context. Like how long has the server been up? How many hours of those weights have accrued compared to how long the server has been up? Things like that.

So there are many things that, you know, just, you know, matter a lot contextually that people don’t take into account. Anyway, I think that’s probably good enough here. Thank you for watching.

I hope you enjoyed yourselves. I hope you learned something. And I’ll see you in tomorrow’s video. I have not quite yet decided what I’m going to talk about. But when I do, boy, are we ever going to talk about it. All right.

Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: Filtered Indexes

Learn T-SQL With Erik: Filtered Indexes


Chapters

Full Transcript

Erik Darling here with Darling Data and in today’s video we are going to continue our T-SQL learning adventure where I show you little snippets, tidbits, whimsical little chunks of the full Learn T-SQL with Erik course material. The full course is of course available at my site training.erikdarling.com and there’s a link down in the video description for you, you special son of a gun you, to get $100 off the price of the course. Also down in the video description, all sorts of other helpful links for you to engage in consensual monetary exchanges with me.

You can hire me for consulting. You can buy Learn T-SQL with Erik or any of my other courses. You can become a subscribing, what do you call it, member of the channel.

And then if you want, now taking place every Tuesday, you can ask me office hours questions for free. And of course if you like this content, but perhaps not enough to give me any money for doing any of these songs and dances for you. I understand, you know, can’t all be daddy, rich daddy warbuckses.

You can always like, subscribe. And tell a friend. That is a fantastic way to make this channel grow and expand to something bigger and better and more beautiful than it was before.

If you are a real cheapskate, and I’m just kidding, I understand. Frugality is an important thing. I think Benjamin Franklin said some stuff about that.

I don’t know what any of it is though, because I dropped out of high school. But what I did do pretty well at was building a free SQL. Server monitoring tool.

All of the stuff that I care about monitoring for performance in a SQL Server. Totally free, totally open source. No weird email, phone home, telemetry, any of that stuff. You can just download it.

Run an executable. And start getting great performance feedback about your SQL servers. And then if you want to talk to, or have your robot friends talk to the performance data. And just the performance data.

And not go in there and start dropping tables and indexes. And writing wacky queries. And figuring their life out on your time. I have optional, opt-in, built-in MCP servers. So that you can start having your robots at least tell you what’s going on in the performance data.

I don’t know if I take their advice for fixing things. Maybe, maybe not. Depends, but you know.

What can I tell you? You’re not going to pay me to do it if that’s all you care about. Getting out into the world. Boy, I love going places. Doing things.

Seeing people. You know, amongst my adoring fans. Hopefully not getting shot. I’ll be in Jacksonville, Florida. May 1st and 2nd for Day of Data Jacksonville.

I’ve got an advanced T-SQL pre-con there. As you might be able to tell by the URL. So buy those tickets. Because they’re flying out the door fast.

And I would hate to see you lose your seat. Lose your seat to an alligator or a snake or a bottle of blue Gatorade or something. So, you know. Don’t, don’t, don’t wait.

Benjamin Franklin probably said something about waiting too. Hesitating or something. I don’t know.

Again, I’m unclear on a lot of things, historical. But then, after that. I will be in Chicago, Illinois for Pass On Tour. Again, advanced T-SQL pre-con. Buy your tickets.

I will be at SQL Day in Poland. 11th through 13th. Again, advanced T-SQL pre-con. Buy your tickets. And then, I don’t know. I come home. I take a shower.

I take a little nap. I do a whole crap load of work and don’t sleep for a month. And then I am back in Croatia for Data Saturday Croatia. Wow, a lot of Data Saturdays out there.

I’m still unclear on the difference between SQL Saturday and Data Saturday. But people, people like to fight about that. I don’t get involved with those things though. Just like Benjamin Franklin said not to do. So, I will be there.

Again, advanced T-SQL pre-con. Buy your tickets. And then I will be in Past Data Summit Seattle. Past Data Seattle Summit. In Seattle, Washington.

Not the other Seattle. In Zurbistan. November 9th through 11th. So, you know. That will be nice too.

I’m going to stay in a hotel far away from everyone else. But it is still April. What can I say? I’m getting ahead of myself.

All that talk about May and November and June and whatnot. We are still Apriling here. So, let’s April ourselves. And let’s talk a little bit about filtered indexes.

Now, you just can’t talk about indexing in SQL Server really without talking about filtered indexes. They are a very, very important thing. Conceptually, they are just not that hard to figure out.

It’s an index with a where clause. It only indexes some of the data. It qualifies for the where clause. I don’t know. Like the benefits of that just seem rather apparent to me.

But filtered indexes, of course, function best when they meaningfully reduce the total number of index rows compared to unfiltered indexes. Or when they function to improve statistical information about a segment of data that otherwise might get lost in the shuffle a little bit during the normal course of building a 200-step histogram. Filtered statistics can, of course, do the same thing without the extra index to maintain.

So, those are always, always on the table. But, you know, also having a smaller data source to read from is typically considered an advantage to most people. Especially people who are unnaturally obsessed with logical reads.

Curse you all. There are some rather well-known oddities and limitations with them. And perhaps some less known ones as well.

But, you know, like stuff you can’t do in filtered indexes. You can’t use an OR clause or any other sort of disjunction. You can’t use NOT IN. They can’t be used as a candidate foreign key index.

You can’t create them on index views, which I always found stupid. They can’t reference other tables, which I guess makes sense, right? How would that even look?

That syntax would be weird. You can’t filter on a computed column. Now, you can have a computed column in, like, the key of the index or the includes of a filtered index. But you cannot have it in the filter definition of the filtered index.

Which, again, always found that rather annoying. But, I guess, you know, whatever. They don’t play well with local variables and or formal parameters. And, by extension, they don’t play well with the forced parameterization database setting enabled.

That’s this one right here. You can’t use the IGNORE DUKE KEY creation option with them. And Merge has several issues with filter indexes that Microsoft insists are by design.

And not just out of pure laziness. You also, if you are going to use filtered indexes and you don’t want your data modifications to fail. You need to pay very careful attention to the ANSI set options that your client uses while connecting to SQL.

SSMS just happens to satisfy all these. But many ORMs and other client drivers do not subscribe to this set of core beliefs. So, you might just have a bad time there.

But, anyway, let’s create this index. Oh, I’m not even connected to SQL Server. Isn’t that wild? It gets dusty here sometimes.

And you would think that a filtered index like this would be very, very easy for SQL Server to use. But, alas, if we run this query. And we say, SQL Server, give me a count of comments where the user ID is null.

SQL Server says, you know what? Well, use a different index. Not our filtered index.

That’s not confusing. Not confusing at all. But, hey, anyway. The joys of overlapping demos. You can see that this index up here is called post ID filtered. And this is not.

So, it did not use our filtered index. It said, I got bigger ideas. But if you force SQL Server to use the filtered index, you start to get a sense of why it may have not used that index. Because SQL Server is like, oh, well, you know what?

I’m just not sure. I’m just not really sure what’s in there. So, we scan the entire filtered index. Right?

And then we do a key lookup. And why on earth would we do a key lookup here? Why on earth would we be doing this? Well, SQL Server applies this predicate in the key lookup. And it says, well, I got to go back to the clustered index and figure out if user ID is null.

Even though I feel like we were fairly explicit in our filtered index definition. This will only be indexed where user ID is null. But alas, alas, we do not have that.

So, another kind of funny thing you can do is run this query. Right? And, oh, well, we use our filtered index now because the cardinality was low enough.

Right? So, now SQL Server naturally was like, oh, I’ll do that. But it still has to go back to the clustered index and figure out if user ID is null.

So, the only stuff that could possibly be in here is where that is null. I don’t have a great explanation for that one. However, if we change our index definition.

And this is one thing that is in every single index analysis script that I’ve worked on. I’m not claiming credit for saying that I put it in there. Especially with SP Blitz Index.

But SP Index Cleanup also checks around this stuff. But… If you have a filtered index and the filter definition columns are not at least in the includes, they should yell at you about that. So, if we change our index and we say, you know what, we’re going to include user ID in there.

And all of a sudden SQL Server is like, oh, yeah, well, by the way, now that I can figure that all out in one index, I don’t really have to worry about it. Now, there’s fairly well documented stuff that I said before about filtered indexes. It’s not playing well with local variables or parameters or anything like that.

And there’s all sorts of ways to get around that, right? So, like, we create this index on the users table where reputation is greater than or equal to 100,000. And we run this query and we say, SQL Server, go use that index.

And SQL Server, well, it doesn’t use that index, right? Because it uses an index called reputation. But it didn’t use an index called reputation filtered.

So, that’s kind of a bummer. But if we throw option recompile on there, then SQL Server all of a sudden is like, oh, well, it’s not, look at that. It still didn’t use our index.

Well, you know what? Again, the joys of overlapping demos. There is an exception to this, though. And that is if you are assigning a variable, a value to a variable like this, even with an option recompile hint, SQL Server can’t use the filtered index.

And the reason why it can’t use it is because assigning a value to a variable like this disables the parameter embedding optimization. So, be careful with that. If you find yourself with a filter in this situation, this very situation, at home or at work or wherever you run your queries from, what you can do is stick that result into a temp table and then assign the variable from the temp table if using the filtered index.

The filtered index is important to you because the recompile hint won’t fix that. Another neat thing you can do if you want to play all sorts of funny tricks on SQL Server and you want to use a local variable or a parameter is you can hard code some stuff. So, remember our filtered index up here is on reputation being greater than or equal to 100,000.

You separate those zeros, it’s 100,000. And you can always write your query in a very funny way with some literal values mixed in. So, this first query.

It’s going to say where reputation is greater than or equal to reputation. And the reputation local variable is less than this hard coded 100,000 value. And then you can also say another union all because you need to be able to find both outcomes of this, right?

And you can say where reputation is greater than or equal to reputation. And reputation is greater than or equal to 100,000. And reputation, the local variable, is greater than or equal to 100,000.

So, you can get a very interesting query plan out of this with all sorts of startup predicates, right? These startup filters right here. Normally, I would see filters in a query plan and say, curse your eyes.

I hate you. But these ones are kind of interesting because they have startup expression predicates on them. Meaning that rows don’t pass any of this.

Rows don’t go here unless they pass this filter, right? So, you can see there’s zero action going on in this one. That is a zero and that is a zero. But down here, right?

This is where we’re saying, ah, is it greater than or equal to 100,000? And SQL Server says, yes. And it goes and gets one row from our filtered index down here. So, that is kind of a neat way to work around some of the limitations with filtered indexes.

I do apologize for the weird demo thing up there. Again, overlapping demos. I guess I neglected to drop indexes before starting off on this adventure.

But I promise. In the actual course material, I was way more prepared. So, I’ve got that going for me. Anyway, thank you for watching.

I hope you enjoyed yourselves. I hope you learned something. And I will see you not tomorrow, probably. But I will see you on Tuesday for office hours. All right.

I’ve got a four-day weekend. All right. Go me. Everything is coming up, Eric, for once. All right. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.