Before I go on about how adding an index can fix your blocking and deadlocking problems, let me warn you about how indexes can cause you problems.
Without fail, whenever I suggest adding indexes to clients, someone gets real worried about write speeds.
Like, the server can be burning down, spurting blood everywhere, and cursing our mothers in ancient Aramaic, but no, let’s stop and worry about write speeds going up by a few milliseconds.
You have to add quite a few indexes (20), and be modifying quite a few rows (millions) before the number of indexes really starts to hurt “write speeds”.
I haven’t seen a server whose biggest problem was write speeds (outside of Managed Instance and SQLDB) since spinning disks were measured in RPMs. The real problem I see many servers face from having “too many” indexes is increased locking.
The more indexes you add, the more you have to lock when modifications happen, even when you’re changing the same number of rows as you were before. You also increase your chances of lock escalation attempts.
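If you want to see whether lock escalation is actually being attempted on a table, a sketch like this against sys.dm_db_index_operational_stats can help. The table name is just an example; point it at whatever table you’re worried about.

```sql
/*Lock escalation attempts and successes, per index.
  dbo.Posts is an example table name.*/
SELECT
    OBJECT_NAME(ios.object_id) AS table_name,
    ios.index_id,
    ios.index_lock_promotion_attempt_count,
    ios.index_lock_promotion_count
FROM sys.dm_db_index_operational_stats
         (DB_ID(), OBJECT_ID(N'dbo.Posts'), NULL, NULL) AS ios;
```

These counters reset when the instance restarts, so treat them as “since last restart”, not forever.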
Having a handsome young professional with reasonable rates (me) review your indexing is a good idea.
But you can end up with locking and deadlocking problems when you err in the opposite direction of “too many” indexes, especially if your modification queries don’t have good supporting indexes to help them find the data they wanna change.
It Started With A Scan
I never spent a summer at camp Scans-Are-Bad, but scans can tell us something important about modification queries.
In a very general sense, if the operators in your modification queries are acquiring data from the tables they need to modify via a scan, they’ll start by locking pages. If they start by seeking to rows in the table they need to modify, they’ll start by locking rows. Take this query, for example:
UPDATE
p
SET
p.Score += 1000
FROM dbo.Posts AS p
JOIN dbo.Users AS u
ON u.Id = p.OwnerUserId
WHERE u.Reputation >= 800000
AND p.Score < 1000;
And this is the query plan for it:
we care a lot
We would care very much about the Posts table being scanned to acquire data, because the storage engine doesn’t have a very granular way to identify rows it’s going to modify.
We would care so very much, that we might add an index like this:
CREATE INDEX
p
ON dbo.Posts
(OwnerUserId, Score)
WITH
(MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
To get an execution plan that looks like this:
line up
Now, this isn’t awesome: we’re updating the Score column, and we need the Score column in our index to find the rows we care about. But if the query goes from taking 5 seconds to taking a couple hundred milliseconds, we’re going in the right direction.
There is reasonably sound advice to avoid indexing columns that change a lot. But if you need to find rows to modify by searching columns that change a lot, you may need to follow the also quite reasonable advice to make your queries faster, so there’s less chance for them to become entangled.
The longer modification queries run for, the longer they have to interfere with other queries.
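One way to see the page-versus-row locking difference for yourself is to start the update in an open transaction and peek at sys.dm_tran_locks from another session. A rough sketch:

```sql
/*Session one: start the update but don't commit,
  so the locks stay visible.*/
BEGIN TRAN;
/*...run the update here...*/

/*Session two: count locks by type for the current database.
  A scan-driven plan tends to show PAGE locks; a seek-driven
  plan tends to show KEY (row) locks.*/
SELECT
    dtl.resource_type,
    dtl.request_mode,
    COUNT_BIG(*) AS lock_count
FROM sys.dm_tran_locks AS dtl
WHERE dtl.resource_database_id = DB_ID()
GROUP BY
    dtl.resource_type,
    dtl.request_mode;

/*Back in session one, when you're done looking:*/
ROLLBACK;
```

Don’t forget the rollback, or you’ll be writing your own blog post about blocking.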
It Started With A Tran(saction)
I often see transactions (ab)used by developers. Here’s a lazy example:
BEGIN TRAN;
UPDATE
b
SET
b.UserId = 138
FROM dbo.Badges AS b
WHERE b.Date >= '2010-12-25'
AND b.Date < '2010-12-26';
/*Go do some other stuff for a while, I hear ACID is good.*/
COMMIT; /*No error handling or anything, just screw it all.*/
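If you genuinely need a transaction here, a hedged sketch with basic error handling looks something like this; whether you retry, log, or alert in the CATCH block is up to you:

```sql
BEGIN TRY
    BEGIN TRAN;

    UPDATE
        b
    SET
        b.UserId = 138
    FROM dbo.Badges AS b
    WHERE b.Date >= '2010-12-25'
    AND   b.Date <  '2010-12-26';

    /*Keep the "other stuff" short; every second spent here
      is a second your locks block everyone else.*/

    COMMIT TRAN;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRAN;

    THROW; /*Re-raise so the caller knows it failed.*/
END CATCH;
```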
The query plan for this update will look simple, harmless, never did nothin’ to nobody. But while we’re doing all that ~other stuff~ it’s holding onto locks.
waterfall
This isn’t always the end of the world.
Right now, the only index is the clustered primary key. Some queries will be able to finish immediately, as long as the rows they need are located in the clustered primary key prior to the rows that are being locked.
For example, this query can finish immediately because the date value for its row is a touch before the pages we’re updating.
SELECT
b.*
FROM dbo.Badges AS b
WHERE b.Id = 1305713;
take the cannoli
But this query, and any queries that search for an Id value on a locked page, will be blocked.
SELECT
b.*
FROM dbo.Badges AS b
WHERE b.Id = 1306701;
That Id value is just outside the range of dates we’re modifying, but because the storage engine is locking pages and not rows, it has to protect those pages with locks.
step on
With this index, both queries would be able to finish immediately, because the storage engine would know precisely which rows to go after, and a more granular locking arrangement (rows instead of pages) would be available.
CREATE INDEX
woah_mama
ON dbo.Badges
(Date)
WITH
(MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
In this example, we’re not updating the Date column, so we don’t have to worry about the sky falling on write speeds, or write speeds crying wolf, or a thousand other cautionary tales about overreaction.
NOLOCK, And Other Drugs
Let’s say we’re idiots unaware of how bad uncommitted reads are, and we decide to use NOLOCK hints to avoid our select queries being blocked.
SELECT
b.*
FROM dbo.Badges AS b WITH(NOLOCK)
WHERE b.Id = 1306700;
While that transaction is open, and we’re still doing all those other highly ACIDic things to our database, this query will return the following results:
jailbird
We can see the in-flight, uncommitted, change from the update.
Maybe we return this data to an end user, who goes and makes a really important, life-altering decision based on it.
And then maybe something in that transaction fails for some reason, and everything needs to roll back.
That really important, life-altering decision is now totally invalidated, and the end user’s life doom-spirals into sure oblivion.
And it’s all your fault.
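A saner alternative to NOLOCK, if you’ve tested the version store overhead first, is optimistic concurrency via Read Committed Snapshot Isolation. A sketch, with the database name as a placeholder:

```sql
/*Readers get the last committed version of rows instead of
  being blocked (or reading dirty, uncommitted data).
  Test before enabling; row versioning adds overhead.
  YourDatabase is a placeholder name.*/
ALTER DATABASE YourDatabase
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE;
```

With that in place, the select above would return the committed Score value while the transaction is still open, instead of the in-flight one.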
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.
Faster, easier, cloudier, smarter, shardier. Blah blah blah. Sounds great! But how do you get new users to try your product and make any reasonable assessment of it?
Not everyone comes to the table with a bunch of data they can jam right in and get partying with.
Worse, they might be on a platform where exporting their data to use on your platform (because, let’s face it, no two products have the same backup format) is a real and complete pain.
The number of hurdles that users face just getting to the point where they can make an assessment of your product is pretty huge.
Let’s talk about how you can make that easier.
Working Data Sets
Stack Overflow provides public access to their data sets, both as XML dumps, and a web interface.
I’m not saying that their relational data makes sense for every system out there, but it’s a big data set that’s good for analysis.
If you own a database product, you could:
Grab the dumps and turn them into a format that makes sense for your platform
Have a hosted playground for new users to run queries against interactively
This helps potential new customers get comfortable with the inevitable proprietary syntax and gauge query efficiency.
Microsoft, for all its silliness, gives SQL Server users a couple different sample databases to work off of. They even update them for new versions to show off all the new memes features they’ve added.
They even have a free developer edition of the product that you can install and run with pretty quickly. You don’t need this if your product is all cloud-based, but you get the idea.
Hands-down, the most annoying part of testing any database platform is getting reasonable data loaded to test against.
Portability
If you are an installer-based life form, and your database has a lot of settings that might matter for performance and reliability, or uses a specific OS, you should consider having a few different VM images available for download.
This lets you easily distribute a golden copy of an ideal environment for your product, with the OS, database, and data all packed together.
Oracle does this, and for the short time I had to experiment with some stuff on their platform, it was incredibly handy.
If you don’t want to go this route, because you don’t quite have Oracle money, being a fledgling database product, have a dedicated install and config page:
Recommended hardware
OS version
Database install steps
Any additional dependencies
Recommended database configurations
Where to get ample sample data to play with
While we’re talking about sample data, why not have a few different sizes of data? Not everyone wants to set up a 64 core, 2TB of RAM virtual machine just to mess with a petabyte set of time series data.
Have some small, medium, large, and extra large sets available for testing.
Sure, prospective clients might opt for small and medium, but the folks you want to evangelize your product are going to love you for having bigger data sets to show more complex problems and solutions.
If part of the sell for your product is how great data ingestion is, have data ready to ingest based on whatever format you excel at, even if it’s Excel files.
More likely it’s csv, parquet, json, or uh… something.
Visibility
A lot of folks are used to having more than a command line to interact with their database.
Postgres has pgAdmin, Oracle has SQL Developer, Microsoft has SQL Server Management Studio and Azure Data Studio, and there are many third party tools that can connect to a variety of platforms, too.
Writing large, complex queries in a CLI is a drag. It might be great for administration, and simple stuff, but correcting syntax errors in them is like threading needles blindfolded.
You may not want to build a whole bunch of tooling up front for developers to work in, but a lightweight browser-based tool with a “run” button can go a long way.
Take db<>fiddle as an example. You can’t do any database management with it, but you can pretty much fully interact with the database by sending queries in, the way a developer would write and test queries.
Nerdery
I love playing with other databases, but I do not love all the foreplay it takes just to get moving with one.
The more things are different about your platform — and those differences may be spectacular — the harder it is to lure away folks who are inexperienced with the stack you’ve chosen.
You might even have an amazing sales tech team who will come in and do a lot of the heavy lifting for prospective clients, but some companies out there want to do a test run before investing a lot of time and getting hourly check-ins from sales reps about how things are going and when the contract will get signed.
That also ignores one of the most powerful segments of any database community: the developers who will build content around your product, and go out in the world to blog, present, and develop training content.
Whether you want to be the next great query tuning wizard, or you just need to learn how to start solving tough business problems at work, you need a solid understanding of not only what makes things fast, but also what makes them slow.
I work with consulting clients worldwide fixing complex SQL Server performance problems. I want to teach you how to do the same thing using the same troubleshooting tools and techniques I do.
I’m going to crack open my bag of tricks and show you exactly how I find which queries to tune, indexes to add, and changes to make. In this day long session, you’re going to learn about hardware, query rewrites that work, effective index design patterns, and more.
Before you get to the cutting edge, you need to have a good foundation. I’m going to teach you how to find and fix performance problems with confidence.
Why Partially Fixing Key Lookups Doesn’t Work In SQL Server
Video Summary
In this video, I dive into why adding columns to an index to fix every key lookup might not always be the best approach. I share insights from my recent blog post on how fixing predicates in key lookups can sometimes be sufficient to avoid performance issues but highlight that there are scenarios where it’s not practical or efficient. Using the Stack Overflow database as a case study, I demonstrate through query plans and statistics histograms why adding every column to an index might introduce unnecessary overhead, especially for frequently updated tables with large string columns. This video aims to provide a deeper understanding of SQL Server’s costing mechanisms and encourage thoughtful index design rather than knee-jerk reactions.
Full Transcript
Erik Darling here with Darling Data. Still muddling through the holidays, if you can believe that. Anyway, in today’s video, it will hopefully be a nice brief video, I want to talk about why you can’t fix every key lookup just by adding some columns to the index, to an index. And this is, I do have a blog post fairly recently, I don’t actually, I’m not even sure right now if it’s, actually no, it did publish, about how fixing predicates in key lookups can be good enough to avoid some of the really slow crap that can happen with key lookups. However, not all key lookups have predicates in them. And there are some cases, where you might really be, like, gung ho about fixing the key lookup completely by adding, like, a bunch of columns from the select list to the include list of your nonclustered index. And that can have a lot of downsides for a lot of reasons. Right? Because now you, now, every time you modify the base table, you have a, I mean, inserts and deletes, for sure. But, you know, updates, if the columns that you’re now adding to the key of this index are in the include, now you have some additional considerations for when you, for when you update the table. All right? Take up more space on disk, more space in memory. If you are changing things frequently, then, you know, you have some additional transaction logging stuff to think about, some additional locking and blocking stuff to think about.
So, and, you know, and especially if you have columns in your table that are strings, particularly big strings, you know, as you approach the numerical max of nvarchar or varchar columns, either 4,000 or 8,000, or if you have max data types in your table, then all of a sudden you’re looking at potentially storing and having a lot of just really big indexes in your table, if you start trying to fix key lookups that way. So, what I want to show you is the costing mechanism behind lookups, and something that might even be surprising to you, because it was certainly surprising to me when I learned about it, or when I discovered it through vigorous testing, staring at query plans.
So, I’ve already got an index on my users table in the Stack Overflow database and I’ve updated the statistics to use a specific distribution. I mean, it’s not that I got it naturally once, but then like when I went to redo the demo, sometimes I would get different statistics and sometimes those different statistics would make the lookup versus non-lookup demo fail kind of miserably. So, in order to avoid that, I use a specific statistics histogram that I know gives me the outcomes that I want.
That’s this whole crazy thing here that makes for a really long scroll bar. This is the most interesting thought to ever go through my mind, which is nicely lined up with my ears. Anyway, so what I want to show you right now is how the reputations 23 and 51 are distributed in the users table.
So, if we run this and we look at the results, for reputation 23, there are 13,542 records. And for reputation 51, there are 13,116 records. Very close, right?
Almost suspiciously close in values. There’s reputation fraud going on there. Anyway, if I run this query to look for users with a reputation of 51, and remember 51 has 13,116 rows associated with it.
If I run this, we look at the query plan, and we quite naturally get a key lookup plan. Right here. 13,116.
Oh, look, that’s a good statistics histogram. We nailed that. That was correct. That was on the nose. And, of course, we have this missing index request, where SQL Server says, just go ahead and add every column to this index. All of them.
So, that includes, well, let’s make this a little bit easier for everyone to see all in one go here. If we look at what SQL Server wants in there, we have AboutMe. That’s a varchar(max).
We have Location. We have WebsiteUrl. We have DisplayName. These are all string columns. The Location, DisplayName, and WebsiteUrl columns aren’t maxes, but, you know, it’s a lot of work just to avoid a key lookup.
And this query isn’t slow either, right? If we look at how long this thing takes, it’s 31 milliseconds. I don’t know if I need to maintain an index of that girth over a 31 millisecond query.
Maybe not the best thing in the world there. And if you have questions about why I have 1 equals select 1 at the end of these queries, you are free to visit my website.
Because the question that everyone asks every time they see a query of mine that uses that is, what’s the point of 1 equals select 1? And I have to say, what is the point of 1 equals select 1? What could it possibly be?
Could it be explained to you in a blog post on my site? Did you perhaps type, what’s the point of 1 equals select 1 into the wrong place? Did you mean to leave a comment or did you mean to type that into Google? Or Bing, DuckDuckGo, or whatever search engine wants to sponsor my next video.
That’ll be the one that I ask about. But when we run that same query for reputation equals 23, I’m going to run this two different ways.
I’m going to run this once. I apologize that it’s still highlighted. I know that looks a little hacky, but you’ve dealt with worse from me, so we’re just going to roll with it here. So this one, of course, is just a regular select and where.
And this query down here tells SQL Server that I want to use my nonclustered index instead. All right, so use the nonclustered index. Gosh darn it.
Now, if we look at the query plans for these, you know, this one takes 200 milliseconds about. I don’t know where my thing is. I mean, I don’t know where my cursor is.
Other things I’m acutely aware of. My books, my phone, where these lights are blinding me. Anyway, this query takes about 200 milliseconds.
The cardinality estimate on that is, again, spot on, right? Exactamundo. We’re not missing anything. The stats histogram that I created, beautiful. Doesn’t get much better than that.
Of course, because SQL Server’s query optimizer is so incredibly biased against random IO, it chooses to scan the clustered index rather than seek into our nonclustered index, dive right in, find a few rows, and do lookups to match them in the clustered index.
That’s this whole section here, right? We have our index seek, where I told SQL Server which index to use. And as usual for a key lookup, oh, that wasn’t what I wanted.
Let’s redraw this one. Let’s make sure we’re holding the control key. And let’s put a square here. And as usual with a key lookup, we have a nested loops join. So we take one row from over here, put it into the nested loops join, and then we go find it down here.
That’s what a key lookup is. It joins two indexes together. And unlike in the blog post that I had written recently about predicate lookups, if we look at the key lookup here, all we have is this big old output list.
And all we have in that green text up here that we already looked at is SQL Server asking for us to include every single column in the table in our nonclustered index.
Which again, we don’t really want to do. Now, I totally get it. Select star is a bad practice. You do select star queries, you’re asking for trouble, you’re a bad DBA or developer or whatever you fancy yourself, whatever title your company gave you so they can chronically underpay you.
That is what you are a bad version of. The thing is, when it comes to key lookups, they have the exact same cost regardless of if you’re getting all the columns or if you’re just getting one of the columns that is not part of your nonclustered index.
So coming back up here, this is the index that we created. The reputation column. And since the clustered primary key of the users table is on a column called ID, only that column is inherited by the nonclustered index.
No other column in the table is magically part of our nonclustered index definition. So the problem is that whether we get every column or we just get one column that’s not part of our nonclustered index, key lookup has the exact same per loop cost.
So if we come over to the query plans for these and we look at the key lookup, it has an estimated operator cost of 37.715 query bucks for the select star query and it has an estimated operator cost of 37.715 query bucks for the query that just asked for one additional column.
So technically speaking, to SQL Server, it makes no difference at all to the query optimizer whether you’re selecting one column that’s not the nonclustered index or you’re selecting all the columns that aren’t the nonclustered index.
Doesn’t care. Each loop through has the same fixed cost. So it doesn’t matter at all how many columns you’re selecting.
One column missing from your nonclustered index that’s required by the query accrues the exact same cost as 100 columns missing from your nonclustered index that are required by the query.
And that is absolutely fascinating. So let’s talk for a quick moment here. It would be really stupid, tremendously, monumentally, outrageously stupid if you had every nonclustered index include every column in the table or try to account for every column that you might someday maybe perhaps select from a query to do a thing with maybe.
Per chance, they say. We can sort of figure out why by asking ourselves a couple questions. First, what is the clustered index?
What is a clustered index in SQL Server? It’s every column in your table logically ordered by your clustered index key column or columns.
So, the entire table, logically ordered by, for the Users table, that would be the Id column, which for various reasons makes a pretty good clustered index. A nonclustered index is every column you may include, plus whatever columns you put in the key of the nonclustered index.
All right? So we order all this stuff by the key. We have the includes down here.
If you, let’s say, thinking about the Users table, let’s scroll back up a little bit and just look at this select star query. So right now, the clustered index is everything in here, every column in the table, logically ordered by this column.
If we were to follow the missing index recommendation that SQL Server wants, and we were to put a nonclustered index keyed on Reputation and include every other column in the table, we would have every column in the table logically ordered by Reputation, which would just be a clustered index in everything but the name. Because you can’t have two clustered indexes on a table, that nonclustered index would be almost like another clustered index, wouldn’t it?
It would be every column in the table ordered by some column. You don’t need two clustered indexes. You don’t. Simply don’t. If you needed two clustered indexes, Microsoft would allow you to create more than one.
So please don’t do that. Please don’t follow the missing index recommendations that include every single column in the table.
It’s often unnecessary. Please spend some time evaluating the speed and efficacy of your queries before you go creating super wide indexes to cover various things, because you’re not going to be happy after you create like 10, 15, 20 of those suckers on your table, and all of a sudden everything is blocking and deadlocking because every index is getting locked all the time.
Cool. All right. So it’s a snowy day here in New York City and I don’t know maybe I’ll go make some hot chocolate spike it with something extra spicy.
I hope you, what is it, enjoyed yourselves, I suppose. I hope you learned something. If you sat here for 15 minutes and didn’t learn anything, well.
Oh, maybe, maybe I’ll do better in the next video. I hope you enjoyed yourselves. If you like this video at all, in any way whatsoever, even if you have to rewatch it with the mute button on, or if you, I don’t know, have to have some nitrous before you watch it, if you like it in any way, shape, or form, feel free to give the thumbs up button down below a little taparoo.
If this is the sort of SQL Server content that you have any sort of appreciation for, I do like getting new subscribers, so that I can get more likes, and so that I can annoy more people every time I publish a video with, “Hey, Erik Darling did something, did you know?”
So, yeah, that’s that. Thank you for watching.
SQL Server’s missing index requests (and, by extension, automatic index management) are about 70/30 when it comes to being useful, and useful is the low number.
The number of times I’ve seen missing indexes implemented to little or no effect, or worse, disastrous effect… is about 70% of all the missing index requests I’ve seen implemented.
If they’re all you have to go on, be prepared to drop or disable them after reviewing server and index usage metrics.
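Reviewing index usage metrics can be as simple as a sketch like this against sys.dm_db_index_usage_stats; just remember the counters reset on instance restart, so don’t drop anything based on a server that came up yesterday.

```sql
/*Reads vs. writes for indexes in the current database.
  Counters reset when the instance restarts.*/
SELECT
    OBJECT_NAME(ius.object_id) AS table_name,
    i.name AS index_name,
    ius.user_seeks + ius.user_scans + ius.user_lookups AS total_reads,
    ius.user_updates AS total_writes
FROM sys.dm_db_index_usage_stats AS ius
JOIN sys.indexes AS i
  ON  i.object_id = ius.object_id
  AND i.index_id  = ius.index_id
WHERE ius.database_id = DB_ID()
ORDER BY
    total_writes DESC;
```

Indexes with lots of writes and few or no reads are the usual candidates for disabling first, dropping later.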
Here’s what you’re way better off doing:
Find your slowest queries
See if there’s a missing index request
Run them, and get the actual execution plan
Look at operator times in the execution plan
Ask yourself if the index would benefit the slowest parts
Or, you can hire me to do all that. I don’t mind. Even the Maytag Man has an alarm clock.
Poor Performer
Let’s start with a query, and just the base tables with no nonclustered indexes added. Each table still has a clustered primary key on its Id column.
Initially, I thought showing the query plan in Row Mode over Batch Mode would make issues more clear, but row mode operator times are a real disaster.
this does not add up.
They’re supposed to be cumulative going from right to left, but here we go from 9 to 4 to 10 to 27 to 22 to 41 to 32. Forget that. Batch Mode it is.
Anyway, here’s the query.
SELECT
u.Id,
u.DisplayName,
TopQuestionScore =
MAX(CASE WHEN p.PostTypeId = 1 THEN p.Score ELSE 0 END),
TopAnswerScore =
MAX(CASE WHEN p.PostTypeId = 2 THEN p.Score ELSE 0 END),
TopCommentScore =
MAX(c.Score),
TotalPosts =
COUNT_BIG(DISTINCT p.Id),
TotalComments =
COUNT_BIG(DISTINCT c.Id)
FROM dbo.Users AS u
JOIN dbo.Posts AS p
ON u.Id = p.OwnerUserId
JOIN dbo.Comments AS c
ON u.Id = c.UserId
WHERE EXISTS
(
SELECT
1/0
FROM dbo.Votes AS v
WHERE v.PostId = p.Id
AND v.VoteTypeId IN (1, 2, 3)
)
AND u.Reputation > 10000
AND p.Score > 10
AND c.Score > 0
GROUP BY
u.Id,
u.DisplayName
ORDER BY
TotalPosts DESC;
The goal is to get… Well, pretty much what the column names describe. A good column name goes a long way.
If you had free and unfettered access to these tables, what row store indexes would your druthers lead you to?
I’m limiting your imagination to row store here, because that’s what the missing index requests are limited to.
Underwhelm
The optimizer has decided two indexes, on the same table, would really help us out. There are two very clunky ways to see them both.
You can always see the first one in green text at the top of your query plan, when a missing index request exists.
If you prefer something human readable, this is what they would translate to, with a little bit of hot sauce courtesy of yours truly.
CREATE INDEX
c
ON dbo.Comments
(Score)
INCLUDE
(UserId)
WITH
(MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
CREATE INDEX
c2
ON dbo.Comments
(UserId, Score)
WITH
(MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
Big Reveal
With all that in mind, let’s look at the query plan before adding the indexes. We’re going to skip ahead a little bit in the bullet points above, to the last two:
Look at operator times in the execution plan
Ask yourself if the index would benefit the slowest parts
Here’s the plan, which takes ~10 seconds in total. The arrow is pointing at where the optimizer thinks a new index will help the most.
wasted effort
Since the operators in this plan are mostly in Batch Mode, every operator is showing CPU time just for itself.
That entire portion of the plan is responsible for about half of the total execution time, but there’s no index recommendation there.
And look, I get it, missing index requests happen prior to query execution, while index matching is happening. The optimizer has no idea what might actually take a long time.
But if we’re looking at the only pre-execution metrics the optimizer has, you’d think the estimated costs alone would push it to ask for an index on the Posts table.
Perhaps missing index requests should be selected after query execution. After all, that’s when the engine knows how long everything actually took.
Generous Soul
Okay, so those two indexes on the Comments table up there? I added both of them.
The query plan changes, but it doesn’t get any faster.
Once again, a missing index request is registered, but only one this time.
On the Votes table.
Not the Posts table.
CREATE INDEX
v
ON dbo.Votes
(VoteTypeId)
INCLUDE
(PostId)
WITH
(MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
Okay SQL Server, you got me. I’ll add it.
Impatience
With that index in place, what sort of totally awesome, fast query plan do we get?
we don’t.
Every time we add an index, this query gets one second slower. Part of the problem, of course, is that the optimizer really likes the idea of joining Posts to Votes first.
All of the query plans we’ve looked at have had a similar pattern, where Votes is on the outer side of a Nested Loops join, and Posts is on the inner side, correlated on the pre-existing clustered primary key on Posts.
But Posts has a much more important join to the Users table. If we were to make that more efficient, we could perhaps change the optimizer’s mind about join ordering.
And there’s no missing index request to tell us that. We have to use our damned eyes.
Maybe something like this.
CREATE INDEX
p
ON dbo.Posts
(Score, OwnerUserId)
INCLUDE
(PostTypeId)
WITH
(MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
Let’s give that a shot.
“Self-Tuning Database”
When vendors tell you about their self-tuning database systems, they’re lying to you.
Maybe Oracle isn’t. I don’t know.
But I’m so confident in this new index that I’m going to get rid of all the indexes that SQL Server has suggested so far.
They were bad. They made our query slower, and I don’t want them interfering with my awesome index.
for a dollar
Now the query is twice as fast, at 5 seconds (down from the original 10 seconds). The two operators that take up the majority of the query execution time now are the two Hash Match operators: the Inner Join and the Aggregate.
They don’t spill, but they are likely ill-prepared for the number of rows that they have to deal with. One may infer that from the estimated vs. actual rows that each one sees.
SQL Server 2014 now uses one shared hash table instead of per-thread copy. This provides the benefit of significantly lowering the amount of memory required to persist the hash table but, as you can imagine, the multiple threads depending on that single copy of the hash table must synchronize with each other before, for example, deallocating the hash table. To do so, those threads wait on the HTDELETE (Hash Table DELETE) wait type.
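If you want to check whether that wait type is showing up on your own server, a quick look at sys.dm_os_wait_stats will tell you. This is just a sanity-check sketch; what counts as “a lot” of HTDELETE waits depends entirely on your workload.

```sql
/*Check accumulated HTDELETE waits since the last
  restart or wait stats clear. Numbers are cumulative.*/
SELECT
    ws.wait_type,
    ws.waiting_tasks_count,
    wait_time_seconds =
        ws.wait_time_ms / 1000.,
    avg_wait_ms =
        ws.wait_time_ms / NULLIF(ws.waiting_tasks_count, 0)
FROM sys.dm_os_wait_stats AS ws
WHERE ws.wait_type = N'HTDELETE';
```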
I tried many different indexing schemes and combinations trying to get the terrible underestimate from the Comments table to not cause this, but nothing quite seemed to do it.
In cases where you run into this, you may need to use a temp table to partially pre-aggregate results, and then join to the troublesome table(s) using that data instead.
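Here’s a sketch of what that might look like for this query, with the caveat that the exact columns, filters, and grouping depend on the query you’re fixing; the ones here are hypothetical.

```sql
/*Pre-aggregate the troublesome table into a temp table,
  so later joins see a small, well-estimated row count.
  Column and grouping choices here are hypothetical.*/
SELECT
    c.PostId,
    CommentCount = COUNT_BIG(*)
INTO #c
FROM dbo.Comments AS c
GROUP BY
    c.PostId;

/*Then join to the pre-aggregated results
  instead of the base table.*/
SELECT
    p.Id,
    c.CommentCount
FROM dbo.Posts AS p
JOIN #c AS c
  ON c.PostId = p.Id;
```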
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.
Probably the most fascinating thing about Eager Index Spools to me is how often the optimizer will insert them into execution plans, often to a query’s detriment.
In a sane world, a non-loop join plan would be chosen, a missing index request would be registered that matches whatever would have been spooled into an index, and we’d all have an easier time.
While I understand that all of the optimizer stuff around spools in general was written back when storage hardware was crap, and 32-bit software couldn’t see more than a few gigs of memory, I do find it odd that so little revision and correction has been applied.
Of course, there are use cases for everything. I was involved in a data warehouse tuning project where rewriting a query to corner the optimizer into using a nested loops join was necessary to build an Eager Index Spool. Maintaining a nonclustered index on the staging table made data loads horrible, but letting SQL Server build one at query runtime was a massive improvement over other options. All that had to be done was to rewrite a simple inner join to remove any direct equality predicates.
While the queries below don’t come even mildly close to reproducing the performance improvement I’m talking about above, they should give you some idea of how it was done.
/*How it started*/
SELECT
p.Id,
UpMod =
SUM(CASE WHEN v.VoteTypeId = 2 THEN 1 ELSE 0 END),
DownMod =
SUM(CASE WHEN v.VoteTypeId = 3 THEN 1 ELSE 0 END),
PostScore =
SUM(p.Score)
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
ON v.PostId = p.Id
WHERE p.Score > 1000
GROUP BY
p.Id;
/*How it's going*/
SELECT
p.Id,
UpMod =
SUM(CASE WHEN v.VoteTypeId = 2 THEN 1 ELSE 0 END),
DownMod =
SUM(CASE WHEN v.VoteTypeId = 3 THEN 1 ELSE 0 END),
PostScore =
SUM(p.Score)
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
ON v.PostId >= p.Id
AND v.PostId <= p.Id
WHERE p.Score > 1000
GROUP BY
p.Id;
With no equality predicate in the join clause of the second query, only a nested loops join is available. But again, this is the type of thing that you should really have to push the optimizer to do.
spool me once
Of course, for the above queries, the second plan is a disaster, like most Eager Index Spool plans tend to be. The non-spool query with the hash join finishes in about 600ms, and the Eager Index Spool plan takes a full 1 minute and 37 seconds, with all of the time spent building the spool.
spool me twice
So, like I’ve been saying, one should really have to go out of their way to have this type of plan chosen.
Matters Worse
Compounding the issue is that the optimizer will sometimes choose Eager Index Spool plans when they are entirely unnecessary, and indexes exist to fully support query requirements.
The below join doesn’t actually work, because it’s not how the tables are related, but it’s a good example of what I mean.
SELECT
u.Id,
u.DisplayName,
p.*
INTO #p1
FROM dbo.Users AS u
OUTER APPLY
(
SELECT
Score = SUM(p.Score),
AnswerCount = SUM(p.AnswerCount)
FROM dbo.Posts AS p
WHERE p.Id = u.Id
) AS p;
The Id column in both the Users table and Posts table is the clustered primary key. There’s no sensible reason for an index to be created at runtime, here.
Of course, the Posts table relates to the Users table via a column called OwnerUserId, but whatever.
The point is the resulting query plan.
crappy
If we tell the optimizer that it’s being a dunce, we get a better, much faster, Eager Index Spool-free query plan.
SELECT
u.Id,
u.DisplayName,
p.*
INTO #p2
FROM dbo.Users AS u
OUTER APPLY
(
SELECT
Score = SUM(p.Score),
AnswerCount = SUM(p.AnswerCount)
FROM dbo.Posts AS p WITH (FORCESEEK) /*I am different*/
WHERE p.Id = u.Id
) AS p;
Sometimes this is the only way to solve spool problems.
muscular
Option One: Adding An Index
In most cases, Eager Index Spools are just really irritating missing index requests.
Here’s an example of one. The query itself touches the Posts table three times. Once to find questions, a second time to find answers related to those questions, and a third time to make sure it’s the highest scoring question for the answer.
SELECT TOP (100)
QuestionOwner =
(
SELECT
u.DisplayName
FROM dbo.Users AS u
WHERE pq.OwnerUserId = u.Id
),
QuestionScore =
pq.Score,
QuestionTitle =
pq.Title,
AnswerOwner =
(
SELECT
u.DisplayName
FROM dbo.Users AS u
WHERE pa.OwnerUserId = u.Id
),
AnswerScore =
pa.Score
FROM dbo.Posts AS pq
JOIN dbo.Posts AS pa
ON pq.Id = pa.ParentId
WHERE pq.PostTypeId = 1
AND pa.PostTypeId = 2
AND pa.Score >
(
SELECT
MAX(ps.Score)
FROM dbo.Posts AS ps
WHERE ps.ParentId = pa.ParentId
AND ps.Id <> pa.Id
)
ORDER BY
pa.Score DESC,
pq.Score DESC;
Are there many different ways to write this query? Yes. Would they result in different query plans? Perhaps, perhaps not.
Right now, this query has this index available to it, along with the clustered primary key on Id.
CREATE INDEX
p
ON dbo.Posts
(PostTypeId, ParentId, OwnerUserId)
INCLUDE
(Score)
WITH
(MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
Because we don’t have an index that leads with ParentId, or that allows us to easily seek to ParentId in the MAX subquery (more on that later, though), the optimizer decides to build one for us.
el yuck
We can see what index the spool is building by looking at the tooltip. In general, you can interpret the seek predicate as what should be the key column(s), and what’s in the output list as what should be included.
There is sometimes some overlap here, but that’s okay. Just ignore any output columns that are already in the seek predicate. And of course, we can generally ignore any clustered index key column(s), since the nonclustered index will inherit those anyway.
get it for less
Adding this index will get rid of the Eager Index Spool:
CREATE INDEX
p2
ON dbo.Posts
(ParentId, Score)
WITH
(MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
In this case, I’ve chosen to add the Score column to the key of the index to allow for an ordered aggregation (the MAX function) to take place without a Sort operator.
spool’s out for summer
Option Two: Over Communicating
Let’s take a step back. We currently have this index, which leads with PostTypeId.
CREATE INDEX
p
ON dbo.Posts
(PostTypeId, ParentId, OwnerUserId)
INCLUDE
(Score)
WITH
(MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
The section of the query that generates the Eager Index Spool is this one:
(
SELECT
MAX(ps.Score)
FROM dbo.Posts AS ps
WHERE ps.ParentId = pa.ParentId
AND ps.Id <> pa.Id
)
What we know, that the optimizer doesn’t know, is that only rows with a PostTypeId of 2 are answers. We don’t need to compare answers to any other kind of post, because we don’t care about them here.
If we change the subquery to limit comparing answers to other answers, it would also allow us to take advantage of the existing index by locating the right type of Post, and give us seekable access to the ParentId column.
(
SELECT
MAX(ps.Score)
FROM dbo.Posts AS ps
WHERE ps.ParentId = pa.ParentId
AND ps.PostTypeId = 2
AND ps.Id <> pa.Id
)
That changes the full query to this:
SELECT TOP (100)
QuestionOwner =
(
SELECT
u.DisplayName
FROM dbo.Users AS u
WHERE pq.OwnerUserId = u.Id
),
QuestionScore =
pq.Score,
QuestionTitle =
pq.Title,
AnswerOwner =
(
SELECT
u.DisplayName
FROM dbo.Users AS u
WHERE pa.OwnerUserId = u.Id
),
AnswerScore =
pa.Score
FROM dbo.Posts AS pq
JOIN dbo.Posts AS pa
ON pq.Id = pa.ParentId
WHERE pq.PostTypeId = 1
AND pa.PostTypeId = 2
AND pa.Score >
(
SELECT
MAX(ps.Score)
FROM dbo.Posts AS ps
WHERE ps.ParentId = pa.ParentId
AND ps.PostTypeId = 2 /* I am new and different and you should pay attention to me */
AND ps.Id <> pa.Id
)
ORDER BY
pa.Score DESC,
pq.Score DESC;
Which changes the section of the query plan that we’re concerned with to this:
understanding
Sometimes the optimizer just needs a little bit more information from you.
A Little More About Isolation Levels In SQL Server
In this video, I talk about some of the misguided expectations that Read Committed has associated with it, and query patterns to watch out for under Read Committed Snapshot Isolation.
If you’d like to see my full day performance tuning session, here are upcoming dates:
In this video, I dive into the often-overlooked world of SQL Server isolation levels, particularly focusing on read committed snapshot isolation (RCSI) and its alternatives. With a humorous twist, I highlight common misconceptions about isolation levels and demonstrate how they can lead to issues like race conditions under different scenarios. By walking through practical examples and real-world queries, I aim to clarify the nuances of these settings and help you make informed decisions for your database environments. Whether you’re dealing with complex query patterns or just looking to avoid potential headaches, this video offers valuable insights into optimizing SQL Server performance without overcomplicating your setup.
Full Transcript
Erik Darling here with Darling Data. Today I want to talk about isolation levels a little bit, because the most annoying things about isolation levels keep coming up, things that I really loathe having to repeat or explain to people over and over again. And we’re gonna do that today, because there’s nothing like a good Friday talk about isolation levels. So first: no isolation level is perfect for everything all the time. I wish one was, because then we could all just use that and we could stop having these conversations. The things that I have to talk people out of are, one, using NOLOCK hints, because that’s where you potentially get bad data under concurrency. And the other one is the switch between read committed, the default pessimistic isolation level in SQL Server, and read committed snapshot isolation, which is an optimistic isolation level in SQL Server. The reason why that’s the usual choice is that you can use snapshot isolation instead, but then every query would have to ask for snapshot isolation in some way: if it’s an application, when it connects to SQL Server; if it’s a stored procedure, you have to add it to the code, stuff like that. And unless you have a lot of control, and a lot of time and patience to figure out which queries you want to use an optimistic isolation level for, that’s kind of a tougher one to talk folks into. That also assumes that you have control over those things, right?
If you have a third party application, you might not; your only hope might be to use read committed snapshot isolation, which kicks in for every read query that comes in and hits the database. So that’s the most common choice. The thing is, every time I talk about switching over, someone will go do some due diligence, and they’ll read some blog posts, and they come away thinking that there are bugs in SQL Server with read committed race conditions and things like that. And that’s really not the case. The case is that there are certain query patterns you have to watch out for where you might hit race conditions.
Now, I’m not saying that read committed snapshot isolation is perfect, because again, it’s not. But for servers that I look at, where there is a lot of bad blocking between readers and writers, and a lot of deadlocking between readers and writers, read committed snapshot isolation is perhaps the safest way to solve all of those problems in one go. You’ll still have write queries block one another, but the read queries fighting with write queries and write queries fighting with read queries goes away completely. So it is a great isolation level for most SQL Server workloads. And most mature database platforms use some form of multi-version concurrency control out of the box, right? Even Azure SQL DB uses it by default, because Microsoft probably figured, hey, we’re going to release this product and we’re going to manage it, and we don’t want people complaining about blocking all the time, because then they have to go solve blocking problems. And that’s not fun, because that’s query tuning, index tuning, stuff like that. So read committed, the pessimistic isolation level, has a lot of problems that one might consider to be race conditions, if one were to be really concerned about application and query concurrency. So here’s the first example, and I’m going to figure out which way I have to turn. There we go. That should get us in the frame.
Or get all the text in the frame, rather. So if you have a query that just finished reading what used to be row C, but it’s now a ghost record because it was deleted: your query just finished reading it and has now moved on to row D, but that row C will still show up in your query results because your query already grabbed it. It doesn’t disappear from the results. All right. And these slides are all from my Foundations of SQL Server Performance Tuning class. I delivered it at PASS, and I’m delivering it a couple more times: at DataTune in Nashville, and up in Boston in May for the New England SQL Server user group. If you’re in either of those areas, it would be a pleasure to see you.
The second reason why read committed isn’t really all that promising of an isolation level: let’s just pretend that, again, we just finished reading row C, but then it gets updated. And now we have two row H’s. Our query will show one letter C and two H’s in the results. Right? That’s not great either. Right? That would seem like a race condition. That would seem like a bug. But that’s the way read committed works. Read committed, the non-snapshot pessimistic isolation level, takes very brief read locks on things, but data can change on either side of those locks whenever it wants. Right? Because those locks don’t hold on for very long. There’s no lock escalation with read queries, at least without hints or whatever. But all of this stuff is open to change as soon as those locks get released.
Another reason why it’s not great is because similar situation. Let’s say we’re reading row E. Row F gets updated to be another value C. So now we have two C’s over here. But all our query will see is one C and no F. Right? And those are, again, something that could very much be interpreted as a race condition in your queries if you are really concerned about concurrency. Now, this is stuff that read committed snapshot isolation fixes. But we have to talk about some other stuff first.
The first thing we have to talk about are some query patterns under read committed, the pessimistic isolation level, that can also cause things that look like race conditions, but are really just, again, the lack of promise that read committed makes about what data is going to be returned. So if, in your stored procedure or your query or whatever it is, you set some variable value based on a select, the locks from that select, unless you add locking hints and a transaction, don’t hold on once that query is done.
Actually, even once it finds that row, data in the table can change all over the place. So if you were to take whatever that variable gets set to and use it to insert another row, or use it to find data that you’re going to update, that data could be completely irrelevant by the time your query gets to it.
Again, under concurrency, if you’re just running it in isolation, everything’s going to look great every time. But under concurrency, the data in there could change really quickly. Another pattern that could have similar effects as a race condition, big air quotes on that, is if you dump data into a temp table and then you use that temp table to go update things, because whatever’s in that temp table is maybe invalid by the time you go to do that update.
So these are things that a lot of folks don’t think about when it comes to matters of concurrency. And a lot of the misunderstanding here is that the promises read committed, the pessimistic isolation level, makes are way overblown. A lot of people think that read committed behaves like snapshot isolation, where your query takes a picture of the data and it’s perfect, or like serializable, where nothing can change while your query’s reading the data, right?
Because everyone thinks that what it’s returning is this magnificent piece of data, but all it really promises is blocking and deadlocking with modification queries. And all it promises is that the data that it read was committed at the time that the read happened.
So remember, just like in the slides, modifications can happen all around it. The only thing that’s guaranteed is that if you hit a lock, your query will wait for that lock to release before reading that data. That’s the only real guarantee that read committed, the pessimistic isolation level, makes.
Now, the next thing I’ve got to talk about is query patterns that might exhibit what seems like a race condition under an optimistic isolation level. A lot of the time when you hit these, there’s some potential for them under a pessimistic isolation level as well; they’re just a little bit more rare.
They’re actually pretty rare under even an optimistic isolation level because you have to write really dumb queries for stuff like this to happen. So I’m in my database crap. At least I hope I am.
Home is where the crap is. And I’ve got this table called dinner plans. And I’m going to populate that table with, well, when I wrote this demo originally, there were a bunch of people who I thought I was going to have dinner with at PASS.
It turned out the only person I had dinner with was Kendra. But that was nice anyway. It was great; actually, it was like a few dinners with Kendra. She’s like the only person who would hang out with me.
Whatever. Not bad company, though. And so right now in that table, our dinner plans table, this is what things look like. You got a list of people.
And for some reason, seat number one is free. But the other five seats in the table are all taken. So we’re going to make sure that RCSI is off for our crap database. And we’re going to look at, I have these queries set up in the other two windows over here.
But we’re going to look at just real quick what the query does. So this is the kind of sort of dumb query pattern that would exhibit a race condition under RCSI. And it might not necessarily hit under a pessimistic isolation level, but could still happen if things got weird enough.
So what we’re doing is updating a table alias, our table dinner plans, which is aliased as DP. If you are not a very mature person, you want to make jokes, go ahead. But it’s dinner plans, nothing more.
So we’re updating our dinner plans table. And for some reason, rather than just doing a regular update, we’re doing this exists check through the base table to look for an ID where the seat is free. So this is where things get interesting because we have two references to the table.
We have one for the update and one for the select. Only the reference for the update will ever have the type of exclusive locks on it that will prevent a read query under a pessimistic isolation level from executing and getting data. The reference in the select portion, that can read whatever.
So if you have a NOLOCK hint in there, you’re screwed. And if, in that inner query, any of the stuff we talked about happens, where data can change around where the reads happen under the pessimistic read committed isolation level, you could still hit what feels like a race condition. All right.
So let’s go look at what happens when we do this. So I’m going to say begin tran. I’m going to run that. And we’ve output this. And now I’m going to come over here and I’m going to run this.
And this is going to get blocked. Right. This query is now blocked because we have this update in a transaction, updating dinner plans. And this query wants to update dinner plans and read from dinner plans.
We come back over to this first window and we commit this. This query will come back and return no results. All right.
Because that other query blocked it, updated that row to find a free seat. This query did not find a free seat when it went to run. So let’s commit this now so that we don’t have anything weird going on.
Let’s make sure this is fully committed. Now let’s change the crap database to turn read committed snapshot isolation on. All right.
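For reference, the command to do that looks something like this. WITH ROLLBACK IMMEDIATE kicks out any open transactions so the ALTER doesn’t sit there waiting on them.

```sql
/*Turn on read committed snapshot isolation for the demo database.
  ROLLBACK IMMEDIATE rolls back open transactions so the ALTER
  doesn't wait forever for exclusive access to the database.*/
ALTER DATABASE Crap
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE;
```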
This takes a second to run. That’s okay. It’s worth it. Okay. So now that’s turned on. Good. If we repeat that same demo, right, we’re going to run this. Oh, you know what?
I didn’t reset things. Let me fix that. Totally forgot to reset the table between runs, so that didn’t fail. Ah, the joys of remembering stuff.
All right. So now let’s run that. And see, this one finds this. All right. And now let’s run this. Now this is still going to get blocked because that update is still happening.
On the other window, right? This transaction is still not committed. But because of the way an optimistic isolation level works, when this query did its update, the last known good version of the row got versioned: if you’re not using accelerated database recovery, it gets sent to the version store in tempdb.
If you are using accelerated database recovery, it gets sent to the version store local to the user database you’re in. And now, because this query is going to read a versioned row that the other query is updating, when we commit this, right, this one found this seat, and now we come over here, and what did we find?
We found a seat. We know we found a seat, because this thing got updated to the reverse Erik down here. All right.
So now if we commit this, we’ll have a little bit of an awkward situation. Because forward Erik will think that he got a seat at the dinner table, but backwards Erik will have the golden ticket and say, I get to sit there.
You don’t get to sit there. And then forward Erik and reverse Erik will, like, matter and anti-matter fight. And, I don’t know, some sort of universe death will probably happen. So these are the kind of query patterns that can cause things that, again, look and feel like race conditions under RCSI that you wouldn’t necessarily hit under read committed, the pessimistic isolation level.
But, again, the promises that read committed, the pessimistic isolation level make are really flimsy. So, under most circumstances, for most query workloads where people aren’t writing completely idiotic queries, or if you have no-lock hints everywhere anyway, you’re probably better off using an optimistic isolation level because there’s far less, like, room for error than there is when you’re using no-lock hints.
And there’s far less pain than if your read queries are blocking and deadlocking with your write queries all the time. Right?
So, like, most queries in most any database function better using an optimistic isolation level. If you have queries that need to read the most up-to-date version of data, just keep in mind that those queries are going to be subject to blocking and deadlocking.
If you want to enable read committed snapshot isolation and you want to have certain queries, not use row versioning, there’s a perfectly good read committed lock hint you can add to those queries or other locking hints that would make sense for those queries.
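As an example, using the DinnerPlans table from this demo, here’s a sketch of opting one query out of row versioning:

```sql
/*Under RCSI, this hint makes the query take normal read
  locks instead of reading versioned rows, so it sees the
  most current committed data and is subject to blocking.*/
SELECT
    dp.seat_number,
    dp.is_free
FROM dbo.DinnerPlans AS dp WITH (READCOMMITTEDLOCK)
WHERE dp.is_free = 1;
```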
A direct update like this wouldn’t have the problems that we were looking at. The problem really is that the subquery doing the select reads a version of the row it looks like it wasn’t supposed to, because that seat should have been taken.
But, you know, again, it takes pretty high concurrency for you to find these problems. And it takes pretty stupid looking queries for you to find these problems. Right?
So, in this case, you know, that self-join, completely unnecessary. If you write modification queries that do things like that, you kind of deserve what you get. That’s not a smart way to write queries.
But there are times when you would have to write a query sort of like that, like a different query pattern. Like in this one, it’s particularly stupid because it’s just one table that we want to update and there’s no reason to do a subquery to touch another table.
But if there were a different table, where we had to update from, you know, some reservation list or a guest list, or a list of reservations where you could possibly go to different restaurants or whatever.
Those are circumstances when having a subquery would be necessary. But I just want to remind everyone that like those subqueries under read committed could also read some weird data.
Like, data could change before or after, because this table is the only one in the statement; the select reference isn’t going to have any sort of exclusive locks taken in it or around it, wherever the select is.
So if you join tables together to do updates and there’s a NOLOCK hint on the table that’s not getting updated, or you’re just using read committed, the pessimistic isolation level.
You can still see weird stuff that’ll look and feel like race conditions under a pessimistic isolation level. Well, actually specifically under read committed the pessimistic isolation level. Serializable and repeatable read offer way more guarantees.
They also offer way more blocking. So you have that to look forward to. Anyway, this was a little bit longer than I expected. And like I said, it’s Friday and got some stuff I got to go do.
So I’m going to go do that. I’m going to get this started uploading as usual. I hope you enjoyed yourselves.
I hope you learned something. I hope that if you are not currently using an optimistic isolation level and you have a lot of problems with locking and blocking, you’ll consider using one. And I hope that if you are slathering your queries with NOLOCK hints because of some bizarre cargo culting that says it’s a best practice, you’ll consider using an optimistic isolation level and removing those.
If you enjoyed the video, pretty pleased give it a like. If you enjoy this sort of technical SQL Server content, feel free to subscribe to my channel. I’m always happy to have new folks coming in and learning stuff.
And I don’t know. Gosh, I think that’s it. I can’t think of a single other thing to say. Happy Friday. Hope everyone has a great weekend.
Thank you for watching and I’ll see you in the next video. Have a good one.
Would you believe I still have to fix queries with a bunch of functions in joins and where clauses? Like every day? All day?
Where things get tough is when there’s some third party vendor app where code can’t be changed, but customers have become so fed up with performance that they’re willing to make other changes to help things along.
This isn’t a big fireworks demo. I could have spent a lot more time finding a worse scenario, and I’ve hinted it to exacerbate the issue a bit.
Sometimes my hardware is too good for bad demos, probably because it’s not in the cloud.
That whole “cloud” thing has some real problems.
Setup
Anyway, let’s say we have this index (because we do, I just created it).
CREATE INDEX
p
ON dbo.Posts
(CommunityOwnedDate, Score)
WITH
(SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
GO
Let’s further presume that we have this stored procedure.
This stored procedure could have been a query.
CREATE OR ALTER PROCEDURE
dbo.FixNonSargable
(
@CommunityOwnedDate datetime,
@Score integer
)
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;
SELECT
p.*
FROM dbo.Posts AS p
WHERE COALESCE(p.CommunityOwnedDate, '17530101') >= @CommunityOwnedDate
AND p.Score >= @Score
ORDER BY
p.Id DESC
OPTION
(
USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140'), /*No batch mode*/
MAXDOP 1 /*No parallelism*/
);
END;
GO
When executed with such great gusto, it’s a bit slow.
At ~1.4 seconds, we’re very upset with performance. Throw the whole thing in the trash. Install DuckDB.
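For reference, a call like this reproduces the slow run. The parameter values here are my assumptions for the demo; pick whatever returns a reasonable number of rows in your copy of the Stack Overflow database:

```sql
EXEC dbo.FixNonSargable
    @CommunityOwnedDate = '20130101',
    @Score = 10;
```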
Index Reversal
If we change the key column order of our index, so the column with a seekable predicate can go first, we can get faster query execution.
CREATE INDEX
p
ON dbo.Posts
(Score, CommunityOwnedDate)
WITH
(SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE, DROP_EXISTING = ON);
GO
Now you don’t have to wait six-to-never months for your vendor to fix their garbage code.
i could be happy
In this case, changing the order of key columns was a more beneficial arrangement for this particular query.
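For completeness: if the vendor code could be changed, you could make the original (CommunityOwnedDate, Score) index usable by rewriting the non-sargable COALESCE into an equivalent predicate. This is just a sketch of the logic, not anything from the vendor, and it assumes '17530101' really is the NULL sentinel the original COALESCE implies:

```sql
SELECT
    p.*
FROM dbo.Posts AS p
WHERE
(
       p.CommunityOwnedDate >= @CommunityOwnedDate
       /*NULL rows only qualify when the parameter is at the datetime minimum,
         matching COALESCE(p.CommunityOwnedDate, '17530101') >= @CommunityOwnedDate*/
    OR (p.CommunityOwnedDate IS NULL AND @CommunityOwnedDate <= '17530101')
)
AND p.Score >= @Score
ORDER BY
    p.Id DESC;
```

Whether the optimizer actually seeks on the OR depends on the plan it chooses; sometimes splitting the two branches into a UNION ALL works out better. Results may vary, as always.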
Results may vary. Not query results! I mean like, if you go do this in your database. On “your” hardware. In the “cloud”.
Thanks for reading!
Creating Uncacheable Stored Procedures In SQL Server
Here’s the demo script from today’s video:
/*Create a table if you need to*/
CREATE TABLE
dbo.DinnerPlans
(
id bigint IDENTITY,
name nvarchar(40) NOT NULL,
seat_number tinyint NULL,
is_free bit NOT NULL
);
GO
/*First example, with an object that doesn't exist*/
CREATE OR ALTER PROCEDURE
dbo.i_live
(
@decider bit = NULL
)
AS
BEGIN
SET NOCOUNT, XACT_ABORT ON;
IF @decider = 'true'
BEGIN
SELECT
dp.*
FROM dbo.DinnerPlans AS dp;
END;
IF @decider = 'false'
BEGIN
SELECT
whatever.*
FROM dbo.AnObjectThatDoesntEvenPretendToExist AS whatever;
--SELECT result = 'false!';
END;
IF @decider IS NULL
BEGIN
SELECT
result =
'please make a decision.';
END;
END;
GO
/*Say goodbye!*/
DBCC FREEPROCCACHE;
/*This runs without an error*/
EXEC dbo.i_live
@decider = 'false';
/*But there's no query plan!*/
SELECT
object_name =
OBJECT_NAME(deps.object_id, deps.database_id),
deps.type_desc,
deps.last_execution_time,
deps.execution_count,
dest.text,
query_plan =
TRY_CAST(detqp.query_plan AS xml)
FROM sys.dm_exec_procedure_stats AS deps
OUTER APPLY sys.dm_exec_sql_text(deps.plan_handle) AS dest
OUTER APPLY sys.dm_exec_text_query_plan(deps.plan_handle, 0, -1) AS detqp;
GO
Thanks for watching!
Video Summary
In this video, I delve into a fascinating and somewhat unconventional topic: creating uncacheable stored procedures in SQL Server. The goal is to demonstrate how to craft a stored procedure that will never cache any query plans, effectively leaving no trace of its execution in the plan cache. This can be particularly useful for consultants or developers who need to run potentially suspicious queries without leaving evidence behind. I walk through the process step-by-step, creating a simple stored procedure with conditional logic that references non-existent tables and null values, ensuring SQL Server cannot cache any plans for it. Along the way, we explore why this happens and how it relates to deferred compilation and sensitive operations like decrypting asymmetric keys. Whether you find value in this niche technique or simply enjoy learning about SQL Server’s inner workings, I hope you found this video informative and engaging.
Full Transcript
Erik Darling here with Darling Data. And I know what you’re thinking. It’s two days and two YouTube videos in a row. And I’ve decided that perhaps I’ll endeavor to record a YouTube video every day, as long as I have something reasonably recordable. So, this one, this one is on the very border of reasonably recordable. And I actually find it patently ridiculous. And there are parts of me that wish this wasn’t a demo. Now, let’s start with what it actually is. And the idea here is to create uncacheable stored procedures. And by uncacheable, I mean no query plan will ever be cached for this stored procedure.
Right? And now, who would this really appeal to? Aside from, like, you know, consultants, people who make monitoring tools, things like that, who don’t want to leave any evidence of how bad the DMV queries and stuff that they have to run are. So, here we go. Alright. So, what we have is, so, I already created this table. We don’t have to worry about this. This is already done. If I try to execute this, I’ll get an error message saying, already there, dum-dum. And I’ll say, yeah, I knew that. Thank you for the emotional support.
So, what we’re going to do is create a stored procedure. And I’m going to just make sure that I’ve done things correctly because if I don’t do things correctly, then nothing works, right? Unlike government, I have to do things right for anything to happen. So, here we’re going to create a stored procedure. And this is relatively simple, right? Make sure that the problem is, rather, the point is as simple and easily surmised as possible for those out there with short attention spans.
And this stored procedure right now just has one parameter called decider. It’s a bit. And we have three potential things that could happen when this stored procedure runs. If the bit is true, then we’re going to select everything from this dinner plans table.
If the bit is false, then we’re going to select from this table that does not exist. Not anywhere in my database or any database ever. It does not exist anywhere. If I try to run this query on its own, I’m going to get an error that says invalid object name, blah, blah, blah, blah, blah.
You see what I mean? And then if decider is null, we’ll say, please, please make a decision. Kindly support yourselves making a decision.
All right. So, we have that. I’m just going to make sure that I actually hit, I actually executed that the way it should be. And what I’m going to show you is when I clear up the plan cache and I run this for decider equals true, we get results back from my dinner plans table.
But then when I go look at the plan cache, well, there’s a bunch of other stuff in there. I should redo that: clear out the plan cache, now run this. Now run this.
And hopefully we don’t catch anything else stupid in there. But this is our stored procedure right here that we just executed. And it has no query plan.
All right. No query plan in there. I can’t click on that. It’s not just a null query plan. It’s no query plan for that. Now let’s contrast that.
There is no comparison. We don’t contrast something like that. And let’s just flip this. So let’s say result equals false. All right. Let’s unquote that.
Or rather let’s quote that out, not unquote. Whatever. It’s going to be that kind of day. So let’s recreate this now with just this select result equals false in there. All right.
But we’re going to clear the plan cache and then run this. And we’re still going to get the same results back. But now when we go look in the plan cache, we have an execution plan where it says we selected from the dinner plans table. All right.
Now I’m going to show you two more things before I talk about why this thing was uncacheable in the first place. So let’s change it back to how it originally was. All right.
And let’s do this. And now let’s try two things. The first thing that I’m going to try to do is get an estimated execution plan for this store procedure. I’m going to hit control and L.
And what happens? When SQL Server tries to estimate a plan for this, you get an error saying there’s an invalid object name. When SQL Server tried to build an estimated plan for this, it has to go and explore every single branch.
It has to figure out what can we estimate might happen here. But since that table doesn’t exist, you get an error. It’s crazy.
The other thing, I mean, it’s not crazy. The table doesn’t exist. How could you possibly come up with an execution plan for a table that doesn’t exist? The other thing is if we change this to false and we actually try to execute it, we’ll get the same error. All right.
So something is clearly very funny about the plan caching behavior of stored procedures that have objects in them that do not exist. And part of this is going to be what SQL Server dorks will refer to as deferred compilation.
So, or deferred resolution or something like that. Where, like, you’re allowed to create modules that depend on things, in some cases, even if those things aren’t there yet. So, like, you can create a stored procedure that executes another stored procedure, but you’ll get a warning in there that says, hey, just heads up.
FYI, this stored procedure depends on this stored procedure, but this stored procedure isn’t there yet. Right? So you can do that, and SQL Server will let you do that as long as you don’t actually try to execute that other stored procedure.
So that’s really what’s happening here, but it’s really interesting to me that what that results in is that there’s no cacheable plan for the stored procedure. Now, there’s sort of like one other circumstance where you might see stuff like this happen. And it’s something that Paul White blogged about a little while back, and I thought I had scrolled to where it was in there.
Oh, there we go. So the uncacheable batch, right? So in this case, so when Paul’s blog post, what he did was he used, like, security stuff, like, you know, sensitive password information.
So something like this, where opening asymmetric key and decrypting it, or something like this, where you’re creating an application role with a password, right? So SQL Server won’t cache stuff like that either, right? Rather, it won’t, like, cache, won’t go cache a query plan for that, because it’s just like, whoa, whoa, whoa.
Sensitive stuff in there. Don’t want that to happen, right? But this is really interesting, because you can do that without having to do any of that stuff, right? Without having to fiddle with weird logic and creating, maybe trying to create roles, or without having to, like, submit any, like, symmetric key stuff.
All you have to do is put in a branch that will never be explored, like this. You could make this any sort of weird thing, right? You could be like, if one plus one is less than two billion or something like that, or, I don’t know, whatever, you pick something, right?
Whatever you find to be fundamentally and provably untrue in your world, in your mind, you can put that condition into your stored procedure. And you can have that stored procedure never explore a branch where an object that will never exist lives as a queryable object. And guess what?
You will never cache a plan for that stored procedure. So, if you ever want to hide what you’re doing from the plan cache, this is a great way to do it. All right.
So, with that out of the way, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. I mean, I don’t know how much value you could have learned from this, but if you’re out there writing stored procedures you don’t want anyone to know about, on top of, like, you know, adding recompile hints and stuff, it’s a pretty keen way of doing things.
If you enjoy this sort of hard-hitting, useful SQL Server content, feel free to give the video a like and pretty please subscribe to my channel so that you can get little ding, ding, ding notifications every time I teach you about something hopefully more useful than this. All right. I wonder if anyone wants to buy Beer Gut magazine.
Getting desperate here. Anyway, thank you for watching and I will see you in the next video whenever that may be. Maybe later today, maybe tomorrow.
See how the day finds me. Anyway, thank you for watching.