A Little About Skewed Data and Skewed Parallelism


Video Summary

In this video, I delve into a fascinating SQL Server performance issue: how skewed data can lead to skewed parallelism, which undoes the whole point of parallelism (sacrificing CPU resources to reduce wall clock time) and can significantly impact query performance. I walk through an example using the Stack Overflow database, where I artificially skewed the `votes` table and ran a query that highlights the problem. By examining the query plan and analyzing metrics such as CPU time, elapsed time, and memory grants, we can identify when parallelism is ineffective due to skewed data distribution. This video serves as an introduction to recognizing and understanding this common performance bottleneck, setting the stage for future content where I will explore solutions.

Full Transcript

Erik Darling here with Darling Data. In response to many miserable, wretched people complaining about the fact that I need to make a living, and owing to the fact that I needed to edit some slides anyway, I've redone my intro deck. So we're going to experiment a little bit with that today. In this video, we're going to talk about how skewed data can lead to skewed parallelism. Skewed parallelism, of course, is where many rows end up on either one or a small number of threads, which kind of undoes all of the potential speed improvements you get from sacrificing CPU in order to reduce wall clock time. We'll talk about why that can happen, and how you can find it. I'm not going to talk about how to fix it in this one, because I've got something else coming up for that. But this is a good way to introduce the problem and help you identify it in a few different ways. If you look at the video description, you will see all sorts of helpful links. You can hire me for consulting, or buy my training, of course, at a discount, because I do care about the people who buy my training, and I'm not going broke when they do so. You can become a channel member to support my efforts to bring you this high quality SQL Server content, or ask me questions on office hours. And of course, if you like this stuff, please do like, subscribe, tell a friend, and all that good stuff. I just finished Pass on Tour New York City, so that one is off the list. I still have Pass on Tour Dallas and Pass on Tour in Utrecht, where I have half day pre-cons and regular sessions at both. And of course, the Pass Data Community Summit is coming up in Seattle November 17th to 21st, where Kendra Little and I are tag teaming two days of T-SQL pre-cons. But with all that out of the way, let's party. Let's have ourselves a little database party here. Now, let's go over to SQL Server Management Studio.
What I'm going to do first is explain how I created this data, and what the query plan with the skewed parallelism looks like, so that you can start figuring out if you're having this problem. We'll get to the solving-it part later, in a different video. So: the votes table in the Stack Overflow database has a user id column, and that column is scrubbed, not skewed, in the public data releases. I made it skewed. It's scrubbed because, of course, you're supposed to be able to vote in anonymity.

On Stack Overflow, no one knows who you upvoted or downvoted. You're free to divulge that information if you'd like, but the nice folks at Stack Overflow don't want to just go publicly tell everyone what you did. Now, what I did, after I selected everything from the votes table into a new table, is update that table to replace the NULL user ids. There are a couple of vote types for bounties (bounties are when you want to give people additional reputation, coming out of your own reputation, for having a really good answer or something), so some of the user ids are populated. What I did was set all of the NULL user ids, which are the majority of the table, equal to the vote type id in the vote type id column. And there are not too many vote type ids; they're mostly upvotes and downvotes.

There are a few other ones, but they're not terribly interesting, nor do they comprise much data in the table. Of course, then I had to create a clustered primary key on the table so that we were not dealing with some terrible heapness, which you can blame things on. And then I needed to create an index on the table to support the query that we're about to run, or rather, that I ran ahead of time.
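If you want to play along at home, the setup I just described looks roughly like this. The table, constraint, and index names are my stand-ins, since the exact script isn't shown:

```sql
/* Copy the Votes table into a new table. */
SELECT v.*
INTO dbo.VotesSkewed
FROM dbo.Votes AS v;

/* Replace the scrubbed (NULL) user ids with the vote type id,
   which piles most of the table onto a handful of values. */
UPDATE vs
    SET vs.UserId = vs.VoteTypeId
FROM dbo.VotesSkewed AS vs
WHERE vs.UserId IS NULL;

/* No terrible heapness here. */
ALTER TABLE dbo.VotesSkewed
    ADD CONSTRAINT PK_VotesSkewed
        PRIMARY KEY CLUSTERED (Id);

/* Support the demo query. */
CREATE INDEX ix_UserId
    ON dbo.VotesSkewed (UserId, VoteTypeId);
```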

Why did I run it ahead of time, you might ask? And I might tell you, if you were to ask me such a thing: because it runs for a little bit. What this query does is select from the users table, because the users table is kind of where most queries start. You want to figure out what user stuff belongs to, who did what.

And then we outer apply this votes skewed table down here, correlated on that newly populated user id column. Then we say where vote type id is between one and four. This is part of what gives us the skew, because the vote type id, of course, ends up as the user id for these rows; these are the ones that were NULL before.
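Here's a sketch of what that query looks like. The exact columns and shape are guesses from my description, but the important parts are the OUTER APPLY correlated on UserId and the VoteTypeId filter:

```sql
SELECT u.Id,
       u.DisplayName,
       vs.Id,
       vs.VoteTypeId,
       vs.CreationDate
FROM dbo.Users AS u
OUTER APPLY
(
    SELECT vs.*
    FROM dbo.VotesSkewed AS vs
    WHERE vs.UserId = u.Id
) AS vs
WHERE vs.VoteTypeId BETWEEN 1 AND 4;
```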

So this is where things kind of get fun. If we look at the query plan for this, we will see that we indeed have all of the hallmarks of a parallel execution plan, and we have parallel exchanges.

Well, actually, we have one parallel exchange. Okay, singular. We have one parallel exchange. We also have all sorts of little parallel indicators across all of these wonderful operators, telling us that SQL Server used multiple CPUs to run parts of the query. And there's one up here, too. Let's go get that.

Let's be completists over here. Now, this one here doesn't count, because this sort had a spill. The spill doesn't really matter all that much for this, but it is actually kind of interesting, in a way that we'll talk about. So let's start up here in this part of the query plan, and make it a little bit bigger so we can see what's going on.

If we expand this actual number of rows thing (node, whatever you want to call it) and zoom in, you'll see that each of these eight threads did a fairly even amount of work.

Right, you might pick a few bones about some of these, but for most of this, it's fine.

So in this starting part of the query plan, the scan of the users table, we do not have any problems. And when we go into the sort (I don't know why that decided to do that to me), if we look at the sort operator, we'll see that parallelism remains pretty even across all of these threads, too.

We didn't really lose anything here. And when we go into the nested loops join, things are still okay. We're still doing all right.

But now this is where things start to get a little wonky on us, and this is why this lower branch of the plan is where we start having issues: if we go to the clustered index seek here, we will see that all of the rows ended up on a single thread.

The reason this happens with skewed data is that SQL Server uses this thing called the parallel page supplier to hand rows out to threads. Part of how it decides which rows go on which threads is by using sort of a modulo hash function to say: well, if this number divides by this, then we send you to this thread. In this case, all of the rows just happened to hash out to thread one.

And this is going to be true as we go across. Now, I said that the sort didn't really matter, but there is something interesting about it. The interesting thing, of course, is that all query plans start as serial execution plans, and parallel plan exploration isn't started unless the subtree cost of the serial plan is higher than the cost threshold for parallelism.

That means all of the stuff that SQL Server initially assigns to a query, one of those things being the memory grant, is decided before a parallel plan is explored, and isn't changed if a parallel plan is chosen. Now, if you have a DOP 8 query, the memory grant is not then multiplied by DOP. The memory grant is divided by DOP.

So the memory grant for this query, if we hover over here, is 359 megs, which isn't big by any stretch of the imagination. But that number is divided by 8 for us, because this query is running at DOP 8, which means that our sort operator asked for enough memory to sort all that data. But look what happened.

The one thread that had any rows on it spilled entirely, because that one thread had one eighth of the memory. If this one thread had all 359 megs, maybe it wouldn't have spilled. But since it had 359 divided by 8, like one eighth of the memory grant, it didn't get nearly enough.

So it spilled a whole bunch. Now, this was a particularly long running spill: we had 4.6 seconds spent here, or let's just call it 4.7 seconds.

And then we have 45 seconds here, minus that 4.7, so let's just call it 40 seconds of this one thread spilling out to disk. This did cause a performance problem for us, because this one memory-requiring operator got one eighth of the memory.

Rather than the thread that needed it getting everything, the memory was spread across all eight threads, so each thread got one eighth of the grant. And with one thread doing all the work, that one thread didn't have nearly enough memory.

So this spilled out a whole bunch. Now, aside from looking at the rows per thread like we did in the query plan, there are some other ways that you might look at this and decide that you have an issue.

One thing that I always tell people to look out for, when they run a parallel query and it does not run as quickly as they would hope, is CPU time and elapsed time. One very good indicator that parallelism was ineffective is CPU time and elapsed time being very close together.

In a perfect world, let's say we have a DOP 8 query, we would want to see CPU time be 8x elapsed time. We would want to see all eight threads do an even-ish amount of work, cooperating to produce an elapsed time, a wall clock time, that is one eighth of CPU time.
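You can see both numbers in the Messages tab with SET STATISTICS TIME. The figures in the comment below are the ones from this demo:

```sql
SET STATISTICS TIME ON;

/* ... run the parallel query here ... */

/* Healthy DOP 8 parallelism looks something like:
       CPU time = 45000 ms, elapsed time = 6000 ms
   What we got here, which screams skew:
       CPU time = 45000 ms, elapsed time = 53000 ms */
```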

In this case, we have 45 seconds of CPU time and 53 seconds of elapsed time, so we can kind of tell that parallelism wasn't very effective. From a wait stats perspective, this is where things get a little disappointing.

If we look at wait stats for this query, we're not going to see any parallelism waits in the query plan that help us figure stuff out. We are going to see a whole bunch of IO_COMPLETION waits from the sort that spilled, but we're not going to see the wait that really makes a difference here, which is CXCONSUMER.

Microsoft screens CXCONSUMER waits out of actual execution plans because some guy who works at Salesforce says they don't matter. But in reality, they actually do matter for this sort of thing.

So if you see a parallel query running, and the CXCONSUMER waits are really high, and you have other indicators like I'm showing you here (the parallelism is obviously skewed when you look at the rows per thread, or the CPU time and elapsed time are very close), then that's a pretty good sign that your query has very skewed parallelism.
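Since actual plans filter CXCONSUMER out, you have to go looking for it yourself. One way, available from SQL Server 2016 SP1 onward, is the session-level wait stats DMV, checked right after the query finishes:

```sql
SELECT dews.wait_type,
       dews.waiting_tasks_count,
       dews.wait_time_ms
FROM sys.dm_exec_session_wait_stats AS dews
WHERE dews.session_id = @@SPID
AND   dews.wait_type IN
      (N'CXPACKET', N'CXCONSUMER', N'IO_COMPLETION')
ORDER BY dews.wait_time_ms DESC;
```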

So these are things for you to watch out for when parallel queries are not running as quickly as you would like them to. And, just to finish things out, this is what the votes skewed table looks like as far as distributions go.

I suppose I should probably limit this to something a little more useful, because we have where vote type id is between 1 and 4 up there. So let's add that in down here.

Let's not be silly and foolish with ourselves; let's make sure we look at exactly the data we care about. This is how all the rows distributed, and because of whatever hash function SQL Server chose when it was doing the parallel page supplying, every single row ended up on one thread.
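That distribution check is just a grouped count with the same predicate as the demo query (again, the table name is my stand-in):

```sql
SELECT vs.UserId,
       records = COUNT_BIG(*)
FROM dbo.VotesSkewed AS vs
WHERE vs.VoteTypeId BETWEEN 1 AND 4
GROUP BY vs.UserId
ORDER BY records DESC;
```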

So that's what happened there. This is how skewed data can lead to skewed parallelism, and cause you performance problems. So thank you for watching.

I hope you enjoyed yourselves. I hope you learned something. And I will have a follow-up video on this one eventually about how to fix skewed parallelism for this query. So we have that to look forward to.

But tomorrow's video is going to be sort of a different one, where I'm going to dig into a way of troubleshooting query performance that goes a little deeper than just looking at query plans and wait stats and stuff, because we're going to look at a query that doesn't have anything helpful in that regard.

And we have to go deeper. And we have to look at some more stuff. So anyway, thank you for watching. And I’ll see you in tomorrow’s video, I hope.

I pray we all survive the night. Anyway, goodbye. Bye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: Query Behavior Under Read Committed Row Versioning


Video Summary

In this video, I delve into the nuances of optimistic isolation levels and row versioning in SQL Server, specifically focusing on how certain query structures can lead to unexpected results when using an optimistic isolation level like `READ COMMITTED SNAPSHOT`. By walking through a practical example involving seat assignments at a dinner table, I illustrate how queries that span multiple phases (select and update) can be problematic. This video is part of a comprehensive beginner content series from Darling Data, where all 23 hours of material are currently available for pre-sale at an exclusive price of $250. With advanced content set to launch after the summer, this offer will double in value, making it a fantastic opportunity to invest in your SQL Server knowledge now and into the future.

Full Transcript

Erik Darling here with Darling Data. In this video, we are going to talk about where an optimistic isolation level, a row versioning isolation level, may not be perfect because of the way you have written a query. All right. Now, there are all sorts of things in code that may attempt to enforce integrity of some sort. It may be referential integrity, it may be business rules. It might be in a trigger, it might be in a procedure, it might be in a function; who knows where you've hidden these things. And that code may not work the same way under a row versioning isolation level as it would under a locking isolation level, unless you add in locking query hints. Now, before we do that: all 23 hours of this fantastic beginner content is available for you to start watching right this instant. You push-button kids, you don't know how easy you have it. All for 250 bucks at the current presale price, and that's good for the rest of your life. It will double in value, at least, when the advanced material goes up after the summer, and you'll wish that you had spent less now. So you should do that now. Anyway, let's talk about this stuff. I am going to create a table and fill it with a little bit of data, and this is what the data in our table looks like. All right. I've got some lucky fans in here, and what we notice immediately is that there is an empty seat at this table. We have five lucky fans who are going to eat dinner with someone, but who will they eat dinner with? That's the question.

Now, when I said "because of the way you've written the query," we need to talk about that. Right now, in my Stack Overflow database, I have the read committed snapshot isolation level turned off. It is disabled; you can see the off bit right there. And this is the query that we'll be running. We're going to declare a variable up here to hold a name, because we have to give that seat to somebody. In this case, it is going to be Eric. Sounds good to me. I like dinner. And then we are going to run a query to select the top one seat that is free.

Right? So we had that one free seat. That free seat had is_free equal to one and a NULL name. We don't need to check the name, though; all we need to see is that the seat is free. Then, based on this free seat, we are going to update the table, set is_free to zero, and assign that seat to a lucky contestant. And I'm going to output the rows that get updated, so it's a little bit easier to see exactly what happened when the query ran.

And then, of course, that CTE gets joined to the table based on the id that we find here. So under normal circumstances, or rather under read committed locking, assuming that you don't have NOLOCK hints everywhere (because if you have NOLOCK hints on these queries, you're going to see weird results no matter what), you are depending on that locking to give you the behavior that you want your query to have.
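Pieced together from the description above, the demo looks roughly like this. The table and column names are my reconstruction, but the shape, a select phase in a CTE feeding an update phase, is the important part:

```sql
CREATE TABLE dbo.DinnerPlans
(
    id integer NOT NULL PRIMARY KEY,
    name nvarchar(40) NULL,
    is_free bit NOT NULL
);

DECLARE @name nvarchar(40) = N'Eric';

/* Phase one: find a free seat. Phase two: claim it. */
WITH FreeSeat AS
(
    SELECT TOP (1)
           dp.id
    FROM dbo.DinnerPlans AS dp
    WHERE dp.is_free = 1
    ORDER BY dp.id
)
UPDATE dp
    SET dp.name = @name,
        dp.is_free = 0
OUTPUT inserted.*
FROM dbo.DinnerPlans AS dp
JOIN FreeSeat AS fs
  ON fs.id = dp.id;
```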

If we begin a transaction and run this update, we will see that this lovely Eric right here acquired seat number one. We have set his name here and marked that seat as no longer free, which means that if we come over here and try to do the same thing, running the same query,

this query is going to get blocked. But importantly, most importantly, this is where it's going to get blocked: that other query has an exclusive lock from updating the dinner plans table, so it's going to get blocked right here.

The select gets blocked. So if I try to run this, it's going to be blocked, and we can tell it's blocked because it's just sitting there spinning and nothing good is happening. We're not getting anything out of this at all.

We apparently have some sort of police emergency outside. Thank you. Thanks for keeping us safe. But we are completely blocked up here.

So if I come over and commit this transaction, the change that we made here will be committed, and this query will find no free seats. This thing right here: we got no rows back from the output clause.

And you might think: hey, this is great. Read committed locking. Amazing. You know, this query was blocked for a while, while we were talking about stuff and the federalities were outside doing something, who knows what.

But this gives you what would be, I guess, correct results. We did not lose the update that the first query ran, because this query did not overwrite it; it did not find a free row when its select finished running. As soon as that lock released and this query was allowed to run, it said: there are no free seats.

Wah. Ah, that's a bummer. This backwards Eric can't possibly have dinner now. So let's reset those tables. Right?

We set name back to NULL, and is_free back to one, and that's all for seat number one. Now we are going to come over here and set read committed snapshot on. This might take a second, because we have the rollback immediate thing happening, which is always the best way to change your database's read committed state.
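For reference, the switch looks like this. ROLLBACK IMMEDIATE kicks out any open transactions rather than waiting on them, which is why it can take a second:

```sql
ALTER DATABASE StackOverflow
    SET READ_COMMITTED_SNAPSHOT ON
    WITH ROLLBACK IMMEDIATE;
```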

And let's do this again. Let's rerun this.

So we run this whole thing, and it completes immediately. We see that forwards-facing Eric has gotten this seat, and this seat is no longer free. Now that we've done this, what's going to change is what happens while this update is running, and this update is currently running.

We have not committed this update yet, and this update is generating a version of this row. Right?

The row that we found here: this update has generated a version of that row where is_free equals one and the name is still NULL. So if we come over here and run this, it's still going to get blocked, but it's going to get blocked in a different place now.

Before, when we ran this, this part of the query, the select, got blocked because of the update's lock. But now the select can run, and it reads the version of the row that got created when that update happened. What's blocked here now is the update, trying to change data in the table, actually trying to change that same row.

So if we come back over here, commit this transaction, and come back over here, we'll see that now backwards Eric has overwritten forwards Eric. Okay. Because this query got blocked in a different place, the select didn't get blocked.

The update got blocked. The select found a row in the row versions and said: oh, cool, look what I found. It's a free seat.

Then it updated the table to give that free seat to someone, even though that seat was taken fair and square by forwards-facing Eric. So that's not good.

This is not a situation that we want to run into. Now, to make things perfectly clear: if we had just written the query like a sort of normal human being, that situation would not arise, because there wouldn't be two phases to the query. There wouldn't be a select phase to locate a row to change, and then an update to change that row.

If we just ran an update top one to set whatever seat to Eric and set is_free to zero, we wouldn't have had that problem, because the update would have just blocked the other update, and we wouldn't lose it. If you have code that relies on locking for that sort of behavior and you want to switch to an optimistic, or rather a row versioning, isolation level, you may need to do something different.
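Collapsing both phases into a single statement, using my reconstructed table from earlier, would look something like this. The update itself locates the row, so a competing session blocks on the update and re-checks is_free once the lock releases:

```sql
DECLARE @name nvarchar(40) = N'Eric';

UPDATE TOP (1)
    dp
    SET dp.name = @name,
        dp.is_free = 0
OUTPUT inserted.*
FROM dbo.DinnerPlans AS dp
WHERE dp.is_free = 1;
```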

You might need to change your queries. So if we come back over here and reset this, we go back to square one. We still have read committed snapshot isolation enabled, and we are still running under the read committed isolation level. Right.

We set this here so that the query is free to use the row versions when it runs. And let's do the same thing now: run this up to the commit transaction, do that, and now let's come over here and run the whole thing.

This query gets blocked again, but now it's the select that gets blocked, not just the update. So if we come over here and commit this transaction, and then come back, this query will go back to not having found a row, because now the select was blocked rather than just the update, thanks to the READCOMMITTEDLOCK hint we added. The READCOMMITTEDLOCK hint makes this select obey the locking semantics of read committed locking, and not use row versions to go find this stuff.

This thing got blocked looking at the table. So if you are looking at switching to a row versioning isolation level, and you have code that looks like this, or code in triggers or other places that enforces some sort of integrity or business rules, you might need to start looking at that code to figure out if you need to add those READCOMMITTEDLOCK hints to it.
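The hint goes on the table reference in the select phase. With my reconstructed demo query, it would look like this:

```sql
DECLARE @name nvarchar(40) = N'Eric';

WITH FreeSeat AS
(
    SELECT TOP (1)
           dp.id
    FROM dbo.DinnerPlans AS dp WITH (READCOMMITTEDLOCK)
    WHERE dp.is_free = 1
    ORDER BY dp.id
)
UPDATE dp
    SET dp.name = @name,
        dp.is_free = 0
OUTPUT inserted.*
FROM dbo.DinnerPlans AS dp
JOIN FreeSeat AS fs
  ON fs.id = dp.id;
```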

If all of your queries already have a bunch of NOLOCK hints on them, you don't need to do that. But you do need to get rid of the NOLOCK hints, so your queries start using the row versions and stop doing dirty reads.

All right. I feel like this was a complete video. I feel like we have lived up to expectations here. All right.

We have completed our intended course of action. So thank you for watching. I hope you enjoyed yourselves, I hope you learned something, and I will see you in the next video. Next: we are actually done talking about transactions and isolation levels after this, at least for the beginner content.

After this, we have a bunch of programmability stuff to talk about, and of course, there's a lot in there. So, oh boy, more videos for you.

All right. You're welcome. Anyway, that's all for here. Thank you for watching.


SQL Server Performance Office Hours Episode 29



To ask your questions, head over here.

In Brent’s post about maintenance he said updating statistics causes parameter sniffing. Should I stop doing that? It’s very confusing.
Hi Erik. You discussed index compression in a YouTube video about 10 months ago. I do not see a video on table compression. What is your professional opinion on table compression, and do you ever use it or recommend it to clients? Thank you sir.
Hi Erik, what are your thoughts regarding AI and its long term impact on the DBA world? I used Claude for the first time a month ago and I’m now considering retraining to be a plumber… Cheers, Oli the DBA.
Which metric best reflects actual execution time – the “Last Total Elapsed Time” in the Messages tab via SET STATISTICS TIME ON? In production, a query ran slowly despite a low time shown in the execution plan (guessing low CPU time); what factors might cause this, and how can I find and resolve them? It wasn’t just slow in SSMS but the app too. So I guess we can rule out high Client time?
I’ve been watching your T-SQL beginner content and it’s really great. I thought I was more advanced, but I’m learning a lot of surprising things. I can’t imagine what the advanced content will look like. How do you remember all this stuff?

Video Summary

In this video, I dive into a series of Office Hours questions from viewers, providing insights and advice on topics ranging from parameter sniffing and statistics maintenance to table compression and the long-term impact of AI on the DBA world. Whether you’re dealing with performance issues or curious about future trends in technology, there’s something for everyone. I also share my thoughts on the T-SQL beginner content that has been well-received by many, emphasizing the importance of continuous learning and practice in the tech field. If you have your own questions or want to support the channel, make sure to check out the links provided in the video description.

Full Transcript

Erik Darling here with Darling Data, and you guessed it, you smart devil you. We are going to do an Office Hours episode. Why? Because I have five questions to answer. So I’m going to answer them. If you want to ask your own question, there’s a link right there. It’s going to look very much like a link down in the video description where you can go ask your very own questions. It’s all very nice and streamlined for you. If you want to support the channel, there’s also a link down there to do that. It’s just a nice way to say thank you for the endless hours of content that I produce. If you’re into that sort of thing. If you’re into the whole gratitude trip. If you need consulting help, and boy, do you need consulting help. I’ve seen your servers. They’re messes. I am available for all of these things and more. And as always, my rates are reasonable. If you want to get your hands on my performance tuning training, you can get all 24 hours of it for about 150 US buckaroos. That will last you for the rest of the day. The rest of your life. Another link down in the video description to do all that with. And if you want to get in on my T-SQL course, while it is still at the pre-sale price of $250, you can. 23 hours of beginner content is out and available. And this will go up to $500 when the advanced material drops after the summer. And so please buy it now before you send an email to me saying, hey, can I get it for the old price? And I say, no, no, you can’t. Anyway, the nice folks at Redgate, they’re making me leave the house this summer. So August, September and October, I will be in New York, Dallas and Utrecht with the Pass on Tour extravaganza, all leading up to the Pass Data Community Summit taking place in Seattle November 17th to 21st.

And now let's go answer some office hours questions. Let's zoom this in to a reasonable degree so that everyone can see exactly what's happening. All right. In Brent's post about maintenance, he said updating statistics causes parameter sniffing. Should I stop doing that? It's very confusing. I know the post you're talking about. I'm not sure that I would have ended the sentence so abruptly, but to each their own. It's hard to write a whole book in every post. But when you look at Brent, one of the first things you notice is that he is a fitness enthusiast.

What may not be obvious is that his fitness icon is a lady named Susan Powter, who was big in, I guess, the 80s and 90s, and whose catchphrase was "stop the insanity." So what I presume was meant by that is: if you are running nightly statistics maintenance, and you come in in the morning facing some sort of server meltdown because all your queries got bad, different, weird plans, then maybe you should examine the way that you are updating statistics nightly. Perhaps rein that in a little bit, perhaps change your technique a bit. I can't tell you exactly what to do, because I don't know if you are facing these things.

But when you update statistics, you of course invalidate a lot of execution plans. The execution plans become uncached, or however you want to put it, and then SQL Server compiles a new plan based on whatever comes in next. So if you are facing that issue, you might consider figuring out which queries are susceptible to it.

Then you can either skip statistics maintenance for the tables involved in those queries, or use a plan guide, or force a plan with Query Store that doesn't result in your server melting down. You could also, after maintenance, run a stored procedure or query with the parameters that get you the execution plan that you want.

There are lots of ways to do that that don’t involve stopping statistics updates wholesale, because if you wanted to stop that, you would actually have to look at turning auto update stats off, and some other stuff, which could also cause parameter sniffing. And it’s just my general feeling, and my general experience, that statistics that are a good representation of the data in your tables and indexes are a good thing for SQL Server’s query optimizer. So it’s a little tough to balance that with not updating statistics.
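Putting a couple of those ideas together, here’s a rough sketch of re-priming a plan right after stats maintenance. The procedure name, table, and parameter value are made up for illustration:

```sql
-- Hypothetical: updating stats on dbo.Posts invalidates cached plans that
-- reference it. If a known parameter-sensitive procedure gets a bad plan
-- whenever "whatever comes in next" compiles it, compile it yourself with
-- a representative parameter right after maintenance finishes.
UPDATE STATISTICS dbo.Posts WITH FULLSCAN;

-- This execution compiles and caches the plan that subsequent callers reuse.
EXEC dbo.GetPostsByUser
    @UserId = 22656; /* a typical, selective value for this procedure */
```

The point is just controlling which parameter the post-maintenance compile sees, rather than leaving it to whichever caller shows up first.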

So, I don’t know, perhaps there will be a clarifying post at some point in the future. But sorry, you were confused by something someone else said. Anyway, hi, Eric.

Hi, you discussed index compression in a YouTube video about 10 months ago. I do not see a video on table compression. What is your professional opinion on table compression? And do you ever use it or recommend it to clients?

Thank you, sir. Sure. So, when you say table compression, I assume you mean applying compression to a heap, since index compression would cover clustered and nonclustered indexes both — since they are both indexes.

And a clustered index makes for a clustered table, not a copy of the table, but the table. So, the only thing left is a heap. That is a table without a clustered index. And the answer there is no.

And the reason why comes down to how index compression works. Just remembering the details of this: for an index, when you apply compression — assuming page compression — then as rows get added in, row compression is applied, and when a page fills up, page compression is applied. That could be a little foggy for me, because I haven’t looked at it in a long time.

But the thing with heaps is that data is not compressed on insert to a heap. You have to rebuild the heap in order to apply it. So, usually when I’m talking about heaps with clients, it’s not should I add page compression to this heap.

It’s should this table be a heap? Check yes or check no. If no, then depending on the size of the heap, perhaps we should explore page compression for when we decide on and add a clustered index to that table to make it a clustered table and take away its heapness.
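As a sketch of the difference, using a made-up heap called dbo.SomeHeap:

```sql
-- Rows inserted into a heap are not compressed as they arrive; compression
-- only gets applied to existing data when you rebuild the heap.
ALTER TABLE dbo.SomeHeap
    REBUILD WITH (DATA_COMPRESSION = PAGE);

-- The conversation I'd rather have: give the table a clustered index,
-- compressed from the start, so pages pick up compression as they fill.
CREATE CLUSTERED INDEX cx_somehape_id
    ON dbo.SomeHeap (Id)
    WITH (DATA_COMPRESSION = PAGE);
```

The rebuild buys you a one-time size reduction that erodes as new uncompressed rows pour in; the clustered index keeps compression working going forward.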

So, there we go. All right. All right.

Oh, another hi, Eric. Look at these polite people. Hi, Eric. Hi, it’s me. Oh, that was not a good square. That was covering up all sorts of stuff. All right. What are your thoughts regarding AI and its long-term impact on the DBA world? I used Claude for the first time a month ago.

Jeez. Late to the game. And I’m now considering retraining to be a plumber. Cheers, Ali the DBA. Well, maybe we should start calling you Ali the plumber. So, I’ve talked about this stuff a couple of times in the past.

And again, with the AI and LLM thing, you don’t really want to watch where the ball is. You want to watch where the ball is going. So right now, if you are a moderately to highly skilled DBA, AI is not going to replace you immediately. Depending on the trajectory of AI — and there are all sorts of variables and factors that are going to play into the trajectory — it could get very good and end up being enough to replace moderately to highly skilled people.

Or other factors may kick in and it may have less of an impact. Right now, what a lot of AI people are going back and forth on is how far things will keep progressing — whether LLMs, on the current trajectory, will just keep getting infinitely better while people stay about the same.

A lot of people point out that it’s a high energy expenditure — it costs a lot of money to do all this stuff, and there’s not a lot of return on investment for it. So there are things that could make AI stop just infinitely getting better and kind of crater off.

But there also might be things that happen that improve the energy costs and such, and make it so that there is some return on investment. I think the biggest problem people are facing right now, though, is that there are a lot of idiots in the world, and a lot of those idiots are in charge of people. They are managers.

They are executives. They are C-levels. And they are not very highly technical people. And they think that AI and LLMs right now can replace people. And I’m sure that there are some replaceable people in the world.

To some degree, everyone’s replaceable. But I think the biggest thing right now is that they don’t realize how not-good LLMs are at deeper-level thought. And I know that because when I try to ask an LLM to do deep research, it ends up quoting my blog posts back to me.

And I’m like, there has to be something better. Like, there has to be, like, better out there. Like, why are you coming back to me with my own blog posts?

So, you know, depends on a lot of stuff. It could, of course, you know, be a real nightmare for people. And it, of course, could just end up being, you know, sort of a, I don’t know, like, slightly better Siri or something.

I don’t know. So, you know, it does depend a bit on a lot of things. But really, the worst part about it is the higher-ups thinking that it’s way more capable than it is.

It’s because they’re under a lot of pressure to use it and keep up with technology trends and stuff. And I try to use AI for a lot of things, and I just find it severely wanting for most of the things that I try to do with it.

Anyway, let’s see what we got here. Which metric best reflects actual execution time — the last total elapsed time in the messages tab, via SET STATISTICS TIME ON?

In production, a query ran slowly despite a low time shown in the execution plan — guessing that was low CPU time.

No. What factors might cause this, and how can I find and resolve them? It wasn’t just slow in SSMS, but the app, too.

So I guess we can rule out high client time. Well, I don’t know. It depends on how many rows your query was returning. But I don’t really use SET STATISTICS TIME ON anymore.

Now that query plans have the operator times and stuff in them, I just don’t see much of the point in it. You can get all that information from various parts of the query plan without having to switch over to the messages tab and look at a bunch of nonsense. But there are easy ways to figure that out.

So, assuming that you are looking at the right thing in the actual execution plan: sometimes looking at the last operator is not a great way of figuring out how long a query took, because if it was running in either mixed row and batch mode, or all in batch mode, there could have been some longer running operators off to the side that you didn’t see.

So if you right click on the root of any query plan, any actual execution plan, rather, there will be an actual time statistics thing in the properties window. And that will tell you the CPU and elapsed time of the query.

And that’s also a good place to look, because if you are returning a lot of rows, it will of course also include the time that it took for SSMS to receive and render the results. And that can take a long time.

It might be a very easy thing for you to test: just dump the results into a temp table, assuming it’s not millions and billions of rows, and see if the elapsed time drops, because you’re sort of discarding the results. You could also try discarding the results via SSMS, or assigning columns to variables or something.
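A sketch of that temp table test, with a stand-in query:

```sql
-- Dump the results into a temp table instead of streaming them to SSMS.
-- If elapsed time drops a lot, much of the "slow query" was really result
-- transfer and rendering time on the client side, not server-side work.
SELECT p.*
INTO #discard
FROM dbo.Posts AS p
WHERE p.Score > 100; /* stand-in for your actual slow query */

DROP TABLE #discard;
```

Same plan shape, same reads — just without the client eating the result set.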

But that can have an impact on the query plan overall, so I don’t usually recommend it. I wish you had shared the plan or something with me, because then I could give you a more definite answer.

But like I said, I don’t really use SET STATISTICS TIME anymore. I generally rely on what’s in the execution plan. And what you see for the operators in actual execution plans is, of course, wall clock time, not CPU. For CPU, you have to look at the operator properties to determine that.

All right. Last question here. I’ve been watching your T-SQL beginner content, and it’s really great.

Well, thank you. What a sweetie you are. These polite people with their “hi, Erics” and their compliments. I thought I was more advanced, but I’m learning a lot of surprising things.

Well, surprises abound with T-SQL. It’s actually what the S in SQL is. Surprise. Surprise.

I can’t imagine what the advanced content will look like. How do you remember all this stuff? Well, most of the time I don’t. Like a lot of people who work in tech, I have no internal dialogue or biographical memories.

So I write a lot of stuff down. And that’s generally how I am able to remind myself of things if perhaps I have forgotten something. But, you know, a lot of it is just the fact that I do it all day, every day.

And the more you do stuff all day, every day, the more you remember, kind of. The more it just sticks with you and becomes muscle memory. And it becomes harder to forget.

Or I don’t know. I don’t want to say it becomes harder to forget because, man, there’s always a way to forget. But, you know, it does become a bit easier to have these things in the front of your mind rather than having them fly out of your ears. Anyway, that’s five questions.

Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I will see you in the next video tomorrow. All right. Good night.

I love you.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: Read Committed, Not So Committed



Video Summary

In this video, I delve into a scenario where a query running under the read committed locking isolation level in SQL Server appears to return impossible results, challenging common misconceptions about read committed. I explain that read committed only ensures that your query reads data as it was at the time of taking shared locks, not that other queries cannot modify this data before or after your query runs. Through a detailed example involving a query to find top-scoring posts and their answers on Stack Overflow, I demonstrate how concurrent updates can cause unexpected results due to the timing of lock acquisitions during execution. This video is part of my ongoing effort to clarify complex SQL Server concepts for beginners and experienced users alike.

Full Transcript

Erik Darling here with Darling Data. You’re welcome. In this video, we’re going to continue. We’re going to look at a situation where a query running under the read committed locking isolation level appears to return impossible results to us, the end user. And we’re going to do this because some people still have very, very strange misgivings and misunderstandings about read committed locking isolation, and what exactly read committed means. When you get down to it, read committed only means that your query — at the time that it attempted to take the shared locks that it requires in order to read rows and pages and whatnot — read the data that was committed in the database at that moment. That does not mean that other queries cannot interact with that data and change it before your query was submitted, or after your query has released those shared locks. All 23 hours of the beginner content is available to you; there’s a link down in the video description for you to buy it. The presale price is still 250 bucks, with the price doubling — skyrocketing — to $500 after the summer months, when my brain has officially stopped sweating and we can start thinking again. So anyway, let’s take a look at this query. Now, the goal here is to find the top scoring post.

So we’re going to find the top scoring post that has a post type ID of 2 — that is, an answer. And what we want to find are all the other answers: all the answers to the question that has the top scoring answer. So we’re selecting star from the posts table. We have an index hint in here just for demo stability. And we’re saying where the parent ID of the things that we want to find equals the parent ID of the highest scoring answer to a question — so, all the other answers to that question, essentially. Now, because we have a subquery in this query, we have the posts table referenced once here and once here.

This really means that the posts table is going to be read from twice. Under locking read committed, that means there are many opportunities for reads to happen at all different points in time, because the shared read locks for one reference to the posts table are going to be taken and quickly released, and then we’re going to join one reference to the posts table to the other reference on the parent ID column. And that’s going to be another opportunity for shared read locks to slowly be taken and released as the operators get the data they need.
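The query looks roughly like this — reconstructed from the description, with the index hint omitted:

```sql
-- Find every answer that shares a question (ParentId) with the single
-- highest-scoring answer on Stack Overflow. The Posts table is read twice:
-- once in the subquery, once in the outer query.
SELECT p.*
FROM dbo.Posts AS p
WHERE p.ParentId =
(
    SELECT TOP (1)
        p2.ParentId
    FROM dbo.Posts AS p2
    WHERE p2.PostTypeId = 2 /* answers */
    ORDER BY p2.Score DESC
);
```

Two references to the same table means two separate windows in time where shared locks are taken and released.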

So let’s look at the execution plan for this thing. I’ve already run it ahead of time. The first thing I actually want to do though, is show you over here that when we run this unfettered without anything happening around it, without anything else interacting with the data, all the results that get returned over here have a post type ID of two.

Okay. Everything next to me is a two — this whole column is all twos, because that’s what we were looking for here. If your post has a parent ID, that means it was an answer to a question. If your post does not have a parent ID, that means it was the question, right?

Questions belong to answers — let’s try that again. If your post has a parent ID, that means it was an answer. The parent ID is the question ID that ties your answer to the question.

There we go. So the execution plan for this features two things. Well, I mean, I guess we have three opportunities in here to read strange data from the post table, don’t we? We have a clustered index scan here and this takes about five seconds.

So a single thread scans the clustered index and gets all the rows. The next thing we do is sort all of those rows. After we sort all of those rows, one row comes out, because we find the top one scoring answer to a question here.

Right. So we do that. We scan things, we find the top scoring thing, and then we join what comes out of this to a nonclustered index on the post table. And we find the nine, in this case, there are nine other answers to the question that we found with the top score here. So we go find the nine rows here that we care about.

And then over here, we touch the post table a third time because we need to do a lookup in order to get the other columns that we need to satisfy the presentation section of the query. So we have this whole output list here, right? This is all things that we’re going to do. We’re going back to the clustered index to get these columns after we’ve done a seek to find the parent IDs that we care about here.

So there are really three opportunities to read data from the posts table, and they all happen at slightly different times. Because there’s a sort, all 11 million rows that we read from the posts table have to get to the sort operator. Then the sort can happen. Then we do a nested loops join to the lookup.

And then we have another sort here, which means all the rows have to show up here before we can start doing the lookup. So really, there are one, two, three different points in time where we’re reading data. And during these three different points in time, all sorts of other things can interact with the data underneath you.

So let’s do this. Let’s go over what we’re going to do in the other windows. Let me scroll down here first.

So this is the one that we’re going to run when we start doing things. I’m just going to tuck this down here so it’s a little tidier on the screen in case I need to zoom in again. Over here, what we’re going to do is we are going to, this is the last row in the post table.

This is the highest post ID. And what we’re going to do is start a transaction and update the post table. And all we’re doing is adding one millisecond to the last activity date for this last row in the post table.

Right? There are no rows after this. This is the highest assigned ID. The ID is the clustered primary key.

So this is the end of the table. And then what we’re going to do is run this. So let’s pretend that this is all happening very quickly. Like in the database, it’s not me running things in separate tabs. This is a database with a fair amount of concurrency.

Right? And so we have things running at all different times, and those things need to be able to run and do their thing. We don’t want to have some train schedule for all these queries — you go; all right, you go; update; yep, you go now — right? That’s not happening. These queries are all just getting sent in and run, trying to do their thing while other queries are getting sent in and trying to do theirs. So let’s begin this transaction and run this update.
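The blocking window looks roughly like this — reconstructed from the description:

```sql
-- Hold an exclusive lock on the very last row of the Posts table, so a
-- concurrent clustered index scan under read committed gets stuck one row
-- from the end, waiting on a shared lock.
BEGIN TRANSACTION;

UPDATE p
    SET p.LastActivityDate = DATEADD(MILLISECOND, 1, p.LastActivityDate)
FROM dbo.Posts AS p
WHERE p.Id = (SELECT MAX(p2.Id) FROM dbo.Posts AS p2);

-- ...meanwhile, run the select in another window, change the "answer" rows
-- in a third window, and then release the stuck scan:
-- ROLLBACK TRANSACTION;
```

A one-millisecond change to a date nobody cares about is enough to pin the row with an exclusive lock for as long as the transaction stays open.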

And now let’s run this query. Now, this query took about five seconds before. This query will not take five seconds this time.

Right now, this query is blocked by that update. One of the things that we have to do in this query plan is scan — we have to touch the clustered index twice, and the first thing we do is scan through the clustered index and then sort all that data. Right now, this query has read everything in the clustered index except that last row, because that last row has a lock on it. Right?

So everything got read except that last row. This select query, running under read committed locking, is blocked from taking a shared lock on that last row by the update in the other window. So while this update is holding this lock open, let’s come over here.

And what we’re going to do is update a very coincidental nine rows — nine or ten, anyway. And we’re going to set their post type ID to one. Coincidentally, these are the other rows that get returned by our query: the other answers that belong to the question with the highest scoring answer in all of Stack Overflow.

This query runs successfully. Now, if we come over here and we roll this back, we don’t have to commit this. We can just say, oops, I didn’t mean to add it.

I didn’t mean to add that millisecond there. We can roll this back. And as soon as we roll this back, we get results immediately. If we look at the execution plan, well, things look a little bit different. This scan of the clustered index now took one minute and 28 seconds.

It took a minute and 28 seconds because it was the five seconds that it normally ran for, plus the time I spent trying to explain things to you. What’s neat now is if we go to the properties of the select and we go to the wait stats, the top wait that we’re going to see is LCK_M_S. Right.

So we can see that this query waited a long time on a lock. That shared lock — LCK_M_S means it was trying to take a shared lock — was blocked by the update query that started a transaction and said, go outside and smoke a cigarette. I don’t know.

Maybe grab a beer for lunch. I don’t know. We’ll see what happens. There’s a halal cart. I don’t know. Get crazy out there. But what’s very interesting now is if we look at the results of this query, even though we specifically requested that only things with a matching parent ID, right?

So only answers have a parent ID — questions don’t have a parent ID, right? And post type ID two is answers, and post type ID one is questions. Even though this is what we asked for.

And even though the first time we ran this query, we got all post type ID two back. If we look over here now, magically, mystically, all of these post type IDs were one, which seems impossible. Because a post type ID of one, it cannot happen for an answer.

Post type ID two is answer. We looked for other answers and we found them. They all have the same parent ID too. We were not, oh boy, oh boy, oh boy.

Questions do not have a parent ID. Questions are the parent. So something terrible has happened here. And this sort of stuff can happen all the time under any sort of application concurrency using locking read committed. So be careful out there.

Before you go and you start thinking that read committed locking is some sort of snapshot point in time, amazing, consistent, confident. Doesn’t it sound confident, right? Read committed.

What do you read? I read committed data. I don’t read uncommitted data. I’m committed data only. It sounds like an LLM level of confidence from read committed, but we do not get that. Read committed has very weak guarantees.

Those guarantees do not coincide with the start of a statement or the start of a transaction. They are just a mishmash of various points in time when your query was allowed to take shared locks. So again, be careful out there.

Your queries might be returning weird stuff all the time. This isn’t even NOLOCK, which is the surprising thing. You would expect this sort of thing using NOLOCK or read uncommitted, but not read committed.

It’s committed to reading committed data. Well, maybe not so much. Anyway, thank you for watching.


Learn T-SQL With Erik: Reads Deadlocking Writes Under Read Committed



Video Summary

In this video, I delve into the intricacies of SQL Server isolation levels, focusing primarily on the notorious “read committed” level and its potential pitfalls. Specifically, we explore a scenario where a read query deadlocks with a write query due to their interaction with indexes. By walking through these examples, I aim to provide you with a deeper understanding of how read committed can lead to unexpected issues like deadlocks, especially when multiple nonclustered indexes are involved. I also discuss potential solutions and highlight the benefits of using optimistic isolation levels such as Read Committed Snapshot Isolation (RCSI) or snapshot isolation, which offer more robust protection against these types of problems without requiring extensive query or index modifications.

Full Transcript

Erik Darling here with Darling Data, continuing on to talk about isolation levels, mainly focusing on what a turd of an isolation level read committed is. And in this video, we’ll be focused in on seeing a read query deadlock with a write query. The circumstances will be very similar to the previous demo, just with a little bit more involved, because it is a deadlock and not just blocking. Again, if you’ve been watching my videos, you may have seen this demo before, but we’re going to go into a bit of detail on it. So if you’ve perhaps forgotten, or never knew in the first place — or perhaps you knew about this, didn’t you? — it may be good for a refresher. All the same stuff over here: 23 hours of the beginner content is out there and available for you. The course is still at the presale price of $250, and that price will be going up to $500 after the summer, when the advanced material is complete. So with that out of the way, let’s look at this situation. Now, what we have going on here is, well, a select query. And this select query is going to do a couple of neat things.

It’s going to start a while loop, and while the @i local variable is less than 100,000, we are going to assign the post ID to this variable — let me zoom in — and we are also going to increment @i. In here, we have an index hint for a little bit of demo stability; nothing too interesting. And then over in this window, we have an update query. And the update query is going to do something rather similar, except what it is going to do is basically flip the vote type ID value back and forth between eight and zero. We’re saying set vote type ID equals eight minus vote type ID, so this will either be eight or zero, because the vote type ID for this row is currently eight. Let’s see — do we get an estimated plan for this one? We do. So if you remember the last video we did, on read queries blocking write queries, it’s going to be a sort of similar situation here.
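The two sessions look roughly like this — reconstructed from the description, with the index hint omitted and the row id a placeholder:

```sql
-- Session 1: read the same rows over and over in a loop.
DECLARE
    @i integer = 0,
    @id integer;

WHILE @i < 100000
BEGIN
    SELECT @id = v.PostId
    FROM dbo.Votes AS v
    WHERE v.VoteTypeId = 8;

    SET @i += 1;
END;
```

```sql
-- Session 2: flip VoteTypeId between 8 and 0 for one row, forever.
-- 8 - 8 = 0 and 8 - 0 = 8, so each update toggles the value — and each
-- update has to touch both the clustered and nonclustered indexes,
-- which is what sets up the deadlock with the reader.
WHILE 1 = 1
BEGIN
    UPDATE v
        SET v.VoteTypeId = 8 - v.VoteTypeId
    FROM dbo.Votes AS v
    WHERE v.Id = 1; /* placeholder row id */
END;
```

Neither query commits anything interesting; they just keep the lock traffic flowing until the two sessions collide.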

We have an index seek that finds the rows that we care about. And since SQL Server estimates that more than 25 rows are going to leave this index seek, SQL Server chooses to use the unordered prefetch optimization here. It does not, this time around, opt to use optimized nested loops — just the unordered prefetch portion. But same deal as before: because we’re using the unordered prefetch, every time a row comes out of here and goes into the loop, we come down here to go get a column out of the clustered index.

And SQL Server will hold on to locks the entire time that we are doing the lookup. The lookup in this case is just to output the post ID column — I guess that’s the same thing as before. And again, because the prefetch optimization is in play, the locks on the clustered index will be held until this statement completes.

Now, over in the update window, this is where things get a little bit different. Now, if you recall the last one, the last demo on just the blocking, it was only the clustered index that was getting updated. This time, in order for there to be a deadlock, we also need to have a second nonclustered index get updated.

This is what causes the sort of deadlock embrace scenario rather than just a plain blocking scenario. Because now we have two queries that one is holding the object level shared lock. One is trying to update both the clustered index on the table and the nonclustered index on the table.

So we have this sort of circular dependency going on, which causes the deadlock this time around. So if we look at this, you’ll see that we are updating the primary key on the votes table. And we are also updating this other, I mean, we’re updating actually two other nonclustered indexes here.

But we only need one of these nonclustered indexes in order for this to happen. So what I’m going to do is start running this one and this one, and then I’m going to run sp_WhoIsActive.

And we missed it that time around. There we go. We got it.

Sweet. So here is what this looks like. Now let’s kill this one off. That got a little jumbled, didn’t it? These SSMS tabs get all weird on me — you hit one and it starts saying “executing,” and then all of a sudden that tab gets bigger, and switching tabs around gets somehow confusing.

But very professional. So here we have our update query right here. And it is trying to take an intent exclusive lock here.

And here we have our select query. And it is attempting to take that object level shared lock here. And they are blocking each other.

You’ll note that session ID 68 is here. And session ID 61 is here. And if we move the blocking session stuff over a little bit closer so we can frame this all up at once, you’ll see that session 61 is blocked by 68.

That’s right there. And session 68 is blocked by 61. Now, of course, the choice of deadlock here is very easy.

SQL Server will always choose the select query as the deadlock victim. That’s what the error message that we get here is. And it does that because the select query is very easy to roll back.

The select query hasn’t done any writes. There’s no log. There’s no version store. There’s no accelerated database recovery persistent version store, rather. There’s no work to do to kill this query off, aside from to say, pow, you’re dead.

But, like, this query over here just runs and does the update until I cancel it. So, like, the update never loses the deadlock battle. It’s only ever the select that loses it.

But we can also take a look at the locks that want to happen here. And you can see for the… This was for the update, right?

Yeah, the update. The update query was granted a whole bunch of locks, right? Intent exclusive. It was granted an X lock here. It was granted an X lock here.

But then there was a couple down here that… No, sorry. There’s just one over here that it is waiting on, right? And it is waiting on this intent exclusive lock on the vote type ID index. And then for the locks for the select query, this one has been granted a few different types of locks, right?

Like, all these shared and intent shared locks have happened there. But then this one shared lock on the clustered primary key, right, where the lookup is happening to, this one is being forced to wait.

So this one is waiting because of the modification query. And this one is waiting because of the select query. And neither one can make progress, right? So they are just stuck waiting for each other.

And that’s why the deadlock occurs. So again, it’s sort of the same situation here. We could, in this case, extend the nonclustered index so that there is no lookup in the select query.

That would be one way of taking care of this. Another way of taking care of this would be to, of course, use the no-lock hint, which I am, you know, pretty avidly against doing.

But, you know, like I said in the last video, it’s really not no-lock or read uncommitted that’s the problem. It’s the fact that you have to deal with all this crap under read committed that’s really the problem. Read committed is the enemy.

Right. And so a better solution would be to use an optimistic row versioning isolation level, like either read committed snapshot isolation or snapshot isolation, in order to prevent this sort of deadlocking at a grander scale.
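Turning on RCSI is a database-level setting; a minimal sketch, assuming the database is named StackOverflow:

```sql
-- Under read committed snapshot isolation, readers get the last committed
-- version of rows from the version store instead of waiting on (or
-- deadlocking with) writers' locks. WITH ROLLBACK IMMEDIATE kicks out
-- in-flight transactions so the setting can take effect; schedule it
-- for a quiet moment.
ALTER DATABASE StackOverflow
    SET READ_COMMITTED_SNAPSHOT ON
    WITH ROLLBACK IMMEDIATE;
```

One setting change, and the whole class of reader/writer blocking and deadlocking goes away, rather than fixing queries one at a time.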

The same way you could use that to prevent the sort of blocking we saw in the last video on a grander scale, without having to go through each and every one of your reader/writer blocking and deadlocking problems and saying: okay, well, what’s the query plan?

Can I fix the query plan? Can I do this? Can I do that? Can I do the other thing? Can I change indexes? What can I do? The optimistic isolation level thing just fixes all of it in one big go. So you don’t have to go through and nitpick every single individual query.

If you just have a couple queries that this is happening for, then you can probably pretty easily fix this without going and changing any database settings, assuming you’re allowed to change indexes.

But if you have this happening across a wide variety of queries, your life gets very, very difficult when it comes time to fixing these things, because you have to go through a lot of different stuff and fix a lot of different things. And then, you know, you’re maybe not even 100% guaranteed to fix everything.

Not every type of deadlock and not every deadlock situation is entirely avoidable. You might also need to introduce lock timeouts, or deadlock priorities, or some sort of retry mechanism for your queries that end up in deadlock situations.
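Those session-level mitigations might be sketched like this; the thresholds and retry count are arbitrary placeholder values:

```sql
/* Give up waiting on locks after 5 seconds (raises error 1222): */
SET LOCK_TIMEOUT 5000;

/* Volunteer this session to be chosen as the deadlock victim: */
SET DEADLOCK_PRIORITY LOW;

/* A minimal retry loop for deadlock victims (error 1205): */
DECLARE @tries integer = 0;
WHILE @tries < 3
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;
            /* ...your modification goes here... */
        COMMIT TRANSACTION;
        BREAK; /* success; stop retrying */
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0
            ROLLBACK TRANSACTION;
        IF ERROR_NUMBER() <> 1205
            THROW; /* not a deadlock; re-raise the error */
        SET @tries += 1;
        WAITFOR DELAY '00:00:01'; /* brief backoff before retrying */
    END CATCH;
END;
```

Retry logic is often better implemented in the application layer, but the T-SQL version above shows the shape of it.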

So there’s a lot more that goes on. Like, you can make your best effort at fixing all this stuff with the things that I’ve talked about, making changes to the queries, indexes, etc.

But at the end of the day, not every blocking and deadlocking situation is entirely avoidable unless you get to an optimistic isolation level, which resolves things in a far more comprehensive way, let’s say.

Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I will see you in the next video where we will continue to pick on Read Committed, because too many people think that it is too good of an isolation level.

And, uh-uh, not on my watch, pal. All right. Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

Learn T-SQL With Erik: Reads Blocking Writes Under Read Committed




Learn T-SQL With Erik: What Does Read Committed Guarantee?




SQL Server Performance Office Hours Episode 28




To ask your questions, head over here.

How often have you dealt with corruption? When is the last time you saw it?
Have you ever met a query you couldn’t tune?
Which database will win the database wars?
What are the hardest kinds of queries to write?
When will Joe Obbish be back for a podcast?

Video Summary

In this video, I dive into answering five of your most pressing questions about SQL Server and beyond during another exciting Office Hours episode. I cover a range of topics, from dealing with corruption to the future of database management systems like DuckDB and Postgres. Whether you’re curious about tuning queries or wondering which database might win in an imaginary “database wars,” this video has got you covered. Plus, I share some upcoming tour dates, including stops in New York City, Dallas, Utrecht, and the eagerly anticipated PASS Data Community Summit in Seattle. If you have any questions of your own or want to support the channel, there are links in the description to get involved.

Full Transcript

Erik Darling here with Darling Data, and we are in the midst of another wonderful Office Hours episode coming from me to you, in which I answer five of your most pertinent, important, burning questions about SQL Server, life, love, the world around you, mysteries of the universe, anything that you’re interested in. If you would like to ask your own questions for Office Hours, there is a link to do that; if you’re feeling very kind and would like to support the channel with a membership, there is a link for that too. Both links are down in the video description. If you need help with your SQL Server, if performance is just not what you always dreamed it would be, I am available for consulting to help you with those problems: health checks, performance analysis, hands-on tuning, dealing with performance emergencies, and of course, developer training. And as always, my rates are reasonable. Performance tuning training is about 150 US bucks, about 75% off, at the link down in the video description, and you will get all of that content for life. Speaking of for life, my T-SQL course is, well, all the beginner content is done, which means it is somewhere near half done. The advanced material, of course, will be bigger than the beginner material, because there is much more to say in the advanced material, but 23 hours of beginner content is out there now. It is 250 bucks at the pre-sale price, and that price will be going up, doubling the value of the course, after the summer months. The summer months, of course, will be busy times for your friend, Erik Darling from Darling Data. Redgate is taking me on tour across several points of interest across the globe: New York City, August 18th and 19th.

Dallas, September 15th and 16th. And Utrecht, October 1st and 2nd. And then, of course, the event that we’ve all been waiting for: the PASS Data Community Summit in Seattle, November 17th to 21st. But with that out of the way, let’s do these office-y, hour-y questions here. Let’s see. Let’s make sure that’s framed up nice for everyone. We don’t want anyone to have to struggle to read these things, do we? More importantly, we don’t want me to struggle to read these things.

How often have you dealt with corruption? When is the last time you saw it? Hmm, geez. So, I don’t deal with corruption a lot. It’s not really up my alley. You know, I am a performance-tuning person. I enjoy performance-tuning.

I do not enjoy things like corruption, high availability, disaster recovery, backup, stuff like that. I can, you know, I can work with it a bit, but it’s not, those are not things that I care deeply about. So, I have dealt with corruption a few times while consulting.

You know, most people are surprised when we come across it. Some of the initial analysis that I do when I work with people does look at places where corruption might be cropping up and hiding, where they just might not be able to see it. So, you know, use sp_LogHunter and look at the error log.

Use sp_HealthParser and go through the system_health Extended Event session. And then I also look at the memory dumps DMV, and sometimes the memory dumps will also indicate data corruption.
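Assuming you have Erik’s open source procedures installed (sp_LogHunter and sp_HealthParser, from his DarlingData GitHub repository), that triage might look roughly like this, with parameters left at their defaults:

```sql
EXECUTE dbo.sp_LogHunter;    /* trawls the error log for scary messages */
EXECUTE dbo.sp_HealthParser; /* digs through the system_health session */

/* Memory dumps can also hint at corruption: */
SELECT
    md.filename,
    md.creation_time,
    md.size_in_bytes
FROM sys.dm_server_memory_dumps AS md
ORDER BY md.creation_time DESC;
```

Both procedures accept parameters to narrow the search window; the bare calls above are the simplest possible usage.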

But it has been a while since I saw it. I don’t see people hitting data corruption issues so much. I think, you know, disks have become, at least to my mind, sufficiently resilient that corruption is not as common of an issue as it used to be.

I don’t have any experience with that aside from the work that I do. But, you know, I do run across people who have never run DBCC CHECKDB. And maybe they have just gotten very lucky that they have no corruption.

But it’s just not something that I run into a lot. Let’s see here. Have you ever met a query you couldn’t tune?

Well, I mean, yes and no. There are certain, let’s just call them pathological issues with an environment that make some queries much more difficult to tune than others. You know, a lot of the times, if you are not allowed to add indexes, then tuning a query becomes a little bit more difficult because, you know, like a lot of the sort of logical rewrites that you do would depend on having some sort of reasonable index around that would allow you to access data in an efficient way.

If you are dealing with tables that are very big and don’t really have indexes on them, that becomes a little bit more difficult. But as far as, like, have I ever looked at a query and been like, I have no idea how to fix this? No, it’s been a very long time since that’s happened to me.

But I’m trying to think. Yeah, I mean, really, it’s usually not local to just the query that’s in front of me. Usually there are extenuating circumstances with either the hardware or the database or something that make tuning efforts a lot more difficult.

But as far as, you know, given some more time and some more resources (not even unlimited, just more), could I really fix this? Yes. But for now, we can take some shortcuts and try to get batch mode happening or something to speed that up, or crank MAXDOP up a little higher so we have more CPUs working on all those rows or something.

But there’s never been a query that I’ve looked at and been like, well, I just can’t do anything, we’re stuck. Let’s see here.

Which database will win the database wars? I mean, obviously, DuckDB. No, I don’t know. I mean, look, Postgres is having its moment.

And I don’t think that’s a bad thing. I do hope that Postgres having this moment will force other database vendors to perhaps realign their priorities, and perhaps think a little bit more about their pricing models and what they offer along with those pricing models. You know, I joke that everyone who uses SQL Server and has never used Postgres thinks that using Postgres will solve all of their problems.

But in general, you’re sort of just trading for a different set of problems there. It is amusing to me also that doing development work on Postgres is like working with SQL Server 2050, but doing monitoring stuff on Postgres is like working with SQL Server 7.

Right. It’s like there is just not a lot built in that’s terribly useful or helpful. And I know that there are plugins and extensions and things that you can add to it that surface more.

But yeah, out of the box, if you don’t have someone who knows and cares and pays attention, getting meaningful information out is a lot more difficult. Postgres query plans specifically are not as friendly and graphically inclined as SQL Server ones are, nor do they surface, I think, some of the more interesting details that SQL Server plans give you.

So, you know, do I think SQL Server is going to win? Probably not. I mean, SQL Server, to my mind, does have the best query optimizer.

Postgres’s query optimizer is not nearly as good as Microsoft’s. Not to say that it isn’t capable of doing some things better than what Microsoft SQL Server does. But just generally speaking, I would much rather have Microsoft’s query optimizer than Postgres’s.

But, you know, like I said, I think that Postgres having its moment now is a good opportunity for other big relational database vendors to perhaps have a moment of clarity about some of the things that they’ve been doing with the product over the years. Perhaps correct their courses in a way that will make the specter of a free database perhaps a little bit less appealing or something. Anyway, let’s see here.

Oh, I got a good question next, but you got to get through this one first. What are the hardest kinds of queries to write? I answered a similar question recently, I think. Anything that involves a lot of math, because I don’t have any grounding or foundation in math.

So if there’s some advanced data analysis formula that you need to apply to stuff, I would have to go look at that and figure out how to write the T-SQL version of it, which I am not particularly good at, and which I would not be able to easily validate. Like, well, I wrote all the math out.

It looks right. Can someone tell me if it’s right? So the hardest kinds of queries for me to write are the ones that involve equations and algorithms and things like that, because I just don’t have any grounding in those areas. So here’s the question.

Here’s a good question. When will Joe Obbish be back for a podcast? Well, I’m hoping that Joe will be feeling in the mood for recording in the, I don’t know, reasonably near future. I don’t have an exact date, but I have at least gotten an email from him expressing interest in recording again.

So hopefully that is on the horizon. Anyway, that’s five questions. They’re short ones again this week.

All right. Well, I don’t know. I guess it’s nice when they’re not overly long and involved, because then I don’t have to think too much. But anyway, thank you for watching. I hope you enjoyed yourselves.

I hope you learned something and I will see you in the next video, which will be, I don’t know, something suitably SQL Server-y. All right. Thank you for watching.


Learn T-SQL With Erik: Optimistic Isolation Levels Don’t Allow Dirty Reads



Video Summary

In this video, I delve into the nuances of locking versus row versioning isolation levels in SQL Server, addressing a common misconception that optimistic isolation levels like read committed snapshot isolation allow dirty reads. I explain how `NOLOCK` hints permit dirty reads while optimistic isolation levels do not, using practical examples to illustrate these concepts. The discussion is aimed at dispelling myths perpetuated by third-party vendors who often rely on outdated or incorrect information, and it serves as a reminder that modern SQL Server features like read committed snapshot isolation offer robust solutions for managing data consistency without the blocking issues associated with traditional locking mechanisms.

Full Transcript

All right, Erik Darling here with Darling Data. In today’s video, we’re going to talk a little bit about locking versus row versioning isolation levels. Because I find that even to this day, I still need to make a big clarifying point to people. Just to frame this up a little bit: I work with a lot of people who use a lot of third party vendor tools. Cool. You know, someone out there for everyone. And what I run into with a lot of these third party vendors is, of course, a lot of NOLOCK hints. And of course, still a lot of blocking and deadlocking problems. And so I’ll make suggestions around using a row versioning isolation level. And one of the most common responses that I get when I do this is that someone from the vendor team will Google something, or maybe they’ll have just heard something from some point in the distant past, and they’ll say, no, no, no.

We can’t use read committed snapshot isolation. We can’t use snapshot isolation because they allow dirty reads. Meanwhile, there are NOLOCK hints everywhere, which do allow dirty reads, and there is no sign of an optimistic isolation level, which does not allow dirty reads. So at that point, I often just feel like giving up, right? Because how many times can you hear the same person repeating the same stupid information from the same stupid blog post back to you?

So, yeah. So I end up having to do a lot of work to prove that optimistic isolation levels, in fact, do not allow dirty reads, and that NOLOCK hints do allow dirty reads. And it’s seemingly always a losing battle. Anyway, all 23 hours of the beginner content is available up on the website now; the link is down in the video description. It is still at the presale price of 250 bucks, and that will double in value to 500 bucks when the advanced material comes out after the summer.

So let us, without further ado, dispel any notions that optimistic isolation levels allow dirty reads. Because apparently this is something that I still have to do in 2025. Thanks, other bloggers. So to recap a little bit: locking read committed is the default isolation level for SQL Server, aside from Azure SQL Database, which uses read committed snapshot isolation by default.

Lucky that Microsoft got to make that choice for you. Lucky you. Locking read committed steps things up a bit from read uncommitted and specifically disallows dirty reads. However, this comes with a couple tough trade-offs.

Your selects can block modifications. Your selects can be blocked by modifications.

And your selects can deadlock with modifications. And this is all under locking read committed. Locking read committed does not give you a point-in-time view of data, because the shared locks that it acquires are released as soon as possible under most normal circumstances.

We do have to cover a circumstance where those locks are not released, and where lock escalation of shared locks to the object level does occur. Now, locking read committed can absolutely, and we will cover this as well, show you deleted rows, show you rows twice, and miss rows entirely as it is reading through data. But that will, again, be something that we cover later, not today.

Row versioning read committed, better known as read committed snapshot isolation, is an alternative to locking read committed. It alleviates the blocking and deadlocking downsides that happen under locking read committed.

So read committed snapshot isolation uses row versions to get you around the locking problems that locking read committed allows for. The row versions are copies of locked rows, with the values that were committed at the time the statement began reading rows, not when an explicit transaction began. But these are not dirty reads.

Row versioning read committed can, just like locking read committed, show deleted rows, miss updates to rows, and miss new rows. So locking read committed, you get all that stuff, plus blocking. Row versioning read committed, you get all that stuff, but you have fewer blocking problems.

It’s up to you to figure out which ones are better. But, and this part here is the important part, this is only from modifications that have not been committed yet. This cannot happen with committed ones, unlike with locking read committed.

So what I’m going to do is, I’ve already set read committed snapshot isolation on for the Stack Overflow database. In this one, I don’t need the WITH ROLLBACK IMMEDIATE part because we’ve already got it turned on. I’m going to create a table called five rows.

And I am going to put five rows into it. The numbers, one, two, three, four, six. And we have no row five in here. Note that it goes right from four to six here.

And then just a GETDATE() for the date column that’s in there. Now, over in this tab, I have a select query. So this database has read committed snapshot isolation enabled.
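A rough reconstruction of that setup; the table and column names are my guesses, not the original demo script:

```sql
CREATE TABLE dbo.FiveRows
(
    id integer PRIMARY KEY,
    some_date datetime NOT NULL
);

/* Five rows, skipping id 5 on purpose: */
INSERT dbo.FiveRows (id, some_date)
SELECT v.id, GETDATE()
FROM (VALUES (1), (2), (3), (4), (6)) AS v (id);
```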

If I add a locking hint to a query, like READCOMMITTEDLOCK or NOLOCK, that hint will direct reads from that table to obey the locking semantics of the isolation level the hint names. So with READCOMMITTEDLOCK, this will still obey locking read committed semantics. If I just run this query in the Stack Overflow database without a hint, it is going to read row versions, should row versions exist, right?

So if I run this query over and over again, we just keep getting back the same rows. If I come over here and I begin a transaction and I delete from the table five rows where ID equals one. And I come back over to the read committed lock query.

This will just get blocked, right? Okay, fine. If I come over here to the read committed snapshot isolation query, I get back results immediately. But what I get back includes a result from an uncommitted delete, right?

So the delete that I began a transaction for has deleted row one, but it has not committed yet. So because we can see only the committed values, we do not do a dirty read. We see row one, but that’s okay because this delete hasn’t committed yet.
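The two reader queries being compared probably look something like this; again, the names are guesses rather than the original script:

```sql
/* Forces locking read committed semantics, even with RCSI enabled.
   This select blocks behind the uncommitted delete: */
SELECT fr.id, fr.some_date
FROM dbo.FiveRows AS fr WITH (READCOMMITTEDLOCK);

/* Plain select under RCSI: returns immediately, reading the last
   committed row versions, so the deleted-but-uncommitted row still shows: */
SELECT fr.id, fr.some_date
FROM dbo.FiveRows AS fr;
```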

This delete could change its mind. We could also do this and we could begin a transaction and we could add 1,000 years to the date column for IDs two and three. If I come over here and I run this, we will get back results immediately, but we will not see uncommitted results for these rows.

These are still showing 2025, right? So we see these rows as they were, because this still hasn’t committed. So we don’t see the year 3025.

We see 2025 still. We can even come over here. Oops, that’s not it. We can even come over here. We can begin a transaction and we can insert a row with ID five into the table, but we have not committed this yet. So when we come over here and we run this, we still see 1, 2, 3, 4, 6.

We do not see data from uncommitted transactions. We have seen potentially out-of-date data, but not really because the transactions have been committed. So this is still perfectly valid data.

So we have just seen a bunch of, right now we just see rows as they existed before any of these modifications started doing stuff. If I come over here and I commit this transaction to show the delete, and then we come over here and we look at this. Now, because that delete committed, we see that row 1 has been removed.

If I come over to this tab and I commit this transaction for the updates, that’ll go just fine. We can come over here. We can run this.

And now we see that rows 2 and 3 have indeed updated from the year 2025 to the year 3025. So we have indeed seen committed data. We have not seen uncommitted data yet.

And if I come over here and I commit this transaction, this all goes fine. And I can run this and I see 2, 3, 4, 5, 6, right? Because now I see that row 1 got deleted.

I see the updates to row 2 and 3, and I see the insert of row 5. So this is all fine. We have not seen a dirty read. If I come over here and look at this, well, we see what the end result.

We got 2, 3, 4, 5, 6 with no row 1. But how long did we wait? We waited the entire time I was doing all that other stuff.

So like I said, optimistic isolation levels do not allow dirty reads. We did not see one dirty read that entire time. We saw the correct versions of those rows from before the modifications fully committed.

The modifications were in flight, so we read a row version. They hadn’t committed yet.

So those changes weren’t fully permanent to the database. They were not in a consistent state in the database because they weren’t committed, so we did not see those changes reflected yet.

As soon as those transactions committed, we saw the changes reflected in our data. But we didn’t get blocked once, right? Only with the READCOMMITTEDLOCK query did we wait, in this case about two and a half minutes, while I did all the other stuff.

And then we got back this. So it’s up to you how you want your data to look. You can either not see the changes until they commit, which is row versioning.

Or you can wait a long time and then see the changes after they commit, when you use the locking isolation level. I don’t know about you, but I’m an impatient person.

I would much rather get back my results quickly and then see committed data once it fully commits. But not once did this give us a dirty read. We did not once see data from an uncommitted transaction.

We only saw data once those transactions fully committed. Anyway, thank you for watching. I hope you enjoyed yourselves.

I hope you learned something. And I will see you over in the next video. We have a few more things to talk about when it comes to actually read committed specifically being one of my least favorite isolation levels. So we’re going to do those next.

But anyway, thank you for watching.


Learn T-SQL With Erik: Writing Bad Data With NOLOCK Hints



Video Summary

In this video, I delve into the hidden dangers of using the `NOLOCK` hint and the `READ UNCOMMITTED` isolation level in SQL Server. I explore how these settings can lead to unexpected issues, such as dirty reads where your queries might return inconsistent or erroneous data, potentially causing confusion for end users. I walk through a practical example demonstrating how using `NOLOCK` can write uncommitted data into your database, leading to potential data corruption and making it difficult to maintain the integrity of your data. This video is part of my T-SQL course, which now offers 23 hours of beginner content at a pre-sale price of $250, with the advanced material set to release after summer, doubling the value of the course to $500. If you’re serious about mastering SQL Server, this course is an excellent investment in your skills.

Full Transcript

Hey, Erik Darling here with Darling Data, and in today’s video we are going to talk about sort of an unforeseen peril of the read uncommitted isolation level. Whenever people talk about it, they say, oh, your select queries will do dirty reads. True, they will. Assuming that dirty reads exist, they can and will be done. And your end users will be like, what the hell am I looking at here? I think the database is corrupt. I don’t know what these results are. There are 50 rows of the same thing for some reason. So, like, not a good idea to use, right? If you truly don’t care about your end users, or what data they see, or what reports they get, cool. As long as that is a consenting relationship between you and your end users, where they know that they are going to pay you money and you are not going to care, cool. Stick with it. You’re doing great. Otherwise, you should probably think about something else, right? Anyway, this is of course all teaser content from my T-SQL course. All 23 hours of the beginner content is out and available, still at the pre-sale price of 250 bucks, and doubling in value to 500 bucks when the advanced material publishes after the summer. But now let’s go talk about the nastiness of NOLOCK. Now, first, read uncommitted and NOLOCK are absolutely synonyms for each other. They are twins. They do the same thing. Read uncommitted is of course the weakest of all the isolation levels, with absolutely no protective shared locks taken on data that is being read. Read uncommitted is an isolation level that is defined in the SQL standard and available in SQL Server, and NOLOCK is a table hint that indicates reads should obey the read uncommitted locking semantics. You do not need both. You really shouldn’t use either one, but, you know, here we are.
And if SQL Server developers were thinking clearly when they were naming things, they would have named the hint NORESPECT instead of NOLOCK, to clearly indicate that it means your query will not respect locks taken by other queries, not that your query takes absolutely no locks.

Because when they name it NOLOCK, it makes it sound like your query is just this fancy lock-free lad running around town, but it’s not. It does make it so that it won’t take shared locks to read data. That is true. But there are still schema stability locks that get taken so that the table structure can’t change partway through. So, like, if you were saying SELECT * FROM a table WITH (NOLOCK), and someone wanted to add a column, or drop a column, or change a column from an integer to an NVARCHAR(4000) and start typing words in there, your query that’s currently running wouldn’t have to be like, wait a minute, now I gotta get rid of this column, or add this column into the results. Or, wait a minute, I’ve been showing the numbers one through a million, and now I have to start showing letters and stuff. This is weird. So there are locks that get taken to prevent some things from happening.

But not any of the locks on rows and pages, protecting changes to data, that you might see in other situations. So when you use read committed or repeatable read or serializable, those shared locks do get taken, and your read queries can get blocked by modification queries, deadlock with modification queries, and even block modification queries. Generally, it’s not advisable to use NOLOCK hints. Even when people make what they consider to be an informed choice about allowing it, the full consequences of what you’re doing when you use it are rarely understood, at least until incorrect data ends up being written into your database, and also returned to users.

You know, of course your queries won’t generally block or be blocked by concurrent modification queries, but it comes at a very steep cost. It’s the only isolation level that allows dirty reads. Something a lot of people mess up is thinking that optimistic isolation levels also allow dirty reads, but that’s not true.

And we’re going to talk about that in the very next video. So what we need, if my mouse will actually scroll here, are a couple of tables. We’re going to have one table called no and one table called lock, right?

Aptly named, I think, because we’re talking about NOLOCK. Into the table called no, we’re going to insert a single row. This is turning into a Mother Goose thing very quickly, right?

So we have one row with the ID equal to one and the column no set to NULL. That’s all that’s going in there. Into the table called lock, we are going to insert one row as well.

It’s also going to have an ID of one, because we’re going to join these things together. And right now, the lock column is going to be inserted with a value of 100. So if we put that in there, now we have a table called no and a table called lock.

What I’m going to do in here is open a transaction, and I’m going to update the column lock in the table called lock to be the integer maximum, right? So two, one, four, seven, four, eight, three, six, four, seven. Sure.

That might be someone’s phone number; I apologize if it is. And we’re going to run this, leaving the transaction open. The first and most obvious thing is that if we run a read query, we can see this change to the data in flight.

When we inserted a row into this table, it had the value of 100. It has now been updated to that big number. So if we select from that table with NOLOCK, or if we set the transaction isolation level to read uncommitted, we can see that data change in flight. Okay.
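As a sketch of the two sessions described here, assuming the table names from earlier:

```sql
/* Session 1: open a transaction and leave it uncommitted */
BEGIN TRANSACTION;

UPDATE dbo.[lock]
SET    [lock] = 2147483647  /* integer maximum */
WHERE  id = 1;
/* No COMMIT or ROLLBACK yet */

/* Session 2: either of these reads the uncommitted
   value 2147483647 instead of the committed 100 */
SELECT l.id, l.[lock]
FROM   dbo.[lock] AS l WITH (NOLOCK);

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT l.id, l.[lock]
FROM   dbo.[lock] AS l;
```

The table hint and the session-level isolation setting are two routes to the same dirty read.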

So that was just a read query. Now, if I come over here and run this update query: we can’t put a NOLOCK hint on this table, because this is the table that we’re updating, right? This n here means we’re updating this here, right?

So the table no is aliased as n, and that means that’s what we’re updating here. And we’re setting the column no equal to the column lock from the lock table, joined, of course, on those id columns that were helpfully populated with the number one before.

But if I say it’s okay to read dirty data from the lock table, using a NOLOCK hint on that side of the join, the same thing is going to happen here that happened over here: we see the number from that in-flight transaction. One row gets affected, where we set no equal to that number.

And what I’m going to do is use the OUTPUT clause with the inserted table to show you what values ended up in this table. The inserted table shows you the updated values; the deleted table would show you the values that got overwritten. If I run this and we look at the results, we actually end up updating a row in this table to the value it sees from the in-flight transaction in the other tab.
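The update being described might be sketched like this. The join shape and OUTPUT usage follow the transcript; the exact formatting is my assumption.

```sql
/* Session 2, while Session 1's transaction is still open:
   the NOLOCK hint on [lock] lets this UPDATE read the dirty,
   uncommitted value and write it into [no].
   OUTPUT inserted.* shows what actually got written. */
UPDATE n
SET    n.[no] = l.[lock]
OUTPUT inserted.id, inserted.[no]
FROM   dbo.[no]   AS n
JOIN   dbo.[lock] AS l WITH (NOLOCK)
  ON   n.id = l.id;
/* Returns id = 1, no = 2147483647 -- an uncommitted value,
   now durably written */
```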

This is really not a good thing. You have now written uncommitted data to your database. Right?

So if this query over here happens to change its mind and roll back, or maybe someone says, hey, dummy, you’re not supposed to give anyone that much of a thing, right? 2.14 billion. You’re not supposed to do that.

Like, that’s clearly erroneous. Then all of a sudden, this table will still have bad data in it. We’ve written uncommitted data to the table no, and we would now have to find some way to undo it. And if all of your modification queries are saying it’s okay to read from these other tables with NOLOCK…
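To make the rollback scenario concrete, here is a sketch of what’s left behind, continuing with the assumed table names from above:

```sql
/* Session 1 changes its mind: */
ROLLBACK TRANSACTION;

/* [lock] is back to its committed value of 100, but [no] still
   holds the rolled-back value 2147483647. The dirty read has
   been made durable, and nothing will clean it up for you. */
SELECT n.[no], l.[lock]
FROM   dbo.[no]   AS n
JOIN   dbo.[lock] AS l
  ON   n.id = l.id;
```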

Who knows how much uncommitted, rolled-back, erroneous, incorrect stuff has ended up being written to other parts of your database. This is why I say that even if you think you’ve made an informed choice about using NOLOCK hints in a few places, as people start to see how common those hints are, they’re going to spread to other places. And not only are you going to start returning garbage data to your end users, you’re going to start writing garbage data to your database.

And now your database is full of garbage. Anyway, thank you for watching. I hope you enjoyed yourselves.

I hope you learned something. And I will see you in the next video, where we’re going to talk about locking versus row versioning, and how row versioning isolation levels don’t actually allow dirty reads. All right.

Thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.