YouTube Days: A Little About Computed Columns In SQL Server

A Little About Computed Columns In SQL Server

I’m working on ramping up my recording and streaming setup now that life has settled down a little bit, and publishing the results to YouTube while I work out the wrinkles. Enjoy some free video content!

Video Summary

In this video, I delve into the intricacies of using expressions within SQL Server queries and how they can impact query performance. I explore why SQL Server cannot use expressions to seek data in indexes and demonstrate a practical example by querying users from the Stack Overflow database based on their combined upvotes and downvotes. After running several queries and analyzing the resulting execution plans, I explain that creating an index directly on the expression did not significantly improve performance. However, introducing a computed column allowed SQL Server to provide better estimates and potentially seek into the index more efficiently. The video also highlights the importance of expression matching for leveraging indexes effectively and suggests improvements for SQL Server Management Studio to enhance usability with regional number formatting settings.

Full Transcript

Erikaphor now, Darling here with Darling Data. And I’d like to thank the lovely and talented Randolph West for giving me some very good, specific suggestions on my audio settings last night, especially for bearing with me while I was in the midst of several red wines and was a little bit unaware of my surroundings. Randolph ran me through a vocal analyzer. I’m not sure if that’s a fetish yet or not, but it’s on the table now. And anyway, the suggestions were to reduce the gain a little bit and adjust the compressor setting a little bit so that I got peaks instead of tables. So, there we go. Thank you, Randolph. These a hell of a them. Today’s video, we’re going to talk a little bit about how… well, so… in a video recorded for the last 24 to 48 hours, I hesitate to use terms like yesterday and today because due to time zones and other temporal dilemmas, we never know how correct those terms ever were. If I was in New Zealand, it could be two days ago. But in a video recorded for the last 24 to 48 hours, I talked a little bit about how…

if you need to calculate an expression, right? Like run a function on a column, add two columns together, like tack something onto a column. SQL Server can’t use those expressions to seek to data in an index because those expressions happen at a later point in a query running than when you first touch an index or when you touch a table, when you touch those 8KB pages that make up your… …tables and indexes and basically everything in SQL Server. So, what I’m going to do is run this query and the goal of this query is to look for users in the Stack Overflow database who have more than 1 million combined upvotes and downvotes. My meteorologist skills are increasing with every video. Watch out. So, I’m going to be all powerful and control the weather like a chainy. So, if we run this query and we look at the query plan, we’re going to take about 444 milliseconds to scan the entire clustered index and return one single row.

So, if we look at the details of the clustered index scan, we’re going to have this big predicate over here. Upvotes plus downvotes greater than or equal to $1 million. My meteorologist skills really took a steep fall on that one. So, of course, job interview red meat right there, a clustered index scan. Users are complaining, dear lord. And so, let’s create an index on the users table. On the upvotes and downvotes column. Let’s see what kind of…let’s see if we’re going to get the job or not.

We’re going to run this and look at the query plan because that’s what we do here. We look at query plans all day long. We stare at them. And this sort of depressingly didn’t change execution time all that much. This still took right around 430 milliseconds. The last one was like 440 milliseconds. That could have just be…like Windows Update could have been running in the background and looking for something. And that could explain like the 10 millisecond change in CPU time there.

So, let’s not get too hung up. Let’s not pretend that we won. Alright? So, the main problem is that we still have this expression that needs to get calculated for SQL Server to find what we need. And of course, if we…the way that, you know, non…the way that indexes in general are stored, we don’t have any metadata about… We know what values are in upvotes and we know what values are in downvotes, but we don’t have any information about what, like, a calculation on those two columns would produce.

Right? SQL Server doesn’t track that. Same thing with, like, date columns or date time, any time-related column, really. If you have two of them in an index, SQL Server’s like… And SQL Server’s like, oh, well, there’s an hour between them and it was like 21 days, 16 hours, 7 days, whatever that… Nothing compares to you, Leric, is…like, SQL Server doesn’t track the amount of, like, time between two date columns, right?

So, whenever we need to calculate something based on two date columns, two columns added together, things like that, we need to pre-compute that so SQL Server can find that data. Alright? So, what a lot of folks out there might want to do is create a filtered index, maybe to figure out where upvotes plus downvotes are already greater than a million. Unfortunately, that is…you can tell by the red squiggle in IntelliSense that’s throwing right there, that we are not going to be able to create this.

So, if we try to, say, where upvotes plus downvotes is greater than or equal to one million, we’re not going to be able to do that. Even if we used advanced parentheses technology, we would not be able to create a filtered index based on that. So, disappointing there. I think you can do that in Oracle, so maybe you should just use Oracle instead. Sorry, Microsoft.

But, what we can do is we can do the old Microsoft SQL Server two-step, and we can create a computed column that gives us the results of upvotes plus downvotes. And note that we don’t even have to persist this in order to create an index on it. Now, just creating that computed column, even not persisting it, will give us, if I remember correctly, a slightly better estimate than before.

Not great, not awesome, but better. We still don’t, we’re still not able to seek to the data we care about. But, now SQL Server is at least like, hey, hey, maybe, maybe an index would help. So, SQL Server will give you a missing index request in some cases, but SQL Server won’t say, oh, if you create a computed column and then create an index, everything will be better.

But, SQL Server before was just like, nah, man, you’re screwed. So, there’s that. So, let’s follow SQL Server’s now very helpful missing index advice. And let’s create a nonclustered index on our computed column.

Notice how that was pretty quick, right? So, that was SQL Server actually writing all those values down in the index. Before, we just had a virtual column where SQL Server would still have to compute that every time. But, now that we have it all written down, we can, hopefully, stealthily, much more quickly than before, seek into our index.

Now, instead of taking about 430 to 440 milliseconds, depending on what Windows Update is doing, we can, oh, my finger went way over weird. I tilted, that was my fault. We get, my finger, I’m losing my finger again.

We get one roll back in zero seconds and a slightly better estimate of 293. Before it was 1570, before it was something else, I forget. I’ll have to rewatch the video and maybe edit that in.

Just kidding. Not doing that for a 10-minute YouTube video. Ha ha. If you want me to do that, you’ll have to pay me. One thing, though, that is disappointing about computed columns is that what helps SQL Server use them is something called expression matching.

Now, in this query up here, notice we didn’t touch the up-down computed column directly, but SQL Server was still able to use that index. It was able to match the expression in our where clause to the computed column and use the index based on that, which is great. But, it doesn’t work if we tinker with that expression at all.

So, let’s run these two queries back to back. We get one row back from both of them. This one, again, seeks to the data we care about.

Zero seconds, and the one down here, unfortunately, scans the whole indexes back to around 440-something milliseconds with a really awful estimate of 73,739,714. If anyone from Microsoft is watching, and you happen to know anyone who works on SQL Server Management Studio, can I get a comma?

Can we get some commas and numbers? Can it be a setting to add whatever regionally appropriate terminator to thousands is in our numbers? I know that some countries use dots, other countries use commas.

Here in the US of A, we are fully committed to the comma. And it will be very useful if we can get some commas in our numbers in SQL Server Management Studio. Please, I beg of you.

I beg of you. You have access to all sorts of functions in C-sharp like format and whatnot. They can produce values, numerical values that have commas or whatever regionally appropriate. Thousands terminator you have, or hundreds terminator.

I don’t know. Math is beyond me. Anyway, this is another example of how if you need to calculate an expression in a join or where clause, you might not be able to do it cleanly even with a good index on the columns that you care about because the expression that those columns or whatever else produces, a function like left, right, r-term, l-term, substring, replace, all that stuff.

SQL Server doesn’t store that, right? SQL Server doesn’t have access to that. SQL Server runs those expressions at a different point in the query than when it touches the index and is able to seek the values.

So be very careful with how you express things like that in your queries. If you find yourselves in a performance quandary where an expression like that is causing a scan of an index, it’s taking too long, then you might need a computed column and an index to help you locate that data quickly.

Anyway, thank you for watching. I hope you enjoyed yourselves. I hope you learned something.

I hope that my petition to not have BeerGut Magazine permanently ban me from using the name Eric. We’ll get enough signatures to pass Congress or whatever happens. I don’t know.

If not, I’m going to start a contest to rename me. That’d be fun. Anyway, thank you for watching. I hope you learned something. I hope you enjoyed yourselves. And I will see you in another video. Have a good one.

Going Further

If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

3 thoughts on “YouTube Days: A Little About Computed Columns In SQL Server”

happydba says:

April 21, 2023 at 1:22 pm

Audio still buzzy – more down gain 😀
1. happydba says:
  
  April 21, 2023 at 1:25 pm
  
  Actually the audio sounded good until yesterday so *shrugs*
  1. Erik Darling says:
    
    April 21, 2023 at 1:28 pm
    
    Yeah it’s been a bit tough to get right, but the newer videos are spot on I think. The worst part is that a lot of the artifacts don’t show up in shorter recordings.

Comments are closed.

A Little About Computed Columns In SQL Server

Video Summary

Full Transcript

Going Further

Related Posts

3 thoughts on “YouTube Days: A Little About Computed Columns In SQL Server”