SQL Server Performance Office Hours Episode 2
| What evils come of vertically partitioning a table into multiple tables? I have bad history with two of the alternatives. The first being long thin tables with foreign keys to what ought to have been their columns and the second being tables with too many columns. |
| How can I keep our devs happy by avoiding stored procedures and embracing their preferred ORM (GORM in this case)? Not a big fan of ORMs. |
| test after adding a field |
| I got a table with only 2 columns, bigint and varbinary. For some reason app inserts a row in the table and after that updates the varbinary field with a value (likely a blob). I’m seeing blocking because of this. Any recommendation to mitigate it? |
| Will you be at SQL Bits this year? |
To ask your questions, head over here.
Video Summary
In this video, I discuss some common issues and advice related to SQL Server development. Starting off, I address the topic of vertically partitioning tables into multiple tables, explaining that while it can be beneficial in certain scenarios, overly wide tables or entity attribute value (EAV) tables with too many columns can lead to performance issues due to increased complexity in indexing and query execution. Moving on, I tackle a question about keeping developers happy by avoiding stored procedures and embracing ORMs like GORM. I share my perspective that while ORMs are useful for development, they often hit performance bottlenecks when dealing with complex queries or specific database operations. I suggest allowing developers to use ORMs until they encounter these issues, at which point offering store procedures as a solution can help maintain their satisfaction and address performance concerns effectively.
Full Transcript
I had forgotten to turn on my microphone. Thankfully, I remembered before I started making any mouth movements. So we’re going to just roll with this one. That’s okay. I’m not too concerned about that. Anyway, we are going to do our second Office Hours episode. And we’re going to change up the format slightly. And rather than read directly from the Google Sheet, because things have been coming in much more frequently now, and I don’t want new stuff to show up in the middle of a reading and get distracted. So I’ve pasted the chunk of questions that I’m going to answer into an Excel file, and I’m going to read from the Excel file. Can I, The other nice’s the other as F bucks a month. If you don’t have four bucks a month for various socio-economic reasons, you can like, you can comment, you can subscribe. These are all things that are free aside from the brief spasm of muscle that it takes to click something. If you want to ask questions that show up on these Office Hours episodes, you can do that by going to this link, which is also included in the video description. It’s going to be a recurring theme of all of these things, and you can put your questions in there and I will eventually answer them.
Oh, there’s that. If you need help beyond just asking an Office Hours question, if you are just absolutely stumped with some SQL Server thing, you can hire me to do all of this stuff. I am in fact a consultant. I in fact get paid to consult. That is the majority of my income. It is not YouTube.
So, there we are. And as always, my rates are reasonable. If you would like some other reasonable things from me, like training, you can get that for about 150 bucks for the rest of your life. It’s a pretty good deal. Again, fully assembled link, video description. Click it, buy it, be happy.
Upcoming events, we have SQL Saturday, New York City 2025, taking place on May the 10th, with a performance tuning pre-con on May the 9th. I know we’re missing that awful joke from that burned out franchise by doing it on these dates, but I am fine with that.
So, with that out of the way, let’s office our hours here. And we’re going to talk through these questions. And first up is this one. This is a real humdinger. What evils come from vertically partitioning a table into multiple tables? I have bad history. Oh, you ought to clear your browser cache then, pal.
With two of the alternatives, the first being long thin tables. For the folks playing along at home, those are going to be called entity attribute value tables, or EAV, like victor, elephant, apple, victor, EAV tables. With foreign keys to what ought to have been their columns, and the second being tables with too many columns. Well, I think the problem that you would traditionally have would be that the tables that are too wide. This is, so the tables that are too wide, those fall into the category of generally needing to be vertically partitioned. Remember, V’s go up and down, that is this way.
Horizontal goes this way, right? Yeah. So, what you would want is a way to move those columns out into a new table and go through a process we in the database world called normalization. You can usually tell you’re in a position where you need to normalize tables when you have one of two types of columns in your table. One of them are going to be columns that share a prefix. Like, let’s say you have an orders table and you have mistakenly decided to retain a whole bunch of customer information in your orders table. You might have a bunch of columns that say customer underscore something. I call this tables within tables, right? So, when you have that, that is a sign that you need to normalize things out. Overly wide tables cause lots of problems in SQL Server because the wider your table gets, the wider your indexes need to get to compensate for the columns that people, and generally the more indexes you need because you’re going to have people who need to filter on a certain set of customer details and then return a certain set of columns. And it just adds too many different combinations of things that can possibly happen in a query from one table to really be able to like sufficiently index for. Sure, there’s always like the, you know, the clustered or non-clustered column store thing, but those like, those aren’t always the best idea for an OLTP type scenario where you would have that kind of information in there. So, I have no problem with the normalization thing. There, you know, you can run into, or rather I have run into many schemas in my life where the stuff was overly normalized and like, like, like, like a query just to return like a few simple facts about a thing took like seven joins. Now, there, there is a certain nirvana to the, the, the, let’s call it the perfectly normalized schema where you will never have to update a column. You can only, you will only have to insert a column because any information added to the database will be new information for a new table.
There will be no need to modify existing information in a user table past a certain point. The EAV tables are a separate one. I’ve had, I am a big fan of the EAV or entity attribute value tables when column, columns not, don’t have like a similar prefix. It’s when columns start getting suffixed with things, especially numbers. A easy way to think of it, phone one, phone two, phone three, email one, email two, primary address, secondary address, stuff like that. Because as you know, that’s a great way to end up with a table that’s real wide. And most of the columns aren’t used by most people. You added a whole new column because one, one customer wanted to add one field to something. That’s a bad time for everyone. Um, I, I really, I don’t believe in that and nor do nor do what, and I, I hate sparse columns in SQL Server. They are a complete waste of everyone’s energy. Uh, and they don’t work with, uh, with index compression. You cannot page or row compress stuff in a table where sparse columns are involved, which makes them, uh, I mean, just, I, I hate them for that. Whoever, whoever didn’t, whoever, whoever from Microsoft decided to make that, uh, a point of contention with interoperability between the two features was a real knucklehead. Um, I hope they get dragged out into the street and left there. Um, so I don’t, I don’t have any problems with entity attribute value tables because what they make it real easy to do is if someone needs a new piece of information added, all they have to do is add a row to a table. They don’t have to add a whole new column that impacts everyone else. So that’s a much nicer way of doing stuff. Um, so I don’t have any problem with the long thin table, the entity attribute value table, nor do I have any problem with, you know, normalizing out data to a certain point. Um, I’m not saying you have to get to the normalized nirvana where your workload is basically insert only that, that, that, that takes a certain level of, of, of skill and cunning that, uh, most people would, would are hopeless in getting there. But I think it’s actually kind of a, is it is kind of a cool goal for some workloads. All right. So, uh, hopefully that answers your question there. Now the next question we have is how can I keep our devs happy by avoiding store procedures and embracing their preferred ORM, GORM in this case. All right. Uh, not a big fan of ORMs. Well, um, I think it’s funny because developers will never be happy. Uh, developers are not naturally happy people.
Uh, they are, uh, deeply envious, uh, perpetually unhappy crowd. Uh, it’s, it’s, you, they, you just can’t keep them happy. They’re, I don’t know, they’re, they’re difficult, right? Uh, they always want to be working with something newer, something shinier. They always want something different. They always think that whatever is currently in place is a mistake. And with, if they had their druthers, they do something different and get something different and, uh, everything would be better, right? It’s just, you know, this, this constant, like chasing utopia that developers have that, um, they’re, they’re, they are rarely right about. Now, one thing that you can do is you can let your developers be happy with their ORM until they’re unhappy with it. Uh, the, the approach that I usually take is, you know, you know what, you, you know what, you want to use an ORM. You want to write all your queries with it. Cool.
The second you are unhappy with the performance of a query, then we are going to talk about something that’s not your ORM because your ORM is going to be tremendously inflexible and have all sorts of limitations placed on it as to how you can actually run your queries to get, get them to perform well.
Uh, you know, it’s very, you know, when you think about something like entity framework, the number of problems that could be solved, uh, with an entity framework query, if you could just use temp tables naturally with it, tremendous. Uh, you know, there are lots of things that you can do in the context of a store procedure that use more advanced, um, let’s just say SQL generally, because ORMs obviously can be used across a wide variety of databases. Uh, there are all sorts of things you can do in the context of, uh, of, of a, you know, let’s just say a non ORM query, right?
Whatever, whatever, whatever protocols are, you know, your database uses SQL Server, of course, big on the store procedure. Uh, so what, what’s, let’s just stick with that. You know, there are all sorts of things you can do within a store procedure to, uh, fix performance problems, that, uh, you end up with when you’re using ORMs that just generate those one big monolithic query, you know, like sure every, a lot of people out there have, uh, you know, they had developed or been like, Oh, this query sucks. I don’t understand why you look at like the SQL that it generates and you see like select extent 40,920. And you’re like, Whoa, it’s a big query.
Yeah. Like SSMS like chokes up and freezes. When you try to look at the query text, forget the query plan. That’s out of the question. Uh, so there, there is a certain breaking point with ORMs where they are no longer the best approach for things. So my advice would be to let your developers use their ORM until they are unhappy with something that is happening with the ORM and then offer them the, uh, alternative of a store procedure. It may not be, uh, your ideal scenario, but the thing is you’re not tasked with doing the development for the app. You are tasked with some sort of administration thing. The developers are tasked with, you know, um, like new features, fixing features. Uh, they are tasked with, um, I don’t know, uh, other developer crap. I don’t know.
The usual stuff. Uh, and, and, and, and they need to be able to do their jobs with a certain amount of comfort and, and, and, and happiness, whatever, whatever they, whatever they might actually get out of, out of that. Uh, and, uh, they, they need to be able to do that and they need to be able to do their jobs. If, if it makes it easier for them to do their jobs using ORMs, that’s great. But at a certain point when you hit that, those performance thresholds where the ORM is no longer generating queries that can be handled well, uh, then alternatives like store procedures are where you can come in and say, Hey, I got some ideas. Let me help you out. And, you know, you can, you can get them to be like, you know, happy most of the time. And then they can be like, well, you know, that store procedure really saved our hide. So we’ll, we’ll just, we’ll just be okay with it.
Uh, you know, just one thing, one thing I think is, uh, important to remember about developers is that if they are application developers, if they do not spend a lot of time in the database, the T-SQL that they would write would be garbage anyway, it would not be good. It would, you know, it would most, at this point, let’s be honest, it would come from an LLM and LLMs are just as good as your most mediocre developer. So like you’re, you’re, you’re better off letting them develop where they’re comfortable. And then as soon as they hit like a breaking point with, you know, performance or, um, you know, something else isn’t working the way they want it to do the way they want it. Like, you know, like it could be something related to like isolation level or, you know, some query hinting that needs to be done. Then that’s when you can offer up a deeper level of expertise within a store procedure and say, Hey, I can fix this stuff with this. Let’s take it, let’s take this one thing out of the ORM. You’ll, you’ll live with the rest of it. So that’s, that’s my advice there. So, uh, next up we have, uh, I got a table. You, you got a table. Where did you get this table from?
Where did you, where did you got a table? I don’t know. Someone else give it to you? You steal it? Where, where, where was this table got? Uh, it is a big int in a var binary. For some reason, I’m going to assume that’s var binary max based on information that comes along later in the question.
Uh, for some reason, the app inserts a row in the table and after that updates the var binary field with a value, likely a blob. That’s why, that’s why I was guessing that it’s a var binary max. I’m seeing blocking issues or rather, I’m seeing blocking because of this. Any recommendations on how to mitigate it? Well, it’s certainly an interesting pattern. I don’t necessarily, uh, have a problem with that pattern off the bat. It is a little bit strange to not just insert everything all at once, but perhaps there was some good reason for that historically. I don’t know. Uh, but you know, most, most blocking problems do come down to indexing. Assuming that this table is either a clustered table, meaning it has a clustered index on the var binary field, uh, or, um, perhaps somewhat less likely. Maybe it has a non-clustered primary key on the bar, on the, uh, big int column. And then the, uh, the, the int in the, the bar binary max most likely are, uh, left in a heap structure. Um, I would say that your, your better bet there would be the clustered table because you would only have one object to modify, even though you would still be doing two modifications, you would only have one object to modify that might be slightly better. My bigger concern is that there’s something more going on here though. Uh, because even in a worst case scenario where you are inserting a row, which in this case would be one big int, or maybe you haven’t even reached big int status yet. Maybe you’re still inserting ints, but let’s say that you, you, you’re inserting one big int row with a null in the, in the var binary, probably max field. And then you are updating the, uh, the, the var binary field based on the, um, based on what I plus, let’s just make it easy. Let’s just assume that it’s an identity value. Uh, you’ve returned like, um, Oh, whatever scope identity or something, or you output the identity values from something. And then you do that update there. If you’re only doing that to one row, the blocking shouldn’t be that profound. Now it could pile up a bit, but, uh, it typically, like I typically wouldn’t expect to see it going on that long unless something else were happening.
Uh, the something else happening would perhaps be, uh, transactions not being committed, uh, in a timely manner, uh, perhaps sleeping transactions, leaving, uh, things open for too long. And that’s why things pile up. Um, if, if this were, if this were really, really high concurrency, I suppose that you could be having, uh, last page contention issues on the, the big int table, but I don’t, but that, the, that, that information is, is left to, left purely to my imagination. So I would be more concerned that there is something else going on. Um, you know, not necessarily something going wrong between like the begin transaction, insert a row, update a row commit. I would be concerned that something, that something is not getting to that commit, uh, when it should, um, it could be an application bug. It could be some sort of, um, you know, like, like data return thing. I don’t know.
So, uh, the first thing I would look at is indexing. The second thing that I would, I would look at that would be, uh, you know, if depending on how you’re catching the blocking, if the blocking is going on long enough that the block process report catches it, uh, I would want to look for sleeping sessions in there. Uh, if the blocking does not go on long enough for the block process report to catch it, or the blocking is very short and you’re catching it with like SP who is active, or maybe you have a monitoring tool, then I would look for the sleeping transaction, the sleeping session stuff in there. Because a lot of the times when I see what looks like a very simple interaction with the table and there is a whole bunch of blocking going on because of it, uh, it is usually because transactions are not committing when they should. So that would be up to you and the app developers to figure out. All right. So the final question of this one, we’re going to close out the month of January with, with this question. Will you be at SQL bits this year? And the answer is no. Uh, SQL bits has done some, uh, interesting experiments with, uh, their, their paid training days the last couple of years. Uh, the year last, um, they, they changed the payment structure. Uh, so the pre-con speakers, uh, would max out, and I think it was like, it was 5,000 pounds, which would, I don’t know, like, I don’t know what the current exchange rate is, but let’s just say that’s somewhere between five and 6,000 USD.
Uh, which made it sort of economically not feasible for me to go to London for a week because I would lose my shirt on air travel and hotels and food and not working here. Right. Like, like, there’s a lot, a lot of money that goes out the door to attend a conference. Um, you know, as you know, and, uh, you know, before anyone chimes in that like they can find a ticket from America to London, like on like, you know, question mark airlines sitting in the cargo hold for 300 bucks, that is not how darling data flies. Okay. We, when we’re, especially when we’re going internationally to go teach classes and go have to be happy and peppy for a week, we, we get the nice seats. Okay. So that like, like SQL bits made it impossible year before last to, for me to go. Right. Like I didn’t even submit this year. They, they changed things again. I’m not exactly sure what the payment structure is for pre-cons, but what they, but what they changed is, is how they do things. Normally pre-cons happen pre the con, right? Before the conference, there’s like two or three days of paid training and then the regular conference. Uh, SQL bits has changed it so that they’re offering like basically like a four or five day conference with training days taking place on every day.
Uh, I don’t know what they changed the payment structure to, but one of the other things that they changed was that they are curating their speakers. Now they are, they’re having some elite team of, of technologists and visionaries and luminaries and all that other stuff, uh, curate their, their, their, their paid presenters. And this year, sadly, I was not curated.
Um, there was some sort of lapse in the curation team’s judgment and, uh, I was, I was left, I was, I was left unemailed. So if, if you’re, if you’re a SQL bits attendee and you would like to see me there, it would be worth your while to get in contact or fill out whatever survey materials SQL bit sends out and tell them that you would like to see me speak. I’m not, I’m not expecting an avalanche of this to happen, but if you, if you, if you care enough to ask and you care enough to get a response, well, uh, that, that is one way to perhaps affect the steering of the curation committee in the future so that, um, you can get something beyond the side one track one list of speakers that most conferences pick up. So, uh, those are, that’s the end of the questions for this, this office hours episode. Uh, thank you for submitting your questions. Uh, I hope you enjoyed the answers. I hope you learned something from the answers and, uh, I, I do hope to keep this going.
So, you know, um, more the merrier, send in, send in what you got. Uh, and I will see you in the next video where we’ll, we’ll talk about, oh, I don’t know this and that. Got some, got some more stored procedure stuff to talk about. And, uh, I actually have another, another batch of office hours questions that I’ll get to there too. So thank you. Uh, goodbye.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 25% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.