SQL Server Performance Office Hours Episode 9
| My company (using SQL Server 2019 Standard) has an algorithm that keys addresses into a varchar(40), and a cross-reference database that assigns an identity value to each new value, allowing us to post the numeric key in our datasets. Production has to search their generated string keys in this database’s table to get the integer key in return. We have ensured proper string data typing on lookups and have unique compressed indexes on the string values. What would your next tuning step be if this was not getting the performance you needed? |
| What are your thoughts on a proc with heavy dynamic SQL used to provide for flexibility – to include extra tables to join, table variables of IDs, and usually resulting in multiple query plans? Is there a best practice to handle this other than “create separate procs” (and stop using table variables)? |
| What options do we have for optimizing queries against tables that store JSON data in SQL Server? There are often queries that update the JSON in place or depend on some JSON value for filtering. Would indexing a computed column make sense over an indexed view? At what point do we start trying to design a normalized table structure to store this data? |
| Filtered index over IS (NOT) NULL – good, bad, or ugly? |
| How does RCSI/snapshot isolation work when crossing databases, if either the calling DB or target DB does not have it enabled? |
To ask your questions, head over here.
Video Summary
In this video, I dive into a variety of SQL Server-related questions during my Office Hours session, covering topics like optimizing queries against JSON-stored tables, parameter sniffing and execution plans, the use of table variables in stored procedures, and snapshot isolation across databases. I share insights on how to effectively troubleshoot performance issues, discuss best practices for dynamic SQL and procedure design, and provide advice on when it might be better to normalize data storage rather than using JSON or XML columns directly. Whether you’re a seasoned DBA or just starting out, there’s plenty of valuable information here to help improve your SQL Server skills.
Full Transcript
Erik Darling here. Guess what? If you were feeling a strange vibration in your soul, if you woke up this morning and you thought, wow, today’s going to be a good day, it must be because it’s time for Office Hours. Some of my favorite moments in life are spent answering these questions. If you would like to ask a question for Office Hours, you can do so at this lovely link right here. It’s available in the video description, I promise. I make things easy for you. The other thing that I make easy for you is if you would like to join this channel as a paying member to say thank you for the extraordinary amount of work and effort that I put into these things. PowerPoint’s free. Microsoft, though? Have you seen Microsoft subscription charges? They wanted to charge me an extra, what, 40 bucks a year to add Copilot to things. And then they had the audacity to put an asterisk and say Copilot included. And, like, of course it costs 40 more dollars. So here’s a money-saving tip for you. If you got an email from Microsoft that was like, we’re going to start charging you more, go to cancel and click through all the “yes, I’m out” buttons, and you can actually sign back up for, like, the Office 365 Classic that doesn’t have any of the AI stuff, for the normal price. So screw you, Microsoft. Pigs. Anyway, since I just saved you 40 bucks a year, if you want to contribute 40 bucks a year to me, you can sign up for a membership. And if you are just broke because Microsoft took all your money, because Microsoft bullied you in the cafeteria and you have zero dollars left, you can do other things that support the channel. You can like, you can comment, you can subscribe. And I don’t know, I think that’s about it. I’d say you can give me a hug, but, you know, I don’t need any social diseases at this point in my life. If you need help with your SQL Server, if you think, gosh, it’s time for a young, good-looking consultant to come fix all my problems for me, because Copilot ain’t doing it, I am available for all of these wonderful SQL Server activities. And as always, my rates are reasonable. If you want some training on SQL Server, if you are just like, man, I have all this room in my brain, what am I going to do with it? You can get all of my performance tuning content for 75% off. That is about 150 US dollars, and you get it for the entirety of your life.
So, you know, don’t die before you finish watching, anyway. If you would like to catch up with me live and in person, I will be at SQL Saturday New York City on May the 10th. And there will be a performance tuning pre-con on May the 9th that I will be in the room for, given by Anderias of Autor. And we together will give you cookies. And it’ll be nice. But with that out of the way, let’s answer some Office Hours questions over here. We got some long ones today. Boy, howdy. Let’s see here. My company, using SQL Server 2019 Standard (I feel like we need, like, a montage cut on this one), has an algorithm (a fancy word for a query) that keys addresses into a varchar(40).
Okay, and a cross-reference database that assigns an identity property to each new value. Okay, getting there. Allowing us to post the numeric in our data sets. Sounds good. Production has to search their generated string keys in this database’s table to get the integer key in return. We have ensured proper string data typing on lookups and have unique compressed indexes on the string values. What would your next tuning step be if this was not getting the performance you needed?
Well, I mean, there’s a lot of information in there. What there isn’t in there is an example of the query or an example query plan for any of that stuff. There are many ways for you to share this stuff with me. There’s a wonderful website called dbfiddle.uk. My friend Jack runs it and you can put example queries in there. You can use sample data. You can do all sorts of stuff. If you want to share query plans, there are a number of ways to do that.
There’s pastetheplan.com, or anything else that you can paste the XML into, and I can answer questions like this better. But with most things performance, the question comes down to: what is the server telling you, and what is the query telling you? It’s difficult to say whether this is a performance problem specific to the query and process that you have in place.
Like, it could be at the server level, like something is generally mucked up, or it could be at the query level, in which case things like the execution plan will tell you where you are spending the majority of your time. Since you are on SQL Server 2019, you have this wonderful facility where, when you get the actual execution plan, it will tell you where you actually spend time in your various operators.
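If you want to pull that last actual plan back out without rerunning anything, SQL Server 2019 can hand it to you from the cache. Here’s a minimal sketch, assuming you’re able to flip the LAST_QUERY_PLAN_STATS database scoped configuration on; the LIKE filter is just a placeholder for your own query text:

```sql
-- SQL Server 2019+: turn on lightweight capture of the last actual plan.
ALTER DATABASE SCOPED CONFIGURATION SET LAST_QUERY_PLAN_STATS = ON;

-- Then pull the last actual plan (operator-level runtime stats included)
-- for a cached query. The LIKE filter below is only a placeholder.
SELECT query_text = st.text,
       ps.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
CROSS APPLY sys.dm_exec_query_plan_stats(cp.plan_handle) AS ps
WHERE st.text LIKE N'%your query text here%';
```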
And then you can look and you can say, gosh, I spend a lot of time here. How can I make this better? So that’s where I would start. I don’t know exactly what I would do based on this word problem.
But I know exactly what I would look at to figure out where things are slowing down on me. You know, it could be that you need to watch the server while all this stuff is happening, and you might see that the server is overloaded in some way, either from a CPU or memory or locking perspective.
Perhaps there is some clashing between you generating these varchar(40)s and identities and other stuff, and you trying to read from those tables. I don’t know.
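Just to make the word problem concrete, here’s roughly the shape I picture from the description. Every name here is invented, so treat it as a sketch rather than anyone’s real schema:

```sql
-- A sketch of the cross-reference setup as I understand it; all names invented.
CREATE TABLE dbo.AddressKeys
(
    address_id  integer IDENTITY PRIMARY KEY,
    address_key varchar(40) NOT NULL,
    CONSTRAINT uq_address_key
        UNIQUE (address_key)
        WITH (DATA_COMPRESSION = PAGE)
);

-- The lookup stays a seek as long as the parameter type matches the column:
-- varchar, not nvarchar, so no implicit conversion eats the index.
DECLARE @address_key varchar(40) = '123 MAIN ST APT 4 10001';

SELECT ak.address_id
FROM dbo.AddressKeys AS ak
WHERE ak.address_key = @address_key;
```

If that seek is already happening and things are still slow, the answer is in the plan and the server, not the schema.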
But I do know where I would look. And hopefully now you know where to look too. All right. What are your thoughts on a PROC?
Procedure with heavy dynamic SQL used to provide for flexibility to include extra tables to join, table variables of IDs, and usually resulting in multiple query plans. Is there a best practice to handle this other than create separate procedures and stop using table variables?
Well, table variables really shouldn’t result in separate plans since table variables have a recompile rate of exactly zero. You should be reusing execution plans if your queries are properly parameterized and, you know, you send in two of the same query.
It should be happening. There may be something going on with table variable deferred compilation: if you’re on SQL Server 2019 or newer and you’re using compat level 150 or better, you might need to look at that. But in general, I have no problem with this.
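If you want to rule deferred compilation in or out, here’s a rough sketch of how I’d poke at it. The USE HINT name is, to my knowledge, the documented per-query switch for turning it off, but verify it in your environment:

```sql
-- Table variable deferred compilation is tied to compatibility level 150+.
SELECT name,
       compatibility_level
FROM sys.databases
WHERE name = DB_NAME();

-- If you suspect it is behind a plan change, it can be disabled for one query:
DECLARE @ids table (id integer PRIMARY KEY);

SELECT i.id
FROM @ids AS i
OPTION (USE HINT('DISABLE_DEFERRED_COMPILATION_TV'));
```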
I don’t even have a problem with there being multiple execution plans for things. The plan cache has been babied and treated with kid gloves for years, and it has done nothing but abuse and mistreat us, clearing out frequently, not giving us all the information we need.
So I don’t really see a need to be kinder to the plan cache. So screw you, plan cache. If you are overloading query store because these queries are particularly busy, that might be a slightly different matter.
But in general, I don’t have a problem with this, and managing plan reuse generally comes down to proper parameterization. And, you know, if you have this, you know, people are just constantly generating queries that need to do different stuff because you are obviously, like, joining to different tables on some occasions.
You might be selecting different columns. You might have completely different where or join clauses based on this stuff. You might even throw in some exists or not exists just for, you know, happy fun time giggles. It’s one of those things where, like, I don’t mind heavy dynamic SQL because heavy dynamic SQL has been very, very good to me over the years.
And when you need to provide this type of flexibility to end users, there is no guarantee that separate procedures would get you anything more than properly parameterized dynamic SQL executed with sp_executesql would get you.
It essentially gets you the equivalent behavior of a stored procedure, because sp_executesql is a stored procedure. And that stored procedure is executing a query, and you will get plan reuse on that query so long as everything lines up appropriately.
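To put a sketch on it: something like this, where dbo.Users and dbo.Badges are stand-ins I made up, gets one reusable plan per distinct string, which is usually the most you can hope for with this pattern:

```sql
CREATE OR ALTER PROCEDURE dbo.FlexibleSearch
    @reputation integer,
    @include_badges bit = 0
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @sql nvarchar(MAX) = N'
    SELECT u.display_name
    FROM dbo.Users AS u
    WHERE u.reputation >= @reputation';

    /* one of the "extra tables to join" branches; each distinct
       string gets its own plan, and each of those plans gets reused */
    IF @include_badges = 1
        SET @sql += N'
    AND EXISTS (SELECT 1/0 FROM dbo.Badges AS b WHERE b.user_id = u.user_id)';

    EXECUTE sys.sp_executesql
        @sql,
        N'@reputation integer',
        @reputation = @reputation;
END;
```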
So it may not be a solvable problem for you if the number of queries that you are generating is indeed vast. All right. And here’s, oh, God, another long one.
All right. What options do we have for optimizing queries against tables that store JSON in SQL Server? There are often queries that update the JSON in place. Oh, oh, come on.
Or depend on some JSON value for filtering. No. Would indexing a computed column make sense over an indexed view? At what point do we start trying to design a normalized table structure to store this data? Now. There are two good times to design tables to normalize JSON or XML or any other type of blob-ish data:
When you first design the application, and today. Blobs like that, ideally, in any database, are retrieve-only. You just get them out and return them wherever they have to be returned to.
If you need to parse them apart, you should do that in the place that you return them to. Doing this stuff in your database is just a crappy idea. Don’t do it to yourself.
It hurts. It’s painful. I’ve seen way too many people try to do this, fail miserably. And by the time they realize how miserably they have failed, it is far too late to re-architect the number of things that would need to be re-architected in order to solve the problem. And you are stuck.
Well, XML is a particularly awful beast. But JSON, Microsoft smartened up a little bit. But the really bad part about XML is that, like, the XPath stuff isn’t valid as a computed column unless you put it in a scalar UDF. And as you all know, if you put scalar UDFs in a computed column, even if you index them, life is hell.
So the JSON stuff is a little bit more forgiving with that because you can create computed columns based on JSON and you can index them. And you can, like, do that without the presence of a scalar UDF. So that would be one way of doing it if you have, like, a minimal amount of data inside of there that you need to, like, filter on.
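Here’s a sketch of what that looks like, with a made-up table and JSON path:

```sql
-- Computed column over a JSON property, no scalar UDF in sight; names invented.
CREATE TABLE dbo.Orders
(
    order_id    integer IDENTITY PRIMARY KEY,
    order_json  nvarchar(MAX) NOT NULL,
    customer_id AS CONVERT(integer, JSON_VALUE(order_json, '$.customerId'))
);

-- The computed column is deterministic, so it can be indexed directly,
-- and filters on that JSON property can seek instead of cracking open JSON.
CREATE INDEX ix_orders_customer_id
ON dbo.Orders (customer_id);
```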
That doesn’t help you with the modifying of the JSON. In fact, it would probably hurt you with the modifying of the JSON because now you have to maintain the separate computed column and index structure when you modify the JSON. So if you want my personal opinion, just don’t do this, period.
Parse out your JSON, store it in a table, don’t update it directly, and don’t filter on it directly in a query. It’s just bad news for performance.
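If I had to sketch what “parse it out and store it in a table” looks like at load time, it would be something in this direction; the JSON shape and the dbo.OrderLines table are both assumptions on my part:

```sql
-- Shred the JSON once, on the way in, rather than on every read; names invented.
INSERT dbo.OrderLines
    (order_id, product_id, quantity)
SELECT o.order_id,
       ol.product_id,
       ol.quantity
FROM dbo.Orders AS o
CROSS APPLY OPENJSON(o.order_json, '$.lines')
WITH
(
    product_id integer '$.productId',
    quantity   integer '$.qty'
) AS ol;
```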
Let’s see here. Ah, blessedly short one. Filtered index over IS (NOT) NULL: good, bad, or ugly? Perfectly fine in my estimation. Just make sure that when you create your filtered index, at bare minimum, the column in your filter expression is included somewhere in the index. It could be a key column.
It could be an included column. In my experience, the included column is just fine for it, because with no other predicates in place, it’s probably just going to be a scan of the index anyway: SQL Server knows that all of the data in the index matches your requirement.
The nice part about IS NULL and IS NOT NULL is that they don’t have the usual problem with, like, bit columns, where you could be searching with a parameter. You pretty much have to write IS NULL and IS NOT NULL out as literals every time, so you don’t have to deal with that as much. So, just a fine thing to do.
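Here’s a sketch of the shape I mean, with invented names:

```sql
-- The filter column shows up in the include list, so the index definition
-- covers the thing it is filtered on.
CREATE INDEX ix_users_with_location
ON dbo.Users (display_name)
INCLUDE (location)
WHERE location IS NOT NULL;
```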
The only caveats there: again, make sure that the column in your filter expression is somewhere in the index definition. And, you know, what do you call it?
There was one other one. Make sure that it is a useful filter. If your filter expression doesn’t eliminate half of the table data or better, it might not be worth it in the long run. Like, if you still have 75, 80, 90% of the table covered by whatever filtered index, what are you really getting out of it? Not much. Not much at all.
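A quick way to eyeball that, again with made-up names (COUNT_BIG skips NULLs when you give it a column):

```sql
-- Compare the two counts: if most rows survive the filter, the filtered
-- index is not buying you much.
SELECT total_rows    = COUNT_BIG(*),
       not_null_rows = COUNT_BIG(location)
FROM dbo.Users;
```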
And finally, question number five.
You can tell it’s question number five because there is a five right here. So this will be the last one today. How does RCSI and snapshot isolation work when crossing databases, if either the calling DB or target DB does not have it enabled?
Well, if you’re in the calling database and you try to set the isolation level to snapshot and snapshot is not enabled, you’re just going to get an error. So you can’t go from, like, here to there with that. I believe that’s also reciprocal.
I don’t think that you can start a transaction using snapshot isolation if you are crossing boundaries into a database that doesn’t have it enabled. RCSI is somewhat more forgiving because you don’t actually have to set anything for that. So the behavior that you should see is that regardless of where the query starts, when you start reading data from a database where it’s enabled, you should see the row versioning stuff.
But, like, for any database where it’s not enabled, your modification queries aren’t generating row versions so they can’t use it. Right? Like, it’s just impossible.
Like, there’s no row versions for them to read, so they can’t possibly use that isolation level.
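If you want to see where each database stands on both settings, they’re right there in sys.databases:

```sql
-- One row per database: RCSI and snapshot isolation settings side by side.
SELECT name,
       is_read_committed_snapshot_on,   -- RCSI
       snapshot_isolation_state_desc    -- snapshot isolation
FROM sys.databases;
```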
So, good for us. We have answered all five questions, hopefully to everyone’s satisfaction. I don’t think there’s much more to say here. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something. And I guess I’m going to try to queue up five more to answer. We’ll see how it goes.
Anyway. Cool. Thank you for watching. Thank you.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.