Indexing SQL Server Queries For Performance: Unpredictable Searches
Thanks for watching!
Video Summary
In this video, I delve into indexing strategies for handling unpredictable searches in SQL Server. Unpredictable searches are essentially any queries where the WHERE clause, JOIN conditions, or selected columns can vary widely: think dynamic SQL within stored procedures, or ad hoc queries from front-end dashboards. I emphasize a common pitfall: using double wildcard searches on every column with a single parameterized search string, which leads to significant performance issues from full scans and implicit conversions. To address this, I introduce the `RECOMPILE` hint as an effective solution for many scenarios, explaining how parameter embedding allows SQL Server to treat actual parameter values as literals in the query plan, avoiding cached plans that might not be optimal. I also cover writing safe, parameterized dynamic SQL with `sp_executesql` when constant recompiles are too expensive, indexing for the searches you can predict, using nonclustered columnstore indexes for the ones you can't, and Microsoft's upcoming OPPO (Optional Parameter Plan Optimization) feature in SQL Server 2025 for handling optional parameters more efficiently.
Full Transcript
Mmm! Mmm! Mmm! Mmm! Mmm! Mmm! Mmm! Mmm! Mmm! Erik Darling here with Darling Data. As you’d well expect at this point, if you’re surprised by any of this, I don’t know what’s wrong with you. Maybe you should start taking notes about your day. You might have some sort of head knock that’s plaguing you in some way. I would suggest getting that checked out by a medical professional as well. At this point, I feel like this is a well-established set of facts. I am Erik Darling, and my company is Darling Data, and we talk about SQL Server. That is our goal. That is our whole goal. In today’s video, we are going to talk about indexing for performance with unpredictable searches. You can consider an unpredictable search as anything that generates a query. Intel. Shut up. Intel. God damn it. Anything that generates a query where you don’t know what the WHERE clause is going to be, or maybe you don’t know what the JOINs are sometimes going to be, or what columns it’s going to select, or something like that.
It could be dynamic SQL in a stored procedure, or it could be some ad hoc query generated by a front-end dashboard type thing that you have created, hopefully in a sober and thoughtful way. But that is what an unpredictable search is for me. Now, you have no idea what users are going to ask for, or maybe even what quantity they are going to ask for it in. There are all sorts of crazy things that you have to account for. But of course, before we talk about how you can account for that, we must talk about other accounting matters. More like my accounting matters. We are being honest here.
If you would like to support the content that I create on this channel, or somewhere near this channel, for the low, low price of $4 a month, you can click the little video description in the video description. Click the link in the video description. There we go. That’s the right chain of words. And you can become a member. $4 a month. That’s about $3.10 for me after YouTube takes out taxes.
If you are just too encumbered by other debts and ransoms and your money is already allocated fully to other endeavors, you can like, you can comment, you can subscribe. There are all sorts of things you can do to let me know that you love me and would die for me. If you look at the things that I talk about on this channel and the first thing you think is, gosh darn it, that Erik Darling sure can make SQL Server faster in exchange for money.
Perhaps I could do one of these things for you because I’m very helpful when it comes to these things. And not only am I helpful, but my rates are reasonable. And while we’re on the subject of reasonable, gosh darn it, have you ever wanted to get 24 hours of SQL Server Performance Tuning training for $150 and not have to pay another $150 every year for the rest of your natural life?
You can do that with my training. It’s amazing. You click on that link, you enter that discount code, and kaboom, you have what I just described. It’s pretty wonderful.
Mouse. No upcoming events 2025, blah, blah, blah. We’ll talk about that later. But now let’s talk about this indexing conundrum that we have. First, I’m going to close event viewer.
I was troubleshooting a blue screen earlier. I know. Do my talents end anywhere? Yes. Pretty much where the computer ends.
The limitation right there. Where the computer stops. No idea what I’m doing outside of that. It’s funny how that works.
So we’re going to talk about one of my favorite things. I’m going to move some of this text down a little bit there so it fits on the screen better. One of my favorite things that I help clients with are big ugly queries. I know. Weird that a performance tuner likes big ugly queries.
Suppose it’s a lot easier to take a big ugly query and chop out all the easy stuff and make it go faster than to take what looks like a perfect query and tune that better. But, you know, some people just don’t know what they’re doing in either case. So that’s why people like me exist.
So one of the most common things that I see people start with looks something like this. What you see my head is mildly infringing upon. Where someone will have a parameter or whatever called search string.
And that search string will be used to double wildcard search every column in a table. This is one of the worst possible things you can do if you need to handle unpredictable searches in SQL Server. I beg, I beg you not to do this.
This will end up terribly. You may look at some of the columns in this list, like owner user ID and creation date and last activity date, and think, hmm, Erik, I don’t think those are strings. And I’d say, well, you are right, but you’re searching them like strings, and all hell’s going to break loose.
It’s not going to be a good time for you or your SQL Server. So, like, you really shouldn’t be doing anything that even looks like this. The problem is somewhat obvious if you’ve been using SQL Server for long enough.
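To make the shape concrete, here is a minimal sketch of the anti-pattern, assuming a Stack Overflow-style `dbo.Posts` table with the columns mentioned in the video (the parameter name and example value are illustrative):

```sql
/* Anti-pattern sketch: one parameter, double-wildcarded against every
   column, whether or not the column is a string. */
DECLARE @SearchString nvarchar(100) = N'darling';

SELECT p.Id, p.Title, p.Score
FROM dbo.Posts AS p
WHERE p.Title            LIKE N'%' + @SearchString + N'%'  /* nvarchar: no conversion, but the leading % still forces a scan */
   OR p.OwnerUserId      LIKE N'%' + @SearchString + N'%'  /* int: implicit conversion */
   OR p.CreationDate     LIKE N'%' + @SearchString + N'%'  /* datetime: implicit conversion */
   OR p.LastActivityDate LIKE N'%' + @SearchString + N'%'; /* datetime: implicit conversion */
```

The leading wildcard alone rules out index seeks; the type mismatches pile implicit conversions on top of that.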
Double wildcard searches and searching columns with mismatching types will lead not only to big scans because of the double wildcard, but also to implicit conversions. Well, the title column actually is an nvarchar(250) or something column.
So that would at least not result in an implicit conversion right there, but the rest of them you are hosed on. It’s going to be a bad time. What you might think is that using something like some of the built-in functions in SQL Server might be more highly performative, not performant, we don’t say that word in this channel.
I might beep that out later if I can figure out how to beep things out. You might think that it might be more performative. There might be more performativeness if you use CHARINDEX or PATINDEX, but generally there is not. There are some edge cases where I’ve seen them do somewhat better, but it’s not really worth talking about.
But the problem is, the way that you deal with null parameters, whether you use ISNULL or COALESCE or anything like that, isn’t any more clever than doing something like this. Both of these end up with the exact same problems unless you use a statement-level recompile on your query, right?
Something like that. If you do this, you can get around most of the issues that come out with these unpredictable searches. Most of the time, this will be Mr. Fix-It for you, right?
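Here is a sketch of the classic catch-all pattern with a statement-level recompile tacked on; parameter names, types, and values are assumptions for illustration:

```sql
/* Catch-all search: without RECOMPILE, one cached plan must be safe for
   every NULL/not-NULL combination of these parameters. */
DECLARE
    @OwnerUserId      integer  = 22656, /* arbitrary example value */
    @CreationDate     datetime = NULL,
    @LastActivityDate datetime = NULL;

SELECT TOP (100)
    p.Id, p.Title, p.Score
FROM dbo.Posts AS p
WHERE (p.OwnerUserId      =  @OwnerUserId      OR @OwnerUserId      IS NULL)
AND   (p.CreationDate     >= @CreationDate     OR @CreationDate     IS NULL)
AND   (p.LastActivityDate >= @LastActivityDate OR @LastActivityDate IS NULL)
ORDER BY p.Score DESC
OPTION (RECOMPILE); /* parameter embedding: values get treated like literals */
```

With the hint, the optimizer can fold away the `OR ... IS NULL` branches for parameters that actually have values, at the cost of a fresh compile on every execution.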
This will do okay. And for a lot of people, this is the path of least resistance and the simplest thing to do, because the alternative is writing a lot of dynamic SQL to cope with which parameters actually need to be part of the WHERE clause. I’m going to show you an example of just using RECOMPILE, it being nice and easy. As long as you’re not allergic to compiling a query plan every time the thing runs, because, you know, if you aren’t already burned out on CPU, if you have CPU slack and you’re able to compile queries whenever you want, this is a perfectly fine thing, as long as the compile time for that query is not awful.
If every time you run this query with a recompile it runs for a long time, but the time isn’t showing up in the query plan, you might want to check the compilation time to make sure SQL Server isn’t going off on one of those little thought-cloud staring things. So there is that.
Now, the problem with developers is that they are often quite lazy in the database. They often think that they are far more clever than they are when it comes to things in, doing things in the database. You know, SQL Server is an expensive piece of equipment, but people treat it, it’s kind of like a garbage dump for, like, their worst code.
The recompile advice is generally good enough, but when you use it, again, just pay attention to compile times. There’s like a certain, like, you know, if it’s like, you know, a few hundred milliseconds, screw it. But like, if it’s, you know, getting up into like five, 10 seconds of compile time, it might be, you might have to think about alternate ways of doing this.
SQL Server right now doesn’t offer any great programmability or optimizer support for the types of queries I’m talking about. However, it came up in the SQL Server 2025 release notes that there is a new optimizer feature called OPPO, the Optional Parameter Plan Optimization, which addresses the optional parameter problem.
So it looks like Microsoft is taking some steps to try to address this in SQL Server 2025. So, you know, sooner or later, you will probably see that in Azure SQL DB. And if Microsoft ever decides that it cares about managed instance again, you might even see it come to managed instance.
Microsoft’s track record with getting things cloud first has not been awesome lately. There are signs that there may be a return to that ethos, but who knows? It’s a bit weird.
But anyway, sometimes writing good queries does require extra typing and thinking, and that’s often not what developers are famous for. You want to start throwing weird features at it like Hekaton and in-memory stuff. You feel like you need to partition your table when you don’t.
You want to start, you want to move to Postgres because you swear Postgres will just do better at everything. You’ve heard it’s this magical unicorn and everything’s better there. Or you want to start using NoSQL, you know, stick it all in Elasticsearch or MongoDB or whatever.
Or you decide that maybe you just need to build your own ORM. Start from scratch. It’s 18 to 24 months of work.
Far more interesting than five to 10 minutes of typing some extra characters into SSMS. So, if we have this query in, like, a stored procedure, let’s say; it’s a little more convenient to show it here than in some dumb front end that I would make badly anyway. And we stick a RECOMPILE hint at the end of this thing.
This will do okay. Without the recompile hint, this will go really, really poorly. You might notice that the time over here at the end is six minutes and 17 seconds. That is a very long query.
A lot of that is because of a bad memory estimate where the sort does take an additional like six minutes and three seconds. Because when we get up to this filter operator, we’re at 13 seconds, which still isn’t great. Right?
We can see very clearly that we spent a lot of time in other places in the query leading up to that six minute ordeal. But the main problem is without the recompile hint, the predicates in your search, and this doesn’t matter, again, if you use is null, coalesce, whatever other clever arrangement you think you’ve found that makes the optimizer do smarter stuff. That predicate is always going to look something like this.
This is because SQL Server has to cache a plan that’s safe for any outcome of these parameters being null or not. The recompile hint gets around this by allowing for something called a parameter embedding optimization, which allows SQL Server to infer the parameter, take the parameter values and use them in the query as literals. So it doesn’t have to cache a plan that’s safe for anything.
It can just, it can just say, hey, there’s a plan for these values. This is as good as I can do, which may not always be great either. But that depends on a lot of other things like indexes and whatnot. So you should really avoid this sort of thing without the, without the benefit of the option recompile hint, because it will, it will go quite poorly.
If we do use OPTION (RECOMPILE) with this stored procedure, things will go generally okay. Now, the thing that I run into a lot is that with stored procedures like this, there’s usually a value that people think users will always search on. And they’ll design an index or two that makes sense for those types of queries.
But then as soon as you depart from that, as soon as that one value isn’t involved, things get really, really bad. That’s especially true without the recompile hint, because you cache a plan and SQL Server reuses it and you might have used the entirely wrong index for that query. So for our purposes and for the intent of the query, let’s pretend that we think we’re always going to be searching on owner user ID.
And we know that we’re always going to be ordering by score descending. That’s exactly what our query does up here. We have an equality predicate on owner user ID and we have an order by score descending.
This does get a little bit more complicated if you have dynamic sorting allowed as well. But we’ll talk a little more about that when we get somewhat further down. So we can design an index that looks like this, that takes care of our immediate equality search and our ordering.
Right. So we can search to whatever user IDs we care about and we have the score descending column in order based on that equality search. And then we can put any secondary search columns over here. Now I’ve done that for the date columns.
These are going to be residual predicates. You cannot seek to these, because the score column is in the way. Again, we’ll talk about that a little more in a moment. But since post type ID is incredibly unselective, a very dense column, I have stuck that in the includes, because it doesn’t matter so much in this case.
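A sketch of that index, with an illustrative name, against the same assumed `dbo.Posts` table:

```sql
/* Equality on OwnerUserId, then Score DESC to support the ORDER BY,
   the date columns as trailing keys (residual predicates, since Score
   sits in front of them), and the dense PostTypeId as an include. */
CREATE INDEX ix_Posts_OwnerUserId_Score
    ON dbo.Posts
       (OwnerUserId, Score DESC, CreationDate, LastActivityDate)
    INCLUDE (PostTypeId);
```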
So I already have this index created, I believe. I mean, it should anyway. Yes, I do.
Good. We got an error there. I’m smart. So like if I go and run these two queries with where I do use owner user ID, like the plans that I get from these are perfectly fine. Right.
We get two seeks. SQL Server does okay enough cardinality estimation from these. No one is upset or hurting from this. And the parameters that I used for the very slow plan up here were actually the creation date and last activity date parameters that you see in here. These things. So that’s what I passed in to get this slow query without the OPTION (RECOMPILE). With OPTION (RECOMPILE),
SQL Server makes a much better choice overall. Just chooses to scan the clustered index. Doesn’t choose to reuse a plan.
Gets close enough to, you know, okay cardinality estimate. And like, you know, this thing finishing in under a second without it without a good supporting index. Totally reasonable.
Now, where things get tricky, of course, is that when you start writing queries like this that are unpredictable, you have no idea what set of indexes to roll out, because you have no idea what the most common sets of search criteria that people pass in are going to be. And it takes kind of a lot of logging and analytics to figure out what the most common searches are, whether they’re slow or not, and then how to index for them. And you can end up with a lot of different permutations of a similar index definition when you go down that route.
Now, this is a pretty simple thing because we’re just hitting one table, right? We’re just we’re only hitting the post table and we have all sorts of search parameters against that. So, but, you know, if we were to think about this for a query that might touch more than one table, you might be dealing with more than one search element and things might get really tricky.
Because now you have to think about indexing multiple queries and taking into account join keys and stuff like that. There’s all sorts of things that get really, really tough and complicated with this. So, much like the $4 a month that you could use to become a member of this channel, let’s say you can’t afford to recompile this query every time.
The way to get around that is to write dynamic SQL, the safe, parameterized kind that avoids SQL injection and has equivalent plan reuse behavior to any other stored procedure that you would run. Because we execute the query using sp_executesql, which is a stored procedure, you get the type of parameterization and plan reuse and all that other good stuff that you would find in other cases. But you would use that to build a custom WHERE clause based on what parameters apply to the particular search that you’re running.
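A minimal sketch of that pattern, assuming @OwnerUserId and @CreationDate arrive as stored procedure parameters (names are illustrative):

```sql
/* Only append the predicates the caller actually supplied, keep everything
   parameterized, and run it through sp_executesql so each distinct
   WHERE-clause shape gets its own reusable cached plan. */
DECLARE @sql nvarchar(MAX) = N'
/* dbo.UnpredictableSearch */ /* comment token: makes the query easy to find later */
SELECT TOP (100) p.Id, p.Title, p.Score
FROM dbo.Posts AS p
WHERE 1 = 1';

IF @OwnerUserId IS NOT NULL
    SET @sql += N'
AND   p.OwnerUserId = @OwnerUserId';

IF @CreationDate IS NOT NULL
    SET @sql += N'
AND   p.CreationDate >= @CreationDate';

SET @sql += N'
ORDER BY p.Score DESC;';

EXECUTE sys.sp_executesql
    @sql,
  N'@OwnerUserId integer, @CreationDate datetime',
    @OwnerUserId,
    @CreationDate;
```

Passing every parameter to sp_executesql, even ones a given execution doesn't use, keeps the call site simple; the values never get concatenated into the string, which is what keeps this injection-safe.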
So, if we recreate this with all that in mind, Oh, jumped around on me a little bit there. These things will all still be okay.
All right. We’re going to get, you know, pretty much equivalent performance to what we saw with the recompile. We get the two seeks here with the, you know, I mean, we do reuse this plan. So, the cardinality estimate is reused, right?
We can see that there is plan reuse with this. So, this one guessed 82, and this one is 5,000 actual against that estimate of 82. So the guess, rather, that cardinality estimate of 82 rows, persisted there. And that also lives on with this.
If we executed more than one variation of this, we would reuse the estimates for that plan, right? So, no big deal here. But again, coming back to what’s difficult with these things is knowing how often they run. Query Store is pretty helpful for this sort of thing.
Because, well, if you’re smart, when you write dynamic SQL, you put in a comment with the name of the procedure that something comes from. You can use my free stored procedure, sp_QuickieStore, to search query text for this type of token. And you can find all the queries that run and see how many executions they have, which is a very useful thing.
So, the problem with rowstore indexes is that the order that the keys are defined in defines how queries can access stuff. Now, there are clients who I’ve been working with for years and I’ve been talking about indexing for years. And they’re still unclear on the fact that if you have an index like this that leads with owner user ID, that is not the same as if you have an index where owner user ID is like the second or third or fourth or fifth key column in the index.
Like, by that point, you have lost any useful sorting for searching stuff. Multi-column indexes really are only useful for either things that search for the leading key column or things that search for the leading key column and then other stuff. Right?
Or, you know, ordering and other things. But, you know, if we’re thinking about just being where clause centric here, this is really when, like, when rowstore indexes, multi-key rowstore indexes are very useful because you can seek to owner user ID and then you can seek to score or order by score. And then you can seek to creation date or last activity date or post type ID or ID.
You can seek across all the keys, but you have to start seeking with that first key column. Otherwise, it’s just a scan of everything, because the index isn’t ordered in a helpful way after that. So, for example, with our other query where we need to search on these two columns, we’re not accessing those two columns first via WHERE clauses or anything.
So everything is all out of sorts for those, for those searches. That’s why we ended up scanning the clustered index. So what you generally want to do is, if you can, and there are lots of considerations for this, and this is, you know, we’ve already kind of gotten to the 20 minute mark here.
And some of you may feel dissatisfied that I have not gone into all of the potential things that might come up by using columnstore. But the general answer is creating a nonclustered columnstore index, right? And just because it doesn’t say nonclustered in here doesn’t mean it’s not one.
I can’t create a clustered columnstore index on the Posts table, because it already has a clustered rowstore index. So this is creating a nonclustered columnstore index.
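A sketch of that index, with an illustrative name, over the searchable columns assumed from the video:

```sql
/* One nonclustered columnstore index covering the searchable columns.
   Unlike a rowstore index, there is no leading-key dependency here:
   each column is stored and compressed in its own segments, so any
   combination of predicates can use it. */
CREATE NONCLUSTERED COLUMNSTORE INDEX ncc_Posts_Search
    ON dbo.Posts
       (OwnerUserId, Score, CreationDate, LastActivityDate, PostTypeId);
```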
So in general, if you create a nonclustered columnstore index that spans the columns you care about searching in your query, things will end up a lot better, and the unpredictability doesn’t matter as much. The reason it doesn’t matter as much is because columnstore indexes don’t have that column-to-column dependency that rowstore indexes do.
Each one of these columns is stored in its own little vertical columnar index within the columnstore index. And there are lots of little mini indexes, sort of, across all the different segments and row groups. So you can pick and choose which columns you want to hit.
So if you want to hit owner user ID and creation date, or you want to hit score and last activity date, it doesn’t make any difference to the columnstore index what order you search for things in or anything like that, because each one of these columns has independence from any other column that might precede it in the key of the index, unlike rowstore indexes.
So you can give all of your search queries a single, good, highly compressed, very nice place to search for data from, without all the problems of creating multiple nonclustered rowstore indexes to try to satisfy every different permutation of a search. That’s the way I love to handle this sort of thing for clients, as long as the shoe fits with using columnstore in their database. My biggest consideration is this: if we’re relying on query parallelism to make things fast and you’re on Standard Edition, we need to really carefully test the columnstore thing, because Microsoft hates people on Standard Edition, and it limits all columnstore parallelism to a DOP of 2.
You cannot exceed a DOP of 2, so if you have a query that runs really fast at DOP 8 using rowstore indexes in standard edition, and you start using a columnstore index and you’re like, wait a minute, my queries got slower, even though I’m using batch mode in the columnstore index.
Well, it’s because your query is limited to a DOP of 2, and you can’t do anything about that. But the way that I, Erik Darling, with Darling Data, like to handle unpredictable searches, using columnstore, is typically a lot easier and more efficient than creating, like, 17 different rowstore indexes to account for everything that someone might search for.
So we’re going to wrap this one up here. Thank you for watching. I hope you enjoyed yourselves. I hope you learned something, and I hope that you will continue watching.
I forget what the next video is about, but it probably, hopefully it won’t be as long as this one, because, you know, the tongue does get tired. And, you know, there’s a wise man who once said, the reason dogs have so many friends is because they wag their tails, not their tongues.
I guess that explains why I spend so much time alone. Anyway, thank you for watching. Thank you for watching.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.