Get AI-Ready With Erik: Combining Search Scoring
Video Summary
In this video, I delve into the world of search algorithms, specifically focusing on reciprocal rank fusion (RRF) and weighted scoring techniques. These methods are crucial when dealing with diverse data sources that require different ranking systems to be combined for a unified, superior search result. I explain how RRF works by summing each document’s reciprocal rank and adjusting it with a constant K, ensuring higher weight is given to items appearing high across multiple methods. Additionally, I discuss weighted scoring as an alternative approach, where scores are normalized between 0 and 1 to provide more precise control over which factors matter most in the search results. The video covers practical examples like SQL Server documentation searches, enterprise bug tracking systems, and performance tuning recommendations engines, illustrating how these techniques can be applied to improve overall search quality without complex retraining processes.
Full Transcript
Erik Darling here, Darling Data. There are no demos in this video, so if you just want to skip right ahead and buy the course for $100 off with that coupon up top, you can go ahead and do that. You can hear me prattle on about this stuff from the comfort of your home whenever you want. Low, low price of $100 off. So, when you are working with vectors, one thing that you're going to have to deal with as a database person is that you might have multiple things in your data, aside from just the vectors we've been talking about, that would indicate a certain match to your data, a match to a search. There's a quote about this. It's a wonderful quote, right?
It makes us sound very authoritative, reading this quote. Reciprocal rank fusion, RRF, is a powerful, model-agnostic algorithm used in hybrid search to combine results from multiple ranking systems, like keyword and vector search, into one unified, superior ranking.
Superior, like Perrier sparkling water. Stuff goes straight to your head, I hear. What it does is sum each document's reciprocal rank, which is one divided by its rank in each list, adjusted by a constant.
The constant is called K. Why is constant spelled with a K? Perhaps it’s some Germanic influence that I’m unaware of, but that’s what’s in the math, so we’re going to stick with it. Because, you know, people who know math tend to be pretty smart.
So if they want to spell constant with a K, they can go right on and do that. But the whole idea is to give higher weight to height. Sorry, this is a direct quote, so I can’t go off script here.
Giving higher weight to items appearing high across different methods, improving overall search quality without complex retraining. It’s crucial for blending diverse search types into coherent results, commonly used in modern LLM-based systems for better relevance. So just in the Stack Overflow database alone, we have multiple things that might inform us as to what is a good match, what is good content, what is high quality.
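The math in that quote is simple enough to sketch in a few lines of Python. This is a generic illustration, not the queries from the course, and k=60 is just the conventional default from the RRF literature; the document ids here are made up:

```python
def rrf_scores(ranked_lists, k=60):
    """Combine multiple ranked lists of document ids with reciprocal
    rank fusion: score(d) = sum over lists of 1 / (k + rank_in_list)."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# "a" tops both lists, so it wins. Note that "c", which shows up in
# both lists, beats "b", which only shows up in one: appearing high
# across multiple methods is exactly what RRF rewards.
fused = rrf_scores([["a", "b", "c"], ["a", "c", "d"]])
```

The constant k dampens the difference between adjacent ranks, so one list placing a document at rank 1 instead of rank 2 doesn't dominate the whole fusion.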
Where do we go from here? We have vector distance, right, which is a number from 0 to 2, but lower is better. Okay, so like the lower your vector distance is, the more similar two sets of vectors are.
Then, in the Posts table, we have a column called Score. It's maybe not a perfect system, but it's a pretty okay one. The Score column can be negative, or it can be a very high positive number.
But we couldn't automatically just add or subtract Score from vector distance, because vector distance is going to be a tiny little decimal. If you have a score of 18,000 on a thing and you think, well, I'll just make it negative, now you're adding negative 18,000 to a tiny distance and you end up with a really big negative number. It sort of works, but it would look weird. You could also do stuff like keyword boost, like we looked at in the last video, where you use some backup search terms: oh, if it has this in it, then it's a match. Like Thai food. My favorite Thai dish is called Rama.
It's this spicy peanut sauce thing. Whenever I'm in a new city and get a hankering for Thai food, if you're a Thai restaurant and you have this Rama dish, usually chicken because I'm a little bit of a coward, I'm going to you first.
I don't care if you have a D on your health and sanitation report. I'm in there. We need some dirty peanut sauce. So that's keyword boost, right?
Good stuff. So, the first option you have is reciprocal rank fusion, where you ignore scores and use rank positions only. There's some fairly fancy math to figure this out, and I go over all of it and show you queries that can do this in the full course.
I'm not doing that here because, well, you've got to save something for marriage, right? You would want to use reciprocal rank fusion when you're combining ranked lists from different sources where the scores are not comparable, because they're in different units and on different scales. It's also very simple and robust, and in that simplicity and robustness you lose a little bit of fine-grained control.
Where this might make sense is if you're building a SQL Server documentation search and you want to combine vector similarity search, which returns distances, with full text search, which returns ranks like 128, 95, and 42. Those two kinds of numbers are completely incompatible.
Then you also might want to think about recency, right? Because for a lot of things... look, I'm like a crate digger when it comes to SQL Server information. Usually when I'm looking for something, I'm looking for something real old. The new stuff, I don't know. Maybe if I need to learn about something that just came out, that's one thing. But usually I'm looking for some old Craig Freedman post. And I'm like, no, no, no. Get that bottle under the bar that's covered in dust. Yeah, that's the one. That's usually what I'm about. But all three of these things have their own incompatibilities. It's really hard to coagulate them.
Oh, well, this is your vector similarity. But you wanted some keywords in there that were pretty hard for vectors to navigate, so we did some full text search and got you some numbers back on that. But do you want the new stuff, or do you want that old dusty bottle under the bar? What are you looking for here? So you might also have some temporal, time-based way of ranking stuff. You just can't put those raw numbers together directly. That's the problem reciprocal rank fusion solves: by working on rank positions, the incompatible scales stop mattering.
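To make that concrete: since RRF only cares about positions, each of those three incompatible signals can be turned into a ranked list first and then fused. A sketch with made-up numbers, where the post names, values, and the `to_ranked_list` helper are all hypothetical:

```python
# Three signals on wildly incompatible scales.
vector_distance = {"post1": 0.12, "post2": 0.45, "post3": 0.80}  # lower = better
fulltext_rank   = {"post2": 128, "post3": 95, "post1": 42}       # higher = better
days_old        = {"post1": 30, "post3": 400, "post2": 4000}     # lower = newer

def to_ranked_list(scores, lower_is_better):
    """Collapse raw scores into an ordered list of ids, best first."""
    return sorted(scores, key=scores.get, reverse=not lower_is_better)

ranked_lists = [
    to_ranked_list(vector_distance, lower_is_better=True),
    to_ranked_list(fulltext_rank, lower_is_better=False),
    to_ranked_list(days_old, lower_is_better=True),
]

# Standard RRF sum over the three lists.
k = 60
fused = {}
for ranking in ranked_lists:
    for rank, doc in enumerate(ranking, start=1):
        fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + rank)

best = max(fused, key=fused.get)
```

Here post1 wins: it's the closest vector match and the newest, and its middling full text rank can't sink it, even though the three raw scales could never be added together.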
You also have the concept of weighted scoring, where you normalize all your scores to a number between 0 and 1. Which is sort of like the vector cosine distance range of 0 to 2, just on a tighter scale.
The idea here is to put everything on the same scale. Right? And then you would decide how much each factor matters.
In the video I do for the course, I show you an example starting with a 90% semantic and 10% popularity setup, and then how that changes at 70-30, 60-40, 50-50, and so on, and how that really changes the search results.
The 90% semantic setup is usually a pretty good one. What it means is that relevance matters the most, but you still want to say, well, this is a really popular post.
So maybe we ought to think about incorporating that feedback from our voting system in here a little bit. But the thing is that those two metrics live in completely different universes. Again, going back to the Stack Overflow data, vector cosine distance ranges from 0.0, which means identical, to 2.0, which means pointing in opposite directions.
So 0.0 is like you're giving each other a hug and walking the same way. 2.0 is like you just broke up and you're going your separate ways.
You know, like, the three records you let her borrow under the thing and you’re mad because she stole your Cure t-shirt. And you’re, like, man. Stinks.
Anyway. If you try to combine those raw scores, the post score dominates everything. If you have some post with 18,000 votes, it's always going to win, even if it's completely irrelevant, even if there's a high cosine distance, because a score that big swamps everything else. Even if you turn it into a negative number, it's still kind of whack. So with the 90-10 split, relevance still wins, but there's another feedback mechanism at play.
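Here's a minimal sketch of that 90-10 blend, with made-up numbers, showing how normalizing both metrics to 0-1 keeps an 18,000-vote post from steamrolling a much more relevant one. The post names and values are hypothetical:

```python
posts = {
    "relevant_post": {"cosine_distance": 0.10, "score": 25},
    "popular_post":  {"cosine_distance": 1.60, "score": 18000},
}

# Turn cosine distance (0 = identical, 2 = opposite) into a 0-1 similarity.
def relevance(p):
    return 1.0 - p["cosine_distance"] / 2.0

# Min-max normalize vote score into 0-1 across the candidate set.
all_scores = [p["score"] for p in posts.values()]
lo, hi = min(all_scores), max(all_scores)

def popularity(p):
    return (p["score"] - lo) / (hi - lo)

# 90% relevance, 10% popularity.
def blended(p, w_relevance=0.9, w_popularity=0.1):
    return w_relevance * relevance(p) + w_popularity * popularity(p)

ranked = sorted(posts, key=lambda name: blended(posts[name]), reverse=True)
```

Even with a perfect 1.0 popularity, the 18,000-vote post only earns 0.1 from that term, so the relevant post wins on the 0.9-weighted relevance term.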
Just think of it sort of like a pie chart: the result is 90% about relevance and 10% about popularity. It also makes it easier to reason about things. If I want popularity to matter more, I can change the weighting from 0.9 and 0.1, the 90-10 split, to 0.7 and 0.3, a 70-30 split. The 90-10 split essentially means I care about relevance way more than popularity, but I still want popularity to matter, right?
It’s, like, it’s a sensible thing to do. You would want to use weighted scoring when you need more precise control over which factors are important to you. It gives you a lot more knobs to tune, right?
And you can do that for multiple things. It's not just similarity and popularity. If you wanted to throw recency into the mix, you could do that too and add it to your weighting measures. You could have a three-way weighting split, not just a monogamous two-way split, right?
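Extending the blend to three factors is just one more term in the sum. The weights below are hypothetical; the only real constraint is that all inputs are already normalized to 0-1 and the weights sum to 1:

```python
def weighted_score(similarity, popularity, recency,
                   w_sim=0.7, w_pop=0.2, w_rec=0.1):
    """Three-way weighted blend. All inputs are assumed normalized to
    0-1, and the weights must sum to 1.0 so the result stays in 0-1."""
    assert abs(w_sim + w_pop + w_rec - 1.0) < 1e-9
    return w_sim * similarity + w_pop * popularity + w_rec * recency

# A very similar, moderately popular, fairly fresh post.
s = weighted_score(similarity=0.9, popularity=0.5, recency=0.8)
```

Because the weights sum to 1, the blended score stays comparable across queries, and tuning is just shifting slices of the pie between the three factors.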
You've got to get a whole room involved, right? Like an orgy of ranking splits, whatever. The other thing that weighted scoring is good for, and reciprocal rank fusion is also kind of good for this, is if you want to incorporate a whole bunch of different stuff, not just ranks, where the things you have to combine are just too different in what they return.
So let's say you wanted an enterprise search where recency matters three times more than relevance. Maybe a result is only 60% relevant, but if it's recent, we should show it, because maybe that's what someone's looking for. Say we have a bug tracking system, and you want to find recent bugs with similar keywords, because those are going to be more relevant to something you're working on today than a bug from three years ago that's more similar but that you already dealt with. It's probably not relevant to what you're doing today. Another example: let's say you were putting together a SQL Server performance tuning recommendations engine, right?
I'm not building this, but if you were doing it out there in the world, you would know from watching my videos that query execution time matters about 50 times more than logical reads. Because it is not SQL Server 2008. Just to call back to one of the earlier videos this week: "not", right? Vector search is not good at "not". It is not SQL Server 2008 or SQL Server 2008 R2 anymore. We live in 2026, and if we're still looking at logical reads, we ought to have our heads checked.
We also know that things like statistics freshness matter 20 times more than index fragmentation, right? Honestly, it could clearly be 100 times more; how many zeros can we put in there? Having up-to-date statistics is much more important than caring about index fragmentation because, again, it is not SQL Server 2008.
It is not SQL Server 2008 R2, and we should act like we know what time it is, right? Anyway, that’s probably good here. Thank you for watching.
I hope you enjoyed yourselves. I hope you learned something. And I will see you in tomorrow's big Friday video, which is actually, thankfully, a small Friday video, so I can get you on your way to a great weekend and we can all live in peace. All right.
Thank you for watching.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.