Get AI-Ready With Erik: Full Text Keyword Search In SQL Server

Get AI-Ready With Erik: Full Text Keyword Search In SQL Server


Video Summary

In this video, I dive into the world of vector search in SQL Server 2025 and its limitations. Starting off with a discussion on why vector search might not be ideal for precise queries like searching for specific versions of SQL Server (e.g., 2008 R2), I highlight how these searches can return irrelevant results due to the nature of vector matching. Moving forward, I explore other challenges such as understanding negations and how certain terms are interpreted by vector search algorithms. To address some of these issues, I propose using full text search—a tool that better understands logical operators like “not”—as a complementary method for enhancing vector search results. While full text search isn’t without its drawbacks, it offers a practical solution to refine searches in SQL Server 2025, making the overall process more robust and accurate.

Full Transcript

Erik Darling here with Darling Data, and we are going to get back into our SQL Server 2025 AI Preparedness Tour. This is, of course, in support of my new album, Get AI Ready with Erik. If you are interested in purchasing this album, not perching on this album, that would be uncomfortable for everyone. You can do that at the link up there. The blue thing, it starts with HTTPS. I realize there’s some other blue things down there, but there’s definitely only just one blue thing way at the top. That link is also down in the video description where you can very conveniently click on it and change your life. So, we’ve talked about a bunch of vector-y stuff, so we’re going to continue on with that. And the thing that we have to, you know, first go through, and I mean, I’m going to say something unpopular at the end of this. I know you might be shocked by that, but I’m going to say something unpopular at the end of this one. But up front, vector search is not good for some things. It lacks precision, one might say. So, if, you know, let’s say you care desperately about SQL Server 2008 R2 for some reason, I don’t know, it’s just, it makes for a good example.

But, and you vector search for it. R2 isn’t like a big, significant, vector-y thing, right? Like, the vector sounds like, oh, R2, D2, right? So, it treats 2008 and 2008 R2, because like SQL Server 2008, that is like, kabow! But R2 is like, not so much. So, let’s run this query against our database. And we didn’t, like, normally, the first thing I do before I run a query is make sure execution plans are turned on, but that doesn’t matter here. So, that’s good for us, right? We don’t need to worry so much about that. We can just run our query and not care about the execution plan, at least not yet anyway.

But if we look at the results here, right? And using my lovely SQL Server SSMS 22 result zooming, look what we get back here. Like, sure, the top result is SQL Server 2008 R2, but with a distance of zero, I’m pretty sure that was the post title that we searched for, right? Because we didn’t filter out the exact search. Well, I don’t know. Actually, no. Yeah, no, actually, that won’t matter here. Good.

Because sometimes with these, you know, you have to, like, if it came from an original post, you should screen out the original post because you don’t want to match your, like, the original post to, like, the, like, candidate posts. But that doesn’t matter for this one. So, we got a bunch of, but if you look at this column, there are a lot of no’s in here, right? There’s, like, five in a row and then a sixth one. What is that? Six in a row? One, two, three, four. It looks like five, six, and then seven, eight, nine, ten.

Yeah. So, yeah, like, six out of the ten results that we got back were not 2008 R2. They were just SQL Server 2008. And that does not bode terribly. That may feel like it doesn’t bode terribly well for vector search. But the thing you need to understand is that, like, vector search is not meant to be, like, the end-all, be-all replacement search for everything else.

There are certain things that you might have to do to bolster searches like this. And we’ll talk about an unpopular option for that at the end here. Another thing that vector searches do not have a very good understanding of is the word not.

Which, I don’t know, if you’ve ever talked to an LLM, like, I don’t know, those things seem to do okay with stuff here. Like, I am not interested in buying a Lamborghini. I am interested in buying a different kind of sports car.

Like, I don’t know, I’m pretty sure they would figure out. But, you know, they’re probably using a whole different set of, I don’t know, maybe smarter, better things inside of them. They can understand not, right?

But if we were to search something like this, right, and we were to say connection string not related to, like, not entity framework. I want a different connection string. I want, like, a Python connection string or something, right?

You should probably say that. But this, again, just makes it for a pretty good example. Where, if we run this query and we look at what we get back, well, we get problems with entity. I mean, this one says EF4.

So, I mean, it doesn’t exactly match to identity. But it’s still that. This one doesn’t have entity in it. But, you know, I don’t know that that’s going to help. These ones are all entities.

And, you know, I mean, this one is still ADO.net. Maybe that’s close enough to entity framework. But at least in my mind, they’re basically the same thing, which is untrue. But, you know, in my mind, I’m just like, you do that stuff?

Cool. I don’t, I don’t, I don’t. You need database help. You should hire me for that. But you can see that a lot of stuff in here does come back as entity framework related. And this is probably another good example of computers maybe not understanding strings too well.

Right? Because this one is EF4. And we were just searching for entity. But I’ll be damned if I’m going to list out all the EFs in this query to search for. So, you know, it’s close enough.

I can still see stuff got weird in there. So the unpopular thing and the thing that you might need if you are going to do vector search in SQL Server is one of my least favorite things in SQL Server.

Both from a, you know, from a performance perspective, certainly. But at least, you know, like, like I used to do a lot of work on a database product called Relativity. It’s a e-discovery product.

I used to be a DBA at a shop that had them. And for a long time, like they’re like early, much earlier. I’m sure it’s much better now. Like way earlier on, they were using full text search for like document search stuff. Because it’s all like lawyers searching through cases, like trying to sue people and find documents and stuff like that.

So there’s a lot of full text search in that. And my, like, my head goes to terrible places when I think about full text search. Because there was like terabytes and terabytes of data in those things.

But, you know, full text search, for what it’s worth, does understand not. Because you can say not contains entity. And you can say contains connection string when you want to search for stuff.

And boy, howdy, does entity framework not show up in this one, right? So like at least, at least it understands not contains, right? We can say not contains and we can skip the entity framework stuff, right?

Or at least entity, right? Or whatever we did. Entity. Yeah, entity. So we are able to get rid of entity using full text search. But there is, of course, a lot of, you know, sort of backroom pain involved with full text search.

Despite its, I guess, advanced understanding of not. So, you know, you could choose to go this path, you know, which kind of sucks, right? It’s maybe not the greatest thing in the world.

However, it could be used to bolster vector search in some places. And, you know, I guess your alternative, if you’re like, you know, full text search, full text that, you could always go back to the bad old days of, you know, double wildcard text searching and say, and title not like percent entity percent. With, of course, the correct unicode thing in front of the string if it is unicode.

Because one should always mind their data types. All right. Well, that hurt to say.

Anyway, that’s about good here. If you’re looking at the top there, you see that I have four more SSMS tabs open. So we have four more tabs of things to talk about. Well, three more tabs of things to talk about at least this week.

And then we will start afresh next week with office hours, getting AI ready and all sorts of other things. All right. Anyway, that’s probably about good here.

Thank you so much for watching. I hope you enjoyed yourselves. I hope you learned something. And I will see you in tomorrow’s video where we will be even more ready for AI in SQL Server 2025. Yo desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk

Video Summary

In this video, I dive into the world of vector search in SQL Server 2025 and its limitations. Starting off with a discussion on why vector search might not be ideal for precise queries like searching for specific versions of SQL Server (e.g., 2008 R2), I highlight how these searches can return irrelevant results due to the nature of vector matching. Moving forward, I explore other challenges such as understanding negations and how certain terms are interpreted by vector search algorithms. To address some of these issues, I propose using full text search—a tool that better understands logical operators like “not”—as a complementary method for enhancing vector search results. While full text search isn’t without its drawbacks, it offers a practical solution to refine searches in SQL Server 2025, making the overall process more robust and accurate.

Full Transcript

Erik Darling here with Darling Data, and we are going to get back into our SQL Server 2025 AI Preparedness Tour. This is, of course, in support of my new album, Get AI Ready with Erik. If you are interested in purchasing this album, not perching on this album, that would be uncomfortable for everyone. You can do that at the link up there. The blue thing, it starts with HTTPS. I realize there’s some other blue things down there, but there’s definitely only just one blue thing way at the top. That link is also down in the video description where you can very conveniently click on it and change your life. So, we’ve talked about a bunch of vector-y stuff, so we’re going to continue on with that. And the thing that we have to, you know, first go through, and I mean, I’m going to say something unpopular at the end of this. I know you might be shocked by that, but I’m going to say something unpopular at the end of this one. But up front, vector search is not good for some things. It lacks precision, one might say. So, if, you know, let’s say you care desperately about SQL Server 2008 R2 for some reason, I don’t know, it’s just, it makes for a good example.

But, and you vector search for it. R2 isn’t like a big, significant, vector-y thing, right? Like, the vector sounds like, oh, R2, D2, right? So, it treats 2008 and 2008 R2, because like SQL Server 2008, that is like, kabow! But R2 is like, not so much. So, let’s run this query against our database. And we didn’t, like, normally, the first thing I do before I run a query is make sure execution plans are turned on, but that doesn’t matter here. So, that’s good for us, right? We don’t need to worry so much about that. We can just run our query and not care about the execution plan, at least not yet anyway.

But if we look at the results here, right? And using my lovely SQL Server SSMS 22 result zooming, look what we get back here. Like, sure, the top result is SQL Server 2008 R2, but with a distance of zero, I’m pretty sure that was the post title that we searched for, right? Because we didn’t filter out the exact search. Well, I don’t know. Actually, no. Yeah, no, actually, that won’t matter here. Good.

Because sometimes with these, you know, you have to, like, if it came from an original post, you should screen out the original post because you don’t want to match your, like, the original post to, like, the, like, candidate posts. But that doesn’t matter for this one. So, we got a bunch of, but if you look at this column, there are a lot of no’s in here, right? There’s, like, five in a row and then a sixth one. What is that? Six in a row? One, two, three, four. It looks like five, six, and then seven, eight, nine, ten.

Yeah. So, yeah, like, six out of the ten results that we got back were not 2008 R2. They were just SQL Server 2008. And that does not bode terribly. That may feel like it doesn’t bode terribly well for vector search. But the thing you need to understand is that, like, vector search is not meant to be, like, the end-all, be-all replacement search for everything else.

There are certain things that you might have to do to bolster searches like this. And we’ll talk about an unpopular option for that at the end here. Another thing that vector searches do not have a very good understanding of is the word not.

Which, I don’t know, if you’ve ever talked to an LLM, like, I don’t know, those things seem to do okay with stuff here. Like, I am not interested in buying a Lamborghini. I am interested in buying a different kind of sports car.

Like, I don’t know, I’m pretty sure they would figure out. But, you know, they’re probably using a whole different set of, I don’t know, maybe smarter, better things inside of them. They can understand not, right?

But if we were to search something like this, right, and we were to say connection string not related to, like, not entity framework. I want a different connection string. I want, like, a Python connection string or something, right?

You should probably say that. But this, again, just makes it for a pretty good example. Where, if we run this query and we look at what we get back, well, we get problems with entity. I mean, this one says EF4.

So, I mean, it doesn’t exactly match to identity. But it’s still that. This one doesn’t have entity in it. But, you know, I don’t know that that’s going to help. These ones are all entities.

And, you know, I mean, this one is still ADO.net. Maybe that’s close enough to entity framework. But at least in my mind, they’re basically the same thing, which is untrue. But, you know, in my mind, I’m just like, you do that stuff?

Cool. I don’t, I don’t, I don’t. You need database help. You should hire me for that. But you can see that a lot of stuff in here does come back as entity framework related. And this is probably another good example of computers maybe not understanding strings too well.

Right? Because this one is EF4. And we were just searching for entity. But I’ll be damned if I’m going to list out all the EFs in this query to search for. So, you know, it’s close enough.

I can still see stuff got weird in there. So the unpopular thing and the thing that you might need if you are going to do vector search in SQL Server is one of my least favorite things in SQL Server.

Both from a, you know, from a performance perspective, certainly. But at least, you know, like, like I used to do a lot of work on a database product called Relativity. It’s a e-discovery product.

I used to be a DBA at a shop that had them. And for a long time, like they’re like early, much earlier. I’m sure it’s much better now. Like way earlier on, they were using full text search for like document search stuff. Because it’s all like lawyers searching through cases, like trying to sue people and find documents and stuff like that.

So there’s a lot of full text search in that. And my, like, my head goes to terrible places when I think about full text search. Because there was like terabytes and terabytes of data in those things.

But, you know, full text search, for what it’s worth, does understand not. Because you can say not contains entity. And you can say contains connection string when you want to search for stuff.

And boy, howdy, does entity framework not show up in this one, right? So like at least, at least it understands not contains, right? We can say not contains and we can skip the entity framework stuff, right?

Or at least entity, right? Or whatever we did. Entity. Yeah, entity. So we are able to get rid of entity using full text search. But there is, of course, a lot of, you know, sort of backroom pain involved with full text search.

Despite its, I guess, advanced understanding of not. So, you know, you could choose to go this path, you know, which kind of sucks, right? It’s maybe not the greatest thing in the world.

However, it could be used to bolster vector search in some places. And, you know, I guess your alternative, if you’re like, you know, full text search, full text that, you could always go back to the bad old days of, you know, double wildcard text searching and say, and title not like percent entity percent. With, of course, the correct unicode thing in front of the string if it is unicode.

Because one should always mind their data types. All right. Well, that hurt to say.

Anyway, that’s about good here. If you’re looking at the top there, you see that I have four more SSMS tabs open. So we have four more tabs of things to talk about. Well, three more tabs of things to talk about at least this week.

And then we will start afresh next week with office hours, getting AI ready and all sorts of other things. All right. Anyway, that’s probably about good here.

Thank you so much for watching. I hope you enjoyed yourselves. I hope you learned something. And I will see you in tomorrow’s video where we will be even more ready for AI in SQL Server 2025. Yo desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk desk

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.



Leave a Reply

Your email address will not be published. Required fields are marked *