Get AI-Ready With Erik: Vector Indexes Just Not There Yet
Summary
In this video, I discuss vector indexes in SQL Server 2025, focusing on their current state as a preview feature. I walk through the process of inserting data into a table to prepare for creating a vector index and explain why these indexes are not yet fully production-ready. I highlight key limitations such as read-only tables, lack of multi-column support, and the absence of inline index definitions, emphasizing that while they offer some functionality, there is still much work to be done before vector indexes can be considered mature features in SQL Server.
Topics
`SQL Server 2025`, `Vector Indexes`, `Vector Data Type`, `Vector Distance`, `Vector Norm`, `Vector Normalize`, `AI Generate Embeddings`, `AI Generate Chunks`, `Preview Features`, `Read-Only Table`, `Clustered Primary Key`, `Compressible Columnstore Index`, `Vector Index Quantization`, `System Databases`, `Inline Index Definition`, `Multi Column Vector Index`, `Multi Key Vector Index`, `Enterprise Database Product`, `SQL Server 2025 Update`, `Erik Darling`, `Darling Data`, `Get AI Ready with Erik Course`
Chapters
- *00:00:00* – Introduction
- *00:00:28* – Embeddings Progress
- *00:01:13* – Why Vector Indexes Aren’t There Yet
- *00:02:04* – SQL Server 2025 Preview Features
- *00:03:11* – Creating a Vector Index
- *00:04:26* – Query to Create Vector Index
- *00:06:47* – After Creating a Vector Index
- *00:10:30* – Conclusion
Full Transcript
Erik Darling here with Darling Data, and we are just waiting for the last, oh, I don’t know, thousand or so records to go into this table to make it an even 10,000 rows so that I can show you some stuff about vector indexes. You can see maybe from, let me, well, you know, let’s go in here and let’s zoom in a little bit. We have local zoomit running in here. So, I was putting, like, 8,000 rows into the Taylor, and it’s gonna, it’s gonna take a little while, right, because I had 2,000 rows in there, but I wanted to get it up to 10,000 rows. So, you know, I’m just waiting on that to finish up. We’re almost done. Yeah, it’s 7,500 out of 8,000 embeddings, and, oh, I don’t know, let’s see.
Did this finally stop moving? Yeah, it did. All right, so our local olama has finished creating embeddings. Some input tokens were not marked as outputs. Okay, that’s great. Well, anyway, nice to know. Let’s talk a little bit about why vector indexes are just kind of not there yet.
This is all part of my Get AI Ready with Erik course. You can buy it with the link up there that is also handily down in the video description below. And the nice tidally added coupon code will get you 100 full US American dollars off. So right now in SQL Server 2025, the vector things that you have generally available are, of course, the vector data type, which currently only allows for a float 32 in GA.
You have three vector centric functions, vector distance, vector norm, and vector normalize. I go over all those in more detail in the full course material. I’m not going to, I can’t, I can’t do everything here. Give it all away for free. What’s the point?
You can create external models. So you can do things. Well, not like I just showed you. Well, actually you could do that. I’m going to show you just not in this video, but you can use like functions within SQL Server, like AI generate embeddings and AI generate chunks to sort of do the job that the Python script I was just showing you would do. So SQL Server 2025 has a database scope configuration with the ability to allow you to use preview features, which you have to explicitly opt into.
So you don’t accidentally use a preview feature in production and then be like, oh, I didn’t know. Whoops. If you want to turn on preview features, if you just don’t care, like I don’t care about this local VM. So screw it, just do whatever I want with it. You can do that by saying, by altering the database scope configuration to set preview features equals on.
Preview features has not quite made it to blue text in SSMS 22 point, whatever this is yet. I don’t know. Is this thing telling me there’s an update? What is this saying? Oh, there is. 22 point, 2.1 is now available. Well, we’ll get to that later.
Maybe, maybe preview features will turn blue in that one. We don’t know. But creating a vector index requires a single column integer primary key of the clustered variety. And of course, a vector column vector data type column on which to create it.
Our post embeddings table meets all of those qualifications. And as long as everything goes right, we should have exactly 10,000 rows in there. Now we clearly don’t have a vector index currently on the table because if we did all of those inserts that we were just doing would have failed.
All right. So we can run this and it’ll just say it wasn’t there. So we did not do anything. But like there are two, I mean, the two main problems with vector indexes. One, they make the table that you create them on read only in full, not just the vector column, the entire table read only.
No, no writes whatsoever. Insert, update, delete, merge out. All right. No. I mean, obviously no partition switching because you can’t partition this stuff, but all the things in the world. So what I want to show you is what happens when you create a vector index in SQL Server by running this.
All right. And like I’m not going to go through all the code because it’s a pretty hellacious mess. But, you know, this thing took about 10 seconds for about 10,000 rows. You know, fairly long time for an index create.
I believe it is sped up with cumulative update one a bit for 2025. But this is the this is the query that currently builds a vector index. And there’s a lot of stuff going on in here. Right.
Like this is not a normal index create. And it is kind of funny that like when you when you like turn on execution plans, normally when you create an index, you get an execution plan back. Doing it for creating a vector index, you don’t get an execution plan back for it.
But you do get this handy warning when you create a vector index that the join order has been enforced because a local join hint is used. So, you know, real, real professional there. And that’s that’s always a good sign.
But this is the code that currently does it right where we’re doing all this wacky stuff. And this is, you know, not how a normal index gets created. Microsoft right now has to do all of this stuff in order to build a vector index, which is why it’s so slow and why this is a preview feature.
And this is not, you know, production ready for anyone, because if you like to tell someone that, like, you know, like when you create an index in your database, it runs a series of like, you know, I don’t know. I don’t know if internally this is from a store procedure. This is just ad hoc SQL that runs.
But if you were to tell someone this is what you’re doing, like a serious database person, I think there’s a pretty good chance you would get like laughed at pretty hard. You know, cool workaround. But this isn’t this is not what you expect from an enterprise database product.
Not not at all. Yeah. So anyway, this is why vector index is preview only.
And once you create a vector index, there are some other some other things in there, too. Right. Once you create a vector index, you can see a bit about it by using.
I mean, it’s kind of also kind of we have to use sys.internal tables and join that off to some other stuff to find like information about it. But like like this is looking at the post embeddings table. This is the full table that we have, which is like one point something million rows.
I think it’s a you know, it’s a pretty big index. And it’s not like you can compress this thing. I mean, you can create a columnstore index, which is compressible. Like you would still have like the base table or like if it was a non clustered columnstore.
So like you could there is some compression that like kind of works on this, but not not like page or row compression. Right. So this is the index that you get back.
There’s also this other funny row called vector index quantization table, which apparently would be like a compressed vector index thing. I don’t have the full details on that, but rather anything more than what I see there. But it’s not in use yet.
There’s also sort of a funny thing where like you can’t turn on preview features and system databases. So like when I when I when I was first reading about this and I was like vector indexes are read only. Well, that’s kind of whack.
But we could we could like maybe put them into a temp table, put like some stuff into a temp table and then create the vector index on like a smaller set of things and then search like in that. But since you can’t turn on preview features and system databases, you can’t create a temp table with a vector index on it. Which I don’t know, I mean, you know, you’re messing with my work around here.
You’re messing with like the like the perf tuner stuff in here. I don’t I don’t like I don’t particularly care for that. There’s also some other funny like interoperability things where like like most index things you can like create in line with the table definition.
You can not do that with vector indexes. Vector indexes don’t allow you the inline index definition thing. You get one vector index per column, meaning that if you for some reason wanted to like measure like if some reason like you wanted like, you know, like like Euclidean or dot product or you wanted to mix cosine with the other two, you couldn’t index the same vector column two ways, which is weird. And you also can’t have multi column vector indexes, you know, like the vector index like no includes, which is, you know, not awesome for people who might want to like, you know, hit their vector column and, you know, select other columns, things like that.
But, you know, there’s also like like like like no multi key vector indexes. So there’s kind of a lot of stuff about the vector implementation right now that feels really half baked. It feels like this was like rushed out the door, like there was a lot of pressure to get, you know, some movement on this.
And, you know, it just shows us really unpolished, you know, I’m not going to say unprofessional because screw it. Look at me. But, you know, it’s just like really unpolished as as a as a feature at the moment. And again, like we’re not we’re not getting like at least I’m not getting any like feedback about like roadmaps or timeline for when any of this stuff will get worked out.
So, you know, I can’t tell you any of that. Maybe maybe some MVP out there can can tell you because, you know, they’ll they’ll they’ll they’ll kiss the right end of someone and get some details. But that ain’t me, babe. All right. Anyway, vector indexes, maybe someday.
Right now, you know, no, no, no, no buenos to be had. All right. Thank you for watching. Thank you for watching.
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.