Get AI-Ready With Erik: Vector Query Plans


Summary

In this video, I delve into vector queries and their query plans, specifically addressing common pitfalls when working with vectors in SQL Server. I share insights from the full course on vector search and indexes, which you can access at training.erikdarling.com using the coupon code AIREADY to save $100. The key takeaway is to avoid putting vectors into a temp table; instead, use local variables or formal parameters for better performance. I also highlight how query plans can behave unexpectedly when dealing with vector distances and nested loops joins, emphasizing the importance of parallelism and memory grants in SQL Server’s execution strategy.

Full Transcript

Erik Darling here with Darling Data. We’re going to finish out this week with the last of the vector material. Two things about that. One is that this was not all of the content from all of the modules in the class; there are a few that I did not include in this video series. The other is that this is maybe a fifth of the content that’s actually in the full course. If you want to buy the full course, which hopefully some of these videos will persuade you to do, you can go to training.erikdarling.com. If you use the coupon code AIREADY, you can get AI-ready with Erik, and get a hundred bucks off the course.

We’re going to spend some time looking at query plans for vector stuff. I know that we’ve peeked at them a little bit, but there’s still some fun stuff to talk about. And there’s one thing in here that I want to persuade you not to do, and I only learned that lesson when trying to make the demos a bit more portable: don’t put your vectors into a temp table and query off of that. There’s a good reason for that. Now, the first thing here is a funny query plan quirk. We very clearly have a where clause on vector distance here, and a cross join between these two tables. But when we run this and look at the execution plan, something kind of funny happens: the vector predicate is applied at the nested loops join. You can see the 0.2 predicate and all that stuff there.
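The query itself isn’t reproduced in this post, but from the description, a sketch of the shape might look something like this. The table and column names here (`#SearchEmbedding`, `dbo.PostEmbeddings`, `embedding`, `PostId`) are my assumptions, not taken from the actual demo; `VECTOR_DISTANCE` is the real SQL Server function:

```sql
/* Hypothetical sketch of the temp table + cross join pattern described.
   All object names are assumptions for illustration. */
SELECT
    pe.PostId,
    VECTOR_DISTANCE('cosine', t.embedding, pe.embedding) AS distance
FROM #SearchEmbedding AS t          /* one row: the search embedding */
CROSS JOIN dbo.PostEmbeddings AS pe
WHERE VECTOR_DISTANCE('cosine', t.embedding, pe.embedding) < 0.2
ORDER BY distance;
```

Even though the distance filter is written in the where clause, it shows up in the plan as a predicate on the nested loops join, which is the quirk being described.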

But this is the reason why I don’t want you to do this in real life. Everything looks okay here, except we have this distribute streams of the round robin variety. This is not performance tuning content; I do have performance tuning content that goes deeper into this, if you’re interested in that. But the thing you end up with here is that, after the distribute streams, we have thread one with one row on it. And when we come down here and look at the properties, all those rows over here end up on thread one.

So if I can convince you to not do one thing in this video, it would be to not put your embeddings into a temp table. You are far better off, at this point in time, using a local variable or a formal parameter to store the embedding that you’re measuring vector distance against. Now, coming back to the query plan a little bit: we don’t really have a better way of writing this at the moment. It is a cross join, so we get vanilla nested loops here. We don’t get apply nested loops.

We can’t get that with other query forms, either. But we’re basically just saying, hey, I want all these rows and all these rows matched up, and then I need to figure out which ones have a reasonable vector distance between them, meaning they are reasonably similar. Right. And for the data that I have in here, I’m asking for a vector distance of less than 0.2. Remember that cosine distance runs from zero to two, and the closer you are to zero, the more similar you are. We want to preemptively filter out any results that might not be terribly similar.

If we have this query without a filter on vector distance, we need to join to the Posts table in order to filter on anything else. So if we wanted to, say, compare tags in the Posts table to some search term in order to reinforce the vector distance accuracy a little bit: it’s not obvious from this, but the question embedding that I’m searching for is about Git. And so it helps a bit to look for Git in the tags of the Posts table, right?
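A rough sketch of that reinforced version, with the same caveat that the object names are my guesses based on the Stack Overflow style schema the demos appear to use:

```sql
/* Sketch: the same cross join pattern, plus a join to Posts so we
   can also filter on Tags to reinforce the vector results.
   Object names are assumptions for illustration. */
SELECT
    p.Id,
    p.Title,
    VECTOR_DISTANCE('cosine', t.embedding, pe.embedding) AS distance
FROM #SearchEmbedding AS t
CROSS JOIN dbo.PostEmbeddings AS pe
JOIN dbo.Posts AS p
  ON p.Id = pe.PostId
WHERE p.Tags LIKE N'%git%'
ORDER BY distance;
```

The tag filter doesn’t replace the vector comparison; it just narrows things down to rows that are plausibly about the same topic before you sort by distance.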

So we can do that here. And of course, we get a bit more of a complicated execution plan. This one does not have the weird row imbalance stuff, because there are copious repartition streams operators throughout this plan that help rebalance things. Even over here, we get a pretty good row spread coming out of the Posts table.

I’m not saying that parallelism is terribly useful here. For row counts this low, it’s kind of dumb that we’re even going parallel, but it makes a good comparison.

But just because of all that, the stuff that ends up over here is usually also pretty well spread out. I’m not saying it’s perfect. Some threads got no rows and others got weird numbers of rows, but it’s not like the one before, where all the rows ended up on a single thread.

That’s crappy, right? Get your stuff together, SQL Server. But if we wind all that out and just run this query where we’re saying, hey, just give me the vector distance, order by distance.

This is where the only thing that I really want to point out is that we still get all the rows on one thread. That’s still the issue here. But since we’re saying we want to calculate this distance, this is a runtime calculation.

Right? You don’t know this ahead of time. It’s not even really a good candidate for a computed column, because you don’t know what the search vector is going to be.

It’s like, okay, well, you showed up, I’ve got to use you. So, again, the cross join to a temp table is not a particularly good idea. And it’s particularly not a good idea because of all the rows ending up on one thread. It doesn’t happen here, but it is worth warning you about.

With all the rows ending up on a single thread, well, again, this is not a big performance-tuning-focused course, but all plans in SQL Server start off as serial plans. They only graduate to parallel plans under certain circumstances: the estimated cost of the query being above the cost threshold for parallelism,

nothing inhibiting parallelism, stuff like that. And finally, the parallel plan has to be cheaper than the serial plan. When SQL Server assigns a memory grant to a query plan, that memory grant is assigned while the query plan is single-threaded, when it’s a serial plan.

And if it graduates to a parallel plan, that memory grant gets divided equally across DOP threads. So if it was DOP 8, it gets divided by eight. With all the memory getting divided up, and one thread having all the rows on it, it would be very easy for this query to start spilling and showing some signs of performance degradation because of that.

And you team that up with the fact that vector distance is a runtime calculation, that it’s what we’re sorting by, and that there’s no way to index it to hand that data to SQL Server in order. It can be a pretty dangerous game to play. So please avoid the temp table thing for these reasons.

We’re only getting 10 rows up here, but if you go out and get bigger result sets, which probably wouldn’t be too common with vector search, depending on what you’re doing, more rows does equal more memory, right? So, we run these two queries, and the top 10 one asks for an 1,152 KB memory grant.

And the one that asks for the top 1,000 rows asks for a 700 memory grant. So, memory grants do grow with this, even though we’re not including the embedding column in the select list, which is a big old thing. There’s absolutely no reason to look at the embeddings. There’s no reason to return the embeddings to a user and be like, here was your array of floats.

Enjoy. There’s no reason for that. So, don’t do that.

But if we were to add that to the select list, we would get much bigger memory grants out of these. So, generally, the better option is to assign your embedding to a local variable or a formal parameter. I’m just using a formal parameter here to make life easy.
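A minimal sketch of the variable approach, with the usual caveat that the object names and the vector dimension (1536) are assumptions, not from the demo:

```sql
/* Sketch: store the search embedding in a local variable instead of
   a temp table. The dimension and object names are assumptions;
   use whatever dimension your embedding model actually emits. */
DECLARE @search vector(1536) = N'[0.12, -0.03, ...]';  /* your embedding here */

SELECT TOP (10)
    pe.PostId,
    VECTOR_DISTANCE('cosine', @search, pe.embedding) AS distance
FROM dbo.PostEmbeddings AS pe
ORDER BY distance;
```

With a variable or parameter there’s no second table to cross join to, so there’s no distribute streams operator dumping every row onto one thread.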

And I’ve written this query to get a little bit fancy with stuff. Rather than put the vector distance up here, I was like, well, why don’t we cross apply to the top one thing in here? Then we can sort that out here, and we’ll filter out the original post out here.
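The exact query isn’t shown in this post, but from the description, the shape is roughly something like this. This is a guess at the pattern, and `@search` and `@original_post_id` are hypothetical parameters:

```sql
/* Rough guess at the cross apply shape described: compute the
   distance inside an applied derived table, then filter and sort
   in the outer query. Object and parameter names are assumptions. */
SELECT TOP (10)
    pe.PostId,
    ca.distance
FROM dbo.PostEmbeddings AS pe
CROSS APPLY
(
    SELECT TOP (1)
        VECTOR_DISTANCE('cosine', @search, pe.embedding) AS distance
) AS ca
WHERE pe.PostId <> @original_post_id   /* filter out the original post */
ORDER BY ca.distance;
```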

Just throw some mud at the wall and see what sticks. This ends up being a fairly similar query plan to the vector search thing up here, right? Because we start with the post embeddings table out here.

We get a very even row distribution here, and on the inner side, we also get a very even row distribution. We don’t entirely avoid it, but we reduce the chance that any single thread will get all the rows.

And the memory grant division for a parallel plan would be less likely to cause spills, because you don’t have one thread holding all the rows with only an eighth of the memory, while seven other threads hold seven-eighths of the memory but no rows, right? So it’s just sort of a better way of doing things.

Of course, vector search is much, much faster, but it is still in preview, along with vector indexes. All of those query plans took somewhere between 200 milliseconds and 1.2 seconds, and the vector search version is much faster than that.
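For reference, the preview VECTOR_SEARCH syntax looks roughly like this. It requires a vector index on the column, the exact syntax may shift while the feature is in preview, and the object names and dimension are my assumptions:

```sql
/* Sketch of an approximate search via VECTOR_SEARCH (preview).
   Assumes a vector index already exists on dbo.PostEmbeddings(embedding). */
DECLARE @search vector(1536) = N'[0.12, -0.03, ...]';

SELECT
    s.PostId,
    r.distance
FROM VECTOR_SEARCH
(
    TABLE      = dbo.PostEmbeddings AS s,
    COLUMN     = embedding,
    SIMILAR_TO = @search,
    METRIC     = 'cosine',
    TOP_N      = 10
) AS r
ORDER BY r.distance;
```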

There are trade-offs with it, though. Like I said, creating vector indexes is incredibly painful. I was working up some different demo stuff last night, and I decided to make a 10 million row table with embeddings in it.

It took me six hours to create a vector index on that thing. I was like: I’m going to start this and go to bed. And even a million row table took about half an hour.
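For context, creating one of those indexes looks something like this in the current preview builds. The index and object names here are made up, and preview builds also require enabling the preview feature before this will run:

```sql
/* Sketch of creating a vector index (preview feature).
   Index and table names are assumptions for illustration. */
CREATE VECTOR INDEX vix_PostEmbeddings_embedding
ON dbo.PostEmbeddings (embedding)
WITH (METRIC = 'cosine', TYPE = 'diskann');
```

On large tables, this is the step that can run for hours, which is the pain being described above.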

So it’s a real nightmare at this point. And again, it’s in preview, so you probably shouldn’t be using this stuff yet, but when it finally gets out of preview, it’ll make performance tuning vector stuff a lot easier. There are even trade-offs in here, though, because vector distance is an exact search: it will give you the exact distance between one embedding and another.

With vector search, it’s more approximate, because you’re navigating a graph structure in the index. But we’re going to talk more about that in another video. Anyway, thank you for watching. I hope you enjoyed yourselves.

I hope you learned something. I hope that you will buy my wonderful educational content at a discount; you can get the coupon code for that down in the video description. So, thank you for watching.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.