Seriously
If you’re using Entity Framework, and sending in queries that build up IN clauses, they won’t end up getting parameterized.
Even Forced Parameterization won’t help you if you’re sending in other parameters. One limitation is that it doesn’t kick in for partially parameterized queries.
Even if they did get parameterized, well, what’s one funny thing about IN clauses? You don’t know how many values will be in them.
You’d end up with a different plan for every variation in the number of values, though at least you’d get plan reuse within those groups.
Griper
Like most people, I use Twitter to gripe and brag. So here we are:
there are three surprising things about this issue:
1. it was ever an issue
2. it was opened in 2018
3. it’s still an issuehttps://t.co/v8qAh4oiD8— Erik Darling Data (@erikdarlingdata) August 5, 2021
Yes, I’m surprised that a code base that has been around for as long as Entity Framework hasn’t already dealt with this problem. I’ve said it before: someone ought to introduce the Entity Framework team to the SQL Server team.
But thankfully, some people use Twitter to be helpful, like ErikEJ (b|t|g), who has a solution that works with EF Core.
Alt Code
My friend Josh (b|t) helped me out with some code that works in non-Core versions of Entity Framework too.
using (var context = new StackOverflowContext()) { context.Database.Log = Console.WriteLine; // http://www.albahari.com/nutshell/predicatebuilder.aspx var predicate = PredicateBuilder.False<User>(); for (int i = 0; i < 100; i++) { var value = userIds[i >= userIds.Count - 1 ? userIds.Count - 1 : i]; predicate = predicate.Or(u => u.Id == value); } var users = context.Users .AsExpandable() // http://www.albahari.com/nutshell/linqkit.aspx .Where(predicate) .ToList();
This is helpful when you have an upper limit to the number of values that could end up in your IN clause. This is cool because you’ll always generate 20 parameters, and pad out the list with the last value. That means you’ll get one query plan regardless of how many parameters actually end up in there.
I do not suggest setting this to an arbitrarily high number as a catch all. You will not be happy.
If you don’t have a known number, you can use this:
using (var context = new StackOverflowContext()) { // http://www.albahari.com/nutshell/predicatebuilder.aspx var predicate = PredicateBuilder.False<User>(); foreach (var id in userIds) { predicate = predicate.Or(u => u.Id == id); } var users = context.Users .AsExpandable() // http://www.albahari.com/nutshell/linqkit.aspx .Where(predicate) .ToList();
And of course, even though it’s probably better than no parameterization at all, you will still get different query plans for different numbers of values.
And did I mention that Josh has a Great Series Of Posts™ on bad EF practices?
- ORM Queries: Too Many Roundtrips
- ORM Queries: Premature Materialization (Part 3)
- ORM Queries: Premature Materialization (Part 2)
- ORM Queries: Premature Materialization (Part 1)
- ORM Queries: Problematic Lazy Loading
- ORM Queries: SELECTing All Columns
- ORM Queries: Too Much NULL Checking
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.
Related Posts
- How To Write SQL Server Queries Correctly: Literally Everything You Need To Know About OUTPUT
- How To Write SQL Server Queries Correctly: Literally Everything You Need To Know About CTEs
- How To Write SQL Server Queries Correctly: Case Expressions
- How To Write SQL Server Queries Correctly: Joins With OR Clauses Part 2
very useful (now to see if there is something similar for nHibernate 🙁 )
I had a similar (but a bit different) issue with SAP software that was generating IN clauses that were so long that the queries were failing due the length. For example: SELECT CustomerID, SalesOrderID, [SubTotal], [TotalDue] FROM Sales.SalesOrderHeader WHERE CustomerID IN ({lists over 10,0000}). This was throwing Event ID 8623 errors and the queries never ran. The solution was to generate a temporary table and use a join instead.
Would parameterization work any better with a temp table and join solution?
Perhaps less so, but the more important thing would be getting queries to run without errors.