Partilhar via


Tip 8 - How to write 'WHERE IN' style queries using LINQ to Entities

Imagine if you have a table of People and you want to retrieve only those whose the Firstname is in a list of interesting firstnames. This is trivial in SQL, you write something like this:

SELECT * FROM People
WHERE Firstname IN ('Alex', 'Colin', 'Danny', 'Diego')

A SQL IN is equivalent to a LINQ Contains

In the LINQ (to objects) world there is no 'IN' so you need to reverse the order like and use the Contains method:

var names = new string[] { "Alex", "Colin", "Danny", "Diego" };

var matches = from person in people
        where names.Contains(person.Firstname)
select person;

Notice that semantically we've gone from:

value.IN(set)

in SQL to

set.Contains(value)

In LINQ. The result however is the same.

Support for Contains in .NET 3.5 SP1 vs .NET 4.0

IEnumerable<T>.Contains(T t)will be supported in EF in .NET 4.0, so you can write queries like the LINQ query above in the next version of EF.

Unfortunately though it doesn't work in .NET 3.5 SP1: LINQ to Entities complains when it encounters a LINQ expression like this because it doesn't know how to translate calls to Contains to SQL.

But what about all our .NET 3.5 SP1 users out there? What can they do in the meantime?

Workaround for .NET 3.5 SP1

Well there is a workaround courtesy of Colin one of the big brains on the EF team.

The essence of the workaround is to recognize that you can rewrite the above query like this:

var matches = from person in people
        where person.Firstname == "Alex" ||
person.Firstname == "Colin" ||
person.Firstname == "Danny" ||
person.Firstname == "Diego"
        select person;

Certainly this is more 'wordy' and a pain to write but it works all the same.

So if we had some utility method that made it easy to create these kind of LINQ expressions we'd be in business.

Well that is exactly what Colin did in our forums a while back, with his utility method in place you can write something like this:

var matches = ctx.People.Where(
BuildOrExpression<People, string>(
p => p.Firstname, names
)
);

This builds an expression that has the same effect as:

var matches = from p in ctx.People
        where names.Contains(p.Firstname)
select p;

But which more importantly actually works against .NET 3.5 SP1.

Extra Credit...

If you've read this far, well done!

Here is the plumbing function that makes this possible:

public static Expression<Func<TElement, bool>> BuildOrExpression<TElement, TValue>(
        Expression<Func<TElement, TValue>> valueSelector,
        IEnumerable<TValue> values
    )
{     
    if (null == valueSelector)
throw new ArgumentNullException("valueSelector");

    if (null == values)
throw new ArgumentNullException("values");  

    ParameterExpression p = valueSelector.Parameters.Single();

    if (!values.Any())
return e => false;

    var equals = values.Select(value =>
(Expression)Expression.Equal(
valueSelector.Body,
Expression.Constant(
value,
typeof(TValue)
)
)
);

   var body = equals.Aggregate<Expression>(
(accumulate, equal) => Expression.Or(accumulate, equal)
);

   return Expression.Lambda<Func<TElement, bool>>(body, p);
}

I'm not going to try to explain this method, other than to say it essentially builds a predicate expression for all the values using the valueSelector (i.e. p => p.Firstname) and ORs those predicates together to create an expression for the complete predicate.

For more information on the technique Colin uses here mosey on over to his blog and read this.

Comments

  • Anonymous
    March 25, 2009
    Hopefully if you're reading this you've noticed that I've started a series of Tips recently. The Tips

  • Anonymous
    March 27, 2009
    Hi Alex,unfortunately in T-SQL there is no easy way to express a recordset of constants like the array initializer in C#. The analogous way would be insertrs into a temporary table.With that in mind one can rewrite the query with the IN operator using an equi-join. As far as I remember, LINQ-2-SQL was able to transform a query likevar matches = from person in people             join name in names on person.Firstname equals name             select person;into the query with the IN operator. I wonder if EF isn't capable of that. Do I miss something?Best regardsMartin

  • Anonymous
    March 27, 2009
    This query isn't supported in EF in .NET 3.5 SP1But it will be supported (along with Contains()) in .NET 4.0 in the same way L2S supports it today, namely by translation to IN (...)Trivia:Actually the way this is actually supported is that the EF produces a list of OR predicates in our Canonical Query Trees (provider agnostic trees), and then the SqlClient provider, recognizes that all those predicates share the same parameter on the same side, which means it can translate it into a SQL IN (...)  Alex

  • Anonymous
    March 27, 2009
    bare in mind names collection can contain only up to ~2100 elements due to the number of parameters a sql query accepthttp://stackoverflow.com/questions/656167/hitting-the-2100-parameter-limit-sql-server-when-using-contains/656196elegant solution:http://stackoverflow.com/questions/567963/linq-expression-to-return-property-value/568771#568771

  • Anonymous
    August 26, 2009
    I think you want Expression.OrElse (logical OR) rather than Expression.Or (bitwise OR).Also, rather than considering Any() a special case, I would use DefaultIfEmpty():<pre>var equals = values.Select(value =>       (Expression)Expression.Equal(            valueSelector.Body,            Expression.Constant(                value,                typeof(TValue)            )       )   ).DefaultIfEmpty(       Expression.Constant(false)   );</pre>Cheers ~Keith

  • Anonymous
    August 26, 2009
    Keith, The reason I use Expression.Or(..) is that no matter what we do we can't convince the database to do a short-circuiting evaluations, so from the EF's perspective Or and OrElse are identical. So by using Or we 'produce' CLR expressions who's intent can actually be satisfied by the database! Nice spot though! Alex

  • Anonymous
    September 16, 2009
    I made a suggestion of how the 2100 parameter limit could be addressed using xml parameters on the Microsoft .Connect site but it looks like they are not interested in this solution.https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=475984&wa=wsignin1.0I really think we need a decent way of handling large mufti valued parameters which doesn't require hacking the expression tree or changing our queries to call some other workaround method.

  • Anonymous
    September 25, 2009
    Hi Alex,How to write a LINQ query like this SQL statement in Entity Framework (with the above workaround)?SELECT * FROM PeopleWHERE Id = 123 OR Firstname IN ('Alex', 'Colin', 'Danny', 'Diego')

  • Anonymous
    November 10, 2009
    Hi Alex,I am trying to fit the below Buildexpression function in my join query. My query isvar q = from l in blendedItemsjoin v in blendedItemsViewon l.LineItemID equals v.LineItemIDwhere v.Status == _readyStatus&& strArray.Contains(v.BillingGroupID)select l;I am able to do this in a saperate statement like this:var query = dao.LineItemInterface.Where(BuildContainsExpression<LineItemInterface, string>(l => l.BillingGroupID, strArray));Thanks in advance.Abrar

  • Anonymous
    November 10, 2009
    The comment has been removed

  • Anonymous
    November 10, 2009
    The comment has been removed

  • Anonymous
    November 10, 2009
    Hi Alex,I have a "WHERE IN" query which works fine with one condition, now i want to add one more condition in my existing query.Can you please help me out in writing the second condition.my second condition is "WHERE cust.Region == _strRegionName"below is my query with first condition with "WHERE IN or BuildContainsExpression" which is working fine, i want to add second condition in this query:var query = dao.LineItemInterface.Where(BuildContainsExpression<LineItemInterface, string>(l => l.BillingGroupID, strArray));static Expression<Func<TElement, bool>> BuildContainsExpression<TElement, TValue>(              Expression<Func<TElement, TValue>> valueSelector, IEnumerable<TValue> values)               {                   if (null == valueSelector) { throw new ArgumentNullException("valueSelector"); }                   if (null == values) { throw new ArgumentNullException("values"); }                   ParameterExpression p = valueSelector.Parameters.Single();                   // p => valueSelector(p) == values[0] || valueSelector(p) == ...                   if (!values.Any())                   {                       return e => false;                   }                   var equals = values.Select(value => (Expression)Expression.Equal(valueSelector.Body, Expression.Constant(value, typeof(TValue))));                   var body = equals.Aggregate<Expression>((accumulate, equal) => Expression.Or(accumulate, equal));                   return Expression.Lambda<Func<TElement, bool>>(body, p);               }Thanks,M

  • Anonymous
    November 12, 2009
    @Maria,Can you write some pseudo code - i.e. what you want to write, imagine that the language allows it.That will help me understand your requirement.Alex

  • Anonymous
    November 12, 2009
    @AbrarI sure there are better ways, but one option would be to do this:(  from l in blendedItems  join v in blendedItemsView  on l.LineItemID equals v.LineItemID  where v.Status == "Blended"  select new {l, v}).Where(  BuildContainsExpression<LineItemInterface, string>(x => x.l.BillingGroupID, strArray))).Select(x => x.l);The database should do a pretty good job of optimizing this...Alex

  • Anonymous
    November 19, 2009
    The comment has been removed

  • Anonymous
    November 19, 2009
    Chris,I understand your frustration with the difference between EF and L2S. Thankfully most of those differences will be gone soon when .NET 4 comes out.As for running into the RPC limit, wouldn't you run into the same problem in L2S too?Alex

  • Anonymous
    December 30, 2009
    Could any body explain the method BuildOrExpression to me please?I'm just not quite understand,thanks!

  • Anonymous
    January 20, 2010
    Hello,I've found BuildOrExpression very useful but now I have a problem. I need to make a query likeselect X, Y from Twhere X in (value1, value2, ... valueN)or Y in (value1, value2, ... valueN)Any suggestions ?

  • Anonymous
    January 29, 2010
    Great post!!But one thing let me curious... why do you use null before ==? just style? Or is there any reason?Thanks for the Tip ;)

  • Anonymous
    March 13, 2010
    For those of you using ObjectQuery like me. Below is the extension I use.Example:ObjectQuery<DocumentTypes> qry =ctx.DocumentTypes                        .BuildOrExpression("DocumentTypeId", ids) .OrderBy("it.DocumentTypeName");public static ObjectQuery<T> BuildOrExpression<T, TList>(this ObjectQuery<T> qry,           string column, TList [] list)       {           if (list != null && list.Length > 0)           {               string uniqueName = "k" + Guid.NewGuid().ToString("N");               ObjectParameter[] parameters = new ObjectParameter[list.Length];               System.Text.StringBuilder commandText = new System.Text.StringBuilder();               commandText.AppendFormat("(it.{0} = @{1}0", column, uniqueName);               parameters[0] = new ObjectParameter(uniqueName + "0", list[0]);               for (int i = 1; i < list.Length; ++i)               {                   commandText.AppendFormat(" OR it.{0} = @{1}{2}", column, uniqueName, i);                   parameters[i] = new ObjectParameter(uniqueName + i.ToString(), list[i]);               }               commandText.Append(")");               qry = qry.Where(commandText.ToString(), parameters);           }           return qry;       }

  • Anonymous
    September 12, 2010
    Awesome extension method. Thanks a ton!!

  • Anonymous
    October 17, 2011
    I am using .NET 4.0 and I am still not able to use contains.  I get this error: "Unable to create a constant value of type <TYPE HERE>.  Only primitive types ('such as Int32, String, and Guid') are supported in this context"  Don't know if I am doing it wrong, but if it is not supported in .net 4.0 then you may want to update your post to indicate that.

  • Anonymous
    February 09, 2012
    Given the following example:   var matches = from p in ctx.People   where listOfIds.Contains(p.Id)   select p;I noticed some preformance issues in my case when 'listOfIds' is a collection of about 16000 recordsI tracked down the problem and it takes about 90% of the time generating the SQL to excecuteWith the following code I was able to optimize this and reduce the execution time by 85%:           List<Person> people = new List<Person>();           int index = 0;           int pageSize = 1000;           do           {               int skip = index * pageSize;               List<int> idsInCurrentPage = listOfIds.Skip(skip).Take(pageSize).ToList();               var peopleInCurrentPage = from p in ctx.People                                           where idsInCurrentPage.Contains(p.Id)                                           select p;               people.AddRange(peopleInCurrentPage.ToList<Person>());               index++;           }           while (listOfIds.Count > (index * pageSize));           return people;This leads me to beleive that LINQ is using str = str + "<some parameter>" to generate the SQL query instead of using the StringBuilder

  • Anonymous
    February 14, 2012
    did you receive my comment about the performance issue when the names list has around 16 thousand records?

  • Anonymous
    February 23, 2012
    Sorry ivo - my blog occasionally thinks comments are spam and puts them on holdYour approach to send multiple independent request is interesting.As for where the problem is, it could be anywhere in the stack, all the way down to how SQL handle long statements. I wonder where the problem gets introduced.

  • Anonymous
    February 23, 2012
    @ivo - PS I've passed you issue onto Kati & Diego from the EF team... so maybe they'll weigh in.

  • Anonymous
    February 23, 2012
    Hello Ivo, thanks for bringing this up to our attention. For a query like this, we have found that most of the time is spent inside EF manipulating expression trees. The cost seems to grow more than linearly with the number of elements in the collection and that is why a workaround like yours helps so much. There is more information and a couple of workarounds (one of them very similar to yours) in this thread in StackOverflow:stackoverflow.com/.../why-does-the-contains-operator-degrade-entity-frameworks-performance-so-dramaThe long term plan is to add native support for Contains/IN expression in EF, to avoid expanding Contains to a tree of ORs between equality comparisons as we do today.I encourage you to vote for this suggestion at data.uservoice.com/.../2598644-improve-the-performance-of-the-contains-operator.Thanks,Diego

  • Anonymous
    March 09, 2012
    thanks!