Can I split an IEnumerable into two by a boolean criteria without two queries?

Can I split an IEnumerable<T> into two IEnumerable<T> using LINQ and only a single query/LINQ statement?

I want to avoid iterating through the IEnumerable<T> twice. For example, is it possible to combine the last two statements below so allValues is only traversed once?

IEnumerable<MyObj> allValues = ...
List<MyObj> trues = allValues.Where( val => val.SomeProp ).ToList();
List<MyObj> falses = allValues.Where( val => !val.SomeProp ).ToList();
12532 次浏览

You can use this:

var groups = allValues.GroupBy(val => val.SomeProp);

To force immediate evaluation like in your example:

var groups = allValues.GroupBy(val => val.SomeProp)
.ToDictionary(g => g.Key, g => g.ToList());
List<MyObj> trues = groups[true];
List<MyObj> falses = groups[false];

Some people like Dictionaries, but I prefer Lookups due to the behavior when a key is missing.

IEnumerable<MyObj> allValues = ...
ILookup<bool, MyObj> theLookup = allValues.ToLookup(val => val.SomeProp);


// does not throw when there are not any true elements.
List<MyObj> trues = theLookup[true].ToList();
// does not throw when there are not any false elements.
List<MyObj> falses = theLookup[false].ToList();

Unfortunately, this approach enumerates twice - once to create the lookup, then once to create the lists.

If you don't really need lists, you can get this down to a single iteration:

IEnumerable<MyObj> trues = theLookup[true];
IEnumerable<MyObj> falses = theLookup[false];

Copy pasta extension method for your convenience.

public static void Fork<T>(
this IEnumerable<T> source,
Func<T, bool> pred,
out IEnumerable<T> matches,
out IEnumerable<T> nonMatches)
{
var groupedByMatching = source.ToLookup(pred);
matches = groupedByMatching[true];
nonMatches = groupedByMatching[false];
}

Or using tuples in C# 7.0

public static (IEnumerable<T> matches, IEnumerable<T> nonMatches) Fork<T>(
this IEnumerable<T> source,
Func<T, bool> pred)
{
var groupedByMatching = source.ToLookup(pred);
return (groupedByMatching[true], groupedByMatching[false]);
}


// Ex.
var numbers = new [] { 1, 2, 3, 4, 5, 6, 7, 8 };
var (numbersLessThanEqualFour, numbersMoreThanFour) = numbers.Fork(x => x <= 4);

Modern C# example using just Linq, no custom extension methods:

(IEnumerable<MyObj> trues, IEnumerable<MyObj> falses)
= ints.Aggregate<MyObj,(IEnumerable<MyObj> trues, IEnumerable<MyObj> falses)>(
(new List<MyObj>(),new List<MyObj>()),
(a, i) => i.SomeProp ? (a.trues.Append(i), a.falses) : (a.trues, a.falses.Append(i))
);

Does this answer the question, yes; is this better or more readable than a foreach, no.

Had some fun coming up with this extension method based on the ToLookup suggestion in other answers:

public static (IEnumerable<T> XS, IEnumerable<T> YS) Bifurcate<T>(this IEnumerable<T> source, Func<T, bool> predicate)
{
var lookup = source.ToLookup(predicate);
return (lookup[true], lookup[false]);
}

The callsite will look like this:

var numbers = new []{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
var (evens, odds) = numbers.Bifurcate(n => n % 2 == 0);

I think the usability of this is nice which is why I'm posting this answer.

We can go even further:

public static (IEnumerable<T> XS, IEnumerable<T> YS, IEnumerable<T> ZS) Trifurcate<T>(this IEnumerable<T> source, Func<T, bool> predicate1, Func<T, bool> predicate2)
{
var lookup = source.ToLookup(x =>
{
if (predicate1(x))
return 1;


if (predicate2(x))
return 2;


return 3;
});


return (lookup[1], lookup[2], lookup[3]);
}

The order of predicates matters with this one. If you pass n => n > 5 and n => n > 100 in that order for example, the second collection will always be empty.

One might even have an itch to come up with a version of this that would work with a variable number of predicates(I know I did) but as far as I know that's not possible with tuple return values in C#.

In all of these answers you lose LINQ's 2nd greatest power (after expressiveness of course); laziness! When we call ToDictionary() or ToLookup() we are forcing an enumeration.

Let's take a look at the implementation of partition in Haskell, a great lazy functional programming language.

From Hoogle:

'partition' p xs = ('filter' p xs, 'filter' (not . p) xs)

As you can see it, partition is an expression which returns a tuple of two other expressions. First, where the predicate is applied to the elements and second, where the inverse of the predicate is applied to the elements. Haskell is lazily evaluated implicitly in this case, similar to how LINQ is lazy through its usage of expressions rather than delegates.

So why don't we implement our partition extension method the same way. In LINQ, filter is called where so lets use that.

public static (IEnumerable<T>, IEnumerable<T>) Partition<T>(
this IEnumerable<T> source, Func<T, bool> predicate)
=> (source.Where(predicate), source.Where(x => !predicate(x)));

One caveat with this is that if you force an evaluation on the matches AND the rest, you will perform a double enumeration. However, don't try to optimise early. With this approach you can express partition in terms of LINQ thereby preserving its beneficial characteristics.