Intersection of multiple lists with IEnumerable.Intersect()

I have a list of lists, and I want to find their intersection, like this:

var list1 = new List<int>() { 1, 2, 3 };
var list2 = new List<int>() { 2, 3, 4 };
var list3 = new List<int>() { 3, 4, 5 };
var listOfLists = new List<List<int>>() { list1, list2, list3 };


// expected intersection is List<int>() { 3 };

Is there some way to do this with IEnumerable.Intersect()?

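EDIT: I should have been clearer about this: I really have a list of lists, and I don't know how many there will be. The three lists above are just an example; what I actually have is an IEnumerable<IEnumerable<SomeClass>>.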

SOLUTION

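Thanks for all the answers. It turned out there were four options for solving this: List + Aggregate (@Marcel Gosselin), List + foreach (@JaredPar, @Gabe Moothart), HashSet + Aggregate (@jesperll) and HashSet + foreach (@Tony the Pony). I did some performance testing on these solutions (varying the number of lists, the number of elements in each list, and the maximum size of the random numbers).

It turned out that in most situations HashSet performs better than List (except with large lists and a small random-number range, which I guess is due to the nature of HashSet). I couldn't find any real difference between the foreach method and the Aggregate method (the foreach method performs slightly better).

To me, the Aggregate method is really appealing (and I'm going with that as the accepted answer), but I wouldn't say it's the most readable solution. Thanks again, all!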
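A rough sketch of how a comparison like the one described above could be set up, written as a C# 9+ top-level program. `RandomLists` is a hypothetical generator taking the three parameters mentioned (number of lists, elements per list, maximum random value); only two of the four variants are timed, each with a single `Stopwatch` run, so any numbers it prints are indicative at best. A benchmarking library such as BenchmarkDotNet would give more trustworthy figures.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical generator: `listCount` lists, each with `elementsPerList`
// random values in [0, maxValue). A fixed seed keeps runs repeatable.
List<List<int>> RandomLists(int listCount, int elementsPerList, int maxValue, int seed = 42)
{
    var rng = new Random(seed);
    return Enumerable.Range(0, listCount)
        .Select(i => Enumerable.Range(0, elementsPerList)
            .Select(j => rng.Next(maxValue))
            .ToList())
        .ToList();
}

// List + Aggregate: each step materializes a new List<int>.
List<int> ListAggregate(List<List<int>> lists) =>
    lists.Aggregate((prev, next) => prev.Intersect(next).ToList());

// HashSet + foreach: one set, narrowed in place.
List<int> HashSetForeach(List<List<int>> lists)
{
    var set = new HashSet<int>(lists[0]);
    foreach (var list in lists.Skip(1))
        set.IntersectWith(list);
    return set.ToList();
}

// Times a single call only: no warm-up, no repetitions.
(List<int> Result, TimeSpan Elapsed) Time(Func<List<List<int>>, List<int>> f, List<List<int>> input)
{
    var sw = Stopwatch.StartNew();
    var result = f(input);
    sw.Stop();
    return (result, sw.Elapsed);
}

var data = RandomLists(listCount: 10, elementsPerList: 10_000, maxValue: 1_000);
var a = Time(ListAggregate, data);
var b = Time(HashSetForeach, data);
Console.WriteLine($"List + Aggregate:  {a.Elapsed.TotalMilliseconds:F2} ms, {a.Result.Count} values");
Console.WriteLine($"HashSet + foreach: {b.Elapsed.TotalMilliseconds:F2} ms, {b.Result.Count} values");
```

Comparing the results as sets before looking at timings is worthwhile, since `Intersect` keeps first-list order while `HashSet<T>` makes no ordering promise.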


You could do the following:

var result = list1.Intersect(list2).Intersect(list3).ToList();

You can indeed use Intersect twice. However, I believe this will be more efficient:

HashSet<int> hashSet = new HashSet<int>(list1);
hashSet.IntersectWith(list2);
hashSet.IntersectWith(list3);
List<int> intersection = hashSet.ToList();

Not an issue with small sets of course, but if you have a lot of large sets it could be significant.

Basically, Enumerable.Intersect needs to create a set on each call; if you know that you're going to be doing more set operations, you might as well keep that set around.
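To make the point concrete, here is a simplified model of what a single Intersect call has to do, written as a hypothetical local function `IntersectModel` in a C# 9+ top-level program. It is not the BCL source, only a sketch with the same observable result: the distinct elements of the first sequence that also occur in the second, in first-sequence order. Chaining two calls builds two temporary sets, whereas the HashSet version above builds one and narrows it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified model of one Intersect call (hypothetical name, not BCL code).
// Every call allocates a brand-new set from `second`.
IEnumerable<T> IntersectModel<T>(IEnumerable<T> first, IEnumerable<T> second)
{
    var set = new HashSet<T>(second);
    foreach (var item in first)
    {
        // Remove succeeds only the first time an item is found,
        // so each common element is yielded once, in `first` order.
        if (set.Remove(item))
            yield return item;
    }
}

var list1 = new List<int> { 1, 2, 3 };
var list2 = new List<int> { 2, 3, 4 };
var list3 = new List<int> { 3, 4, 5 };

// Two chained calls means two temporary sets.
var chained = IntersectModel(IntersectModel(list1, list2), list3).ToList();
Console.WriteLine(string.Join(", ", chained)); // 3
```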

As ever, keep a close eye on performance vs. readability: the method chaining of calling Intersect twice is very appealing.

EDIT: For the updated question:

public List<T> IntersectAll<T>(IEnumerable<IEnumerable<T>> lists)
{
    HashSet<T> hashSet = null;
    foreach (var list in lists)
    {
        if (hashSet == null)
        {
            hashSet = new HashSet<T>(list);
        }
        else
        {
            hashSet.IntersectWith(list);
        }
    }
    return hashSet == null ? new List<T>() : hashSet.ToList();
}

Or, if you know it won't be empty and that Skip will be relatively cheap:

public List<T> IntersectAll<T>(IEnumerable<IEnumerable<T>> lists)
{
    HashSet<T> hashSet = new HashSet<T>(lists.First());
    foreach (var list in lists.Skip(1))
    {
        hashSet.IntersectWith(list);
    }
    return hashSet.ToList();
}
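The second version enumerates `lists` twice (`First()` and then `Skip(1)`), which matters if the outer sequence is a lazy query. A minimal sketch, written as a C# 9+ top-level program, that walks the outer sequence exactly once with its enumerator and still returns an empty list when there are no inner sequences. `IntersectAllOnce` is a hypothetical name, and the optional `comparer` parameter is an assumption added for element types like `SomeClass` that may not override `Equals`/`GetHashCode`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: intersects any number of sequences, enumerating the
// outer sequence only once. Returns an empty list when `lists` is empty.
List<T> IntersectAllOnce<T>(IEnumerable<IEnumerable<T>> lists, IEqualityComparer<T>? comparer = null)
{
    using var e = lists.GetEnumerator();
    if (!e.MoveNext())
        return new List<T>();

    // Seed the set with the first inner sequence (null comparer = default equality).
    var set = new HashSet<T>(e.Current, comparer);

    // Narrow it in place with each remaining sequence; stop early once empty.
    while (set.Count > 0 && e.MoveNext())
        set.IntersectWith(e.Current);

    return set.ToList();
}

var listOfLists = new List<List<int>>
{
    new List<int> { 1, 2, 3 },
    new List<int> { 2, 3, 4 },
    new List<int> { 3, 4, 5 },
};

var result = IntersectAllOnce<int>(listOfLists);
Console.WriteLine(string.Join(", ", result)); // 3
```

Stopping once the set is empty skips the remaining inner sequences; drop the `set.Count > 0` check if every sequence must always be consumed.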

Try this; it works, but I'd really like to get rid of the .ToList() in the aggregate.

var list1 = new List<int>() { 1, 2, 3 };
var list2 = new List<int>() { 2, 3, 4 };
var list3 = new List<int>() { 3, 4, 5 };
var listOfLists = new List<List<int>>() { list1, list2, list3 };
var intersection = listOfLists.Aggregate((previousList, nextList) => previousList.Intersect(nextList).ToList());

Update:

Following a comment from @pomber, it is possible to get rid of the ToList() inside the Aggregate call and move it outside, so that it executes only once. I did not test whether the previous code is faster than the new one. The change needed is to specify the generic type parameter of the Aggregate method on the last line, like below:

var intersection = listOfLists.Aggregate<IEnumerable<int>>(
    (previousList, nextList) => previousList.Intersect(nextList)
).ToList();

How about:

var intersection = listOfLists
    .Skip(1)
    .Aggregate(
        new HashSet<int>(listOfLists.First()),
        (h, e) => { h.IntersectWith(e); return h; }
    );

That way it's optimized by using the same HashSet throughout and still in a single statement. Just make sure that the listOfLists always contains at least one list.
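If an empty outer list is possible, the guard can live in a small wrapper. A sketch as a C# 9+ top-level program, assuming the outer collection's size is cheap to read (hence the `IReadOnlyCollection<IEnumerable<int>>` parameter) and that "no lists" should yield an empty set, as in the first IntersectAll version above; `IntersectAggregate` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper around the HashSet + Aggregate single statement,
// adding the empty-input guard instead of letting First() throw.
HashSet<int> IntersectAggregate(IReadOnlyCollection<IEnumerable<int>> lists) =>
    lists.Count == 0
        ? new HashSet<int>()
        : lists
            .Skip(1)
            .Aggregate(
                new HashSet<int>(lists.First()),
                (h, e) => { h.IntersectWith(e); return h; });

var listOfLists = new List<List<int>>
{
    new List<int> { 1, 2, 3 },
    new List<int> { 2, 3, 4 },
    new List<int> { 3, 4, 5 },
};

Console.WriteLine(string.Join(", ", IntersectAggregate(listOfLists))); // 3
Console.WriteLine(IntersectAggregate(new List<List<int>>()).Count);    // 0
```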

This is my version of the solution with an extension method that I called IntersectMany.

public static IEnumerable<TResult> IntersectMany<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, IEnumerable<TResult>> selector)
{
    using (var enumerator = source.GetEnumerator())
    {
        if (!enumerator.MoveNext())
            return new TResult[0];

        var ret = selector(enumerator.Current);

        while (enumerator.MoveNext())
        {
            ret = ret.Intersect(selector(enumerator.Current));
        }

        return ret;
    }
}

So the usage would be something like this:

var intersection = (new[] { list1, list2, list3 }).IntersectMany(l => l).ToList();

This is my one-line solution for a list of lists (ListOfLists) without the Intersect function:

var intersect = ListOfLists.SelectMany(x => x).Distinct().Where(w => ListOfLists.TrueForAll(t => t.Contains(w))).ToList();

This should work for .NET 4 (or later).
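`List<T>.Contains` is a linear scan, so the one-liner above rescans every list for every distinct candidate. A sketch of the same idea as a C# 9+ top-level program that first copies each list into a `HashSet<int>`, making each membership test a hash lookup; the names `sets` and `common` are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var listOfLists = new List<List<int>>
{
    new List<int> { 1, 2, 3 },
    new List<int> { 2, 3, 4 },
    new List<int> { 3, 4, 5 },
};

// Build one set per list up front so each Contains is a hash lookup.
var sets = listOfLists.Select(l => new HashSet<int>(l)).ToList();

// Same shape as the one-liner: distinct candidates kept only if every set has them.
var common = listOfLists
    .SelectMany(x => x)
    .Distinct()
    .Where(w => sets.TrueForAll(s => s.Contains(w)))
    .ToList();

Console.WriteLine(string.Join(", ", common)); // 3
```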

This is a simple solution if your lists are all small. If you have larger lists, it doesn't perform as well as a HashSet:

public static IEnumerable<T> IntersectMany<T>(this IEnumerable<IEnumerable<T>> input)
{
    if (!input.Any())
        return new List<T>();

    return input.Aggregate(Enumerable.Intersect);
}

After searching the 'net and not really coming up with anything I liked (or that worked), I slept on it and came up with this. Mine uses a class (SearchResult) that has an EmployeeId, and that's the value that needs to be common across lists. I return all records whose EmployeeId appears in every list. It's not fancy, but it's simple and easy to understand, just the way I like it. For small lists (my case) it should perform just fine, and anyone can understand it!

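private List<SearchResult> GetFinalSearchResults(IEnumerable<IEnumerable<SearchResult>> lists)
{
    // Records from the first list, keyed by EmployeeId (a repeated id keeps the last record).
    Dictionary<int, SearchResult> oldList = new Dictionary<int, SearchResult>();
    foreach (SearchResult emp in lists.First())
    {
        oldList[emp.EmployeeId] = emp;
    }
    Dictionary<int, SearchResult> newList = new Dictionary<int, SearchResult>();

    // The inner sequences are IEnumerable<SearchResult>, not necessarily List<SearchResult>,
    // so iterate them as IEnumerable instead of casting to List.
    foreach (IEnumerable<SearchResult> list in lists.Skip(1))
    {
        foreach (SearchResult emp in list)
        {
            // Keep only employees that were present in every previous list.
            if (oldList.ContainsKey(emp.EmployeeId))
            {
                newList[emp.EmployeeId] = emp; // indexer tolerates duplicate ids
            }
        }

        oldList = new Dictionary<int, SearchResult>(newList);
        newList.Clear();
    }

    return oldList.Values.ToList();
}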

Here's an example just using a list of ints, not a class (this was my original implementation).

static List<int> FindCommon(List<List<int>> items)
{
    // Values from the first list (duplicates collapse into a single key).
    Dictionary<int, int> oldList = new Dictionary<int, int>();
    foreach (int i in items[0])
    {
        oldList[i] = i;
    }
    Dictionary<int, int> newList = new Dictionary<int, int>();

    foreach (List<int> list in items.Skip(1))
    {
        foreach (int i in list)
        {
            // Keep only values that were present in every previous list.
            if (oldList.ContainsKey(i))
            {
                newList[i] = i; // indexer tolerates duplicates such as { 3, 3 }
            }
        }

        oldList = new Dictionary<int, int>(newList);
        newList.Clear();
    }

    return oldList.Values.ToList();
}
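The same "keep what every list has" idea can be written generically with one `HashSet` of keys instead of rebuilding a dictionary per list. A sketch as a C# 9+ top-level program, assuming keys are compared with their default equality; `IntersectByKey` is a hypothetical name, and the `(int EmployeeId, string Name)` tuples stand in for `SearchResult`. Unlike the dictionary version above, which ends up holding the record from the last list, this returns the first list's record for each common key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: returns the items of the first list whose key occurs in
// every list, one item per key, in first-list order.
List<T> IntersectByKey<T, TKey>(IReadOnlyList<IEnumerable<T>> lists, Func<T, TKey> keySelector)
{
    if (lists.Count == 0)
        return new List<T>();

    // Keys from the first list, narrowed in place by each later list.
    var keys = new HashSet<TKey>(lists[0].Select(keySelector));
    for (int i = 1; i < lists.Count; i++)
        keys.IntersectWith(lists[i].Select(keySelector));

    // Remove each key as it is used so duplicates in the first list are skipped.
    var result = new List<T>();
    foreach (var item in lists[0])
        if (keys.Remove(keySelector(item)))
            result.Add(item);
    return result;
}

// Tuples stand in for SearchResult here.
var batches = new List<IEnumerable<(int EmployeeId, string Name)>>
{
    new[] { (1, "Ann"), (2, "Bob"), (3, "Cy") },
    new[] { (2, "Bob"), (3, "Cy"), (4, "Di") },
    new[] { (3, "Cy"), (5, "Ed") },
};

var common = IntersectByKey(batches, r => r.EmployeeId);
Console.WriteLine(string.Join(", ", common.Select(r => r.Name))); // Cy
```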