Comparing two collections for equality irrespective of the order of items in them

小开

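Create a Dictionary "dict", and then for each member in the first collection, do dict[member]++;

Then loop over the second collection in the same way, but for each member do dict[member]--.

At the end, loop over all of the members in the dictionary: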

    private bool SetEqual (List<int> left, List<int> right) {

        if (left.Count != right.Count)
            return false;

        Dictionary<int, int> dict = new Dictionary<int, int>();

        foreach (int member in left) {
            if (dict.ContainsKey(member) == false)
                dict[member] = 1;
            else
                dict[member]++;
        }

        foreach (int member in right) {
            if (dict.ContainsKey(member) == false)
                return false;
            else
                dict[member]--;
        }

        foreach (KeyValuePair<int, int> kvp in dict) {
            if (kvp.Value != 0)
                return false;
        }

        return true;

    }

Edit: As far as I can tell, this is on the same order as the most efficient algorithm. This algorithm is O(N), assuming that the Dictionary uses O(1) lookups.

小开

Erickson is almost right: since you want to match on counts of duplicates, you want a Bag. In Java, this looks something like:

(new HashBag(collection1)).equals(new HashBag(collection2))

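I'm sure C# has a built-in Set implementation. I would use that first; if performance is a problem, you could always use a different Set implementation, but keep the same Set interface.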

小开

A simple and fairly efficient solution is to sort both collections and then compare them for equality:

bool equal = collection1.OrderBy(i => i).SequenceEqual(
collection2.OrderBy(i => i));

This algorithm is O(N * logN), while your solution above is O(N^2).

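If the collections have certain properties, a faster solution may be possible. For example, if both of your collections are hash sets, they cannot contain duplicates. Also, checking whether a hash set contains some element is very fast. In that case an algorithm similar to yours would likely be the fastest.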

小开

Here is my (heavily influenced by D.Jennings) generic implementation of the comparison method (in C#):

/// <summary>
/// Represents a service used to compare two collections for equality.
/// </summary>
/// <typeparam name="T">The type of the items in the collections.</typeparam>
public class CollectionComparer<T>
{
/// <summary>
/// Compares the content of two collections for equality.
/// </summary>
/// <param name="foo">The first collection.</param>
/// <param name="bar">The second collection.</param>
/// <returns>True if both collections have the same content, false otherwise.</returns>
public bool Execute(ICollection<T> foo, ICollection<T> bar)
{
// Declare a dictionary to count the occurrences of the items in the collection
Dictionary<T, int> itemCounts = new Dictionary<T,int>();


// Increase the count for each occurrence of the item in the first collection
foreach (T item in foo)
{
if (itemCounts.ContainsKey(item))
{
itemCounts[item]++;
}
else
{
itemCounts[item] = 1;
}
}


// Decrease the count for each occurrence of the item in the second collection
foreach (T item in bar)
{
// Dictionary<T, int> looks keys up with the default equality comparer
// (GetHashCode and Equals), not by reference, so the item itself can be
// used as the key. This also works for value types, where a "!= null"
// check on the result of List<T>.Find cannot detect a missing item.
// You may want to override ".Equals" and ".GetHashCode" to define what
// it means for two "T" objects to be equal
int count;
if (itemCounts.TryGetValue(item, out count))
{
itemCounts[item] = count - 1;
}
else
{
// There was no occurrence of this item in the first collection, thus the collections are not equal
return false;
}
}


// The count of each item should be 0 if the contents of the collections are equal
foreach (int value in itemCounts.Values)
{
if (value != 0)
{
return false;
}
}


// The collections are equal
return true;
}
}

小开

You could use a HashSet.
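As a rough sketch of what that could look like (the class and method names here are made up for illustration), `HashSet<T>.SetEquals` compares the distinct elements of two sequences. Note that it ignores duplicate counts, so it only fits when the collections are really sets:

```csharp
using System.Collections.Generic;

// Hypothetical helper, not taken from any answer in this thread.
public static class SetEqualsSketch
{
    // Returns true when both sequences contain the same distinct elements.
    // Duplicates are ignored, because HashSet<T> stores each value once.
    public static bool SameDistinctElements<T>(IEnumerable<T> first, IEnumerable<T> second)
    {
        var set = new HashSet<T>(first);
        return set.SetEquals(second);
    }
}
```

With this sketch, `{ 1, 2, 3 }` and `{ 3, 2, 1 }` compare as equal, but so do `{ 1, 1, 2 }` and `{ 2, 2, 1 }`.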

小开

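There are many solutions to this problem. If you don't care about duplicates, you don't have to sort both collections. First make sure that they have the same number of items. After that, sort the first collection. Then binary-search each item from the second collection in the sorted collection. If a given item is not found, stop and return false.

The complexity of this:
- sorting the first collection: N * Log(N)
- searching each item from the second in the first: N * Log(N)

So you end up with 2 * N * Log(N), assuming that they match and you look up everything. This is similar to the complexity of sorting both. It also gives you the benefit of stopping early if there is a difference. However, keep in mind that if both are already sorted before you get to this comparison and you sort them again with something like qsort, the sorting will be more expensive. There are optimizations for this.

Another alternative, which is great for small collections where you know the range of the elements, is to use a bitmask index. This will give you O(n) performance. Another alternative is to use a hash and look items up in it. For small collections it is usually a lot better to do the sorting or the bitmask index. Hash tables have the disadvantage of worse locality, so keep that in mind.

Again, that's only if you don't care about duplicates. If you want to account for duplicates, go with sorting both.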
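A minimal sketch of the sort-then-binary-search idea described above might look like this. The names are hypothetical, and it assumes the elements implement `IComparable<T>` and that neither collection contains duplicates:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper illustrating the approach; assumes no duplicates.
public static class BinarySearchSketch
{
    public static bool SameElementsNoDuplicates<T>(IEnumerable<T> first, IEnumerable<T> second)
        where T : IComparable<T>
    {
        T[] sorted = first.ToArray();
        T[] other = second.ToArray();

        // Different sizes can never match.
        if (sorted.Length != other.Length)
            return false;

        // Sort only one of the two collections: O(N log N).
        Array.Sort(sorted);

        // Look up each item of the other collection: N searches of O(log N) each.
        foreach (T item in other)
        {
            // BinarySearch returns a negative number when the item is absent,
            // which lets us stop at the first difference.
            if (Array.BinarySearch(sorted, item) < 0)
                return false;
        }

        return true;
    }
}
```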

小开

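Edit: I realized as soon as I posted that this really only works for sets; it will not properly deal with collections that have duplicate items. For example, { 1, 1, 2 } and { 2, 2, 1 } will be considered equal from this algorithm's perspective. If your collections are sets (or their equality can be measured that way), however, I hope you find the below useful.

The solution I use is: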

return c1.Count == c2.Count && c1.Intersect(c2).Count() == c1.Count;

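Linq does the dictionary work under the covers, so this is also O(N). (Note that it is O(1) if the collections are not the same size.)

I did a sanity check using the SetEquals method suggested by Daniel, the OrderBy/SequenceEquals method suggested by Igor, and my suggestion. The results are below, showing O(N * LogN) for Igor's and O(N) for mine and Daniel's.

I think the simplicity of the Linq intersect code makes it the preferable solution.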

__Test Latency(ms)__
N, SetEquals, OrderBy, Intersect
1024, 0, 0, 0
2048, 0, 0, 0
4096, 31.2468, 0, 0
8192, 62.4936, 0, 0
16384, 156.234, 15.6234, 0
32768, 312.468, 15.6234, 46.8702
65536, 640.5594, 46.8702, 31.2468
131072, 1312.3656, 93.7404, 203.1042
262144, 3765.2394, 187.4808, 187.4808
524288, 5718.1644, 374.9616, 406.2084
1048576, 11420.7054, 734.2998, 718.6764
2097152, 35090.1564, 1515.4698, 1484.223

小开

In the case of no repeats and no order, the following EqualityComparer can be used to allow collections as dictionary keys:

public class SetComparer<T> : IEqualityComparer<IEnumerable<T>>
where T:IComparable<T>
{
public bool Equals(IEnumerable<T> first, IEnumerable<T> second)
{
if (first == second)
return true;
if ((first == null) || (second == null))
return false;
return first.ToHashSet().SetEquals(second);
}


public int GetHashCode(IEnumerable<T> enumerable)
{
int hash = 17;


foreach (T val in enumerable.OrderBy(x => x))
hash = hash * 23 + val.GetHashCode();


return hash;
}
}

Here is the ToHashSet() implementation I used.
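The linked implementation is not reproduced in this thread, so the following is only a guess at what such an extension might look like. The name `ToHashSetCompat` is hypothetical, chosen to avoid clashing with the `Enumerable.ToHashSet` method that newer framework versions ship in System.Linq:

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the ToHashSet() extension mentioned above.
public static class HashSetExtensionsSketch
{
    // Copies the sequence into a HashSet<T>; duplicates collapse to one entry.
    // A null comparer makes HashSet<T> fall back to EqualityComparer<T>.Default.
    public static HashSet<T> ToHashSetCompat<T>(
        this IEnumerable<T> source,
        IEqualityComparer<T> comparer = null)
    {
        return new HashSet<T>(source, comparer);
    }
}
```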

小开

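This is a somewhat similar post, but check out my solution for comparing sets. It's pretty simple:

This will perform an equality comparison regardless of order: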

var list1 = new[] { "Bill", "Bob", "Sally" };
var list2 = new[] { "Bob", "Bill", "Sally" };
bool isequal = list1.Compare(list2).IsSame;

This will check to see if items were added or removed:

var list1 = new[] { "Billy", "Bob" };
var list2 = new[] { "Bob", "Sally" };
var diff = list1.Compare(list2);
var onlyinlist1 = diff.Removed; //Billy
var onlyinlist2 = diff.Added;   //Sally
var inbothlists = diff.Equal;   //Bob

This will show which items in the dictionary changed:

var original = new Dictionary<int, string>() { { 1, "a" }, { 2, "b" } };
var changed = new Dictionary<int, string>() { { 1, "aaa" }, { 2, "b" } };
var diff = original.Compare(changed, (x, y) => x.Value == y.Value, (x, y) => x.Value == y.Value);
foreach (var item in diff.Different)
Console.Write("{0} changed to {1}", item.Key.Value, item.Value.Value);
//Will output: a changed to aaa

Original post here.

小开

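Best answer

It turns out Microsoft already has this covered in its testing framework: CollectionAssert.AreEquivalent

Remarks

Two collections are equivalent if they have the same elements in the same quantity, but in any order. Elements are equal if their values are equal, not if they refer to the same object.

Using Reflector, I modified the code behind AreEquivalent() to create a corresponding equality comparer. It is more complete than the existing answers, since it takes nulls into account, implements IEqualityComparer, and has some efficiency and edge-case checks. Plus, it's Microsoft :)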

public class MultiSetComparer<T> : IEqualityComparer<IEnumerable<T>>
{
private readonly IEqualityComparer<T> m_comparer;
public MultiSetComparer(IEqualityComparer<T> comparer = null)
{
m_comparer = comparer ?? EqualityComparer<T>.Default;
}


public bool Equals(IEnumerable<T> first, IEnumerable<T> second)
{
if (first == null)
return second == null;


if (second == null)
return false;


if (ReferenceEquals(first, second))
return true;


if (first is ICollection<T> firstCollection && second is ICollection<T> secondCollection)
{
if (firstCollection.Count != secondCollection.Count)
return false;


if (firstCollection.Count == 0)
return true;
}


return !HaveMismatchedElement(first, second);
}


private bool HaveMismatchedElement(IEnumerable<T> first, IEnumerable<T> second)
{
int firstNullCount;
int secondNullCount;


var firstElementCounts = GetElementCounts(first, out firstNullCount);
var secondElementCounts = GetElementCounts(second, out secondNullCount);


if (firstNullCount != secondNullCount || firstElementCounts.Count != secondElementCounts.Count)
return true;


foreach (var kvp in firstElementCounts)
{
var firstElementCount = kvp.Value;
int secondElementCount;
secondElementCounts.TryGetValue(kvp.Key, out secondElementCount);


if (firstElementCount != secondElementCount)
return true;
}


return false;
}


private Dictionary<T, int> GetElementCounts(IEnumerable<T> enumerable, out int nullCount)
{
var dictionary = new Dictionary<T, int>(m_comparer);
nullCount = 0;


foreach (T element in enumerable)
{
if (element == null)
{
nullCount++;
}
else
{
int num;
dictionary.TryGetValue(element, out num);
num++;
dictionary[element] = num;
}
}


return dictionary;
}


public int GetHashCode(IEnumerable<T> enumerable)
{
if (enumerable == null) throw new
ArgumentNullException(nameof(enumerable));


int hash = 17;


foreach (T val in enumerable)
hash ^= (val == null ? 42 : m_comparer.GetHashCode(val));


return hash;
}
}

Sample usage:

var set = new HashSet<IEnumerable<int>>(new[] {new[]{1,2,3}}, new MultiSetComparer<int>());
Console.WriteLine(set.Contains(new [] {3,2,1})); //true
Console.WriteLine(set.Contains(new [] {1, 2, 3, 3})); //false

Or, if you just want to compare two collections directly:

var comp = new MultiSetComparer<string>();
Console.WriteLine(comp.Equals(new[] {"a","b","c"}, new[] {"a","c","b"})); //true
Console.WriteLine(comp.Equals(new[] {"a","b","c"}, new[] {"a","b"})); //false

Finally, you can use an equality comparer of your choice:

var strcomp = new MultiSetComparer<string>(StringComparer.OrdinalIgnoreCase);
Console.WriteLine(strcomp.Equals(new[] {"a", "b"}, new []{"B", "A"})); //true

小开

Why not use .Except()?

// Create the IEnumerable data sources.
string[] names1 = System.IO.File.ReadAllLines(@"../../../names1.txt");
string[] names2 = System.IO.File.ReadAllLines(@"../../../names2.txt");
// Create the query. Note that method syntax must be used here.
IEnumerable<string> differenceQuery =   names1.Except(names2);
// Execute the query.
Console.WriteLine("The following lines are in names1.txt but not names2.txt");
foreach (string s in differenceQuery)
Console.WriteLine(s);

http://msdn.microsoft.com/en-us/library/bb397894.aspx
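The snippet above only reports what is in the first source but not in the second. To use `.Except()` as an equality check, one option is to run it in both directions. This is just a sketch with made-up names, and since `Except` works on distinct values, it ignores how many times each item occurs:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper built on Enumerable.Except; ignores duplicate counts.
public static class ExceptSketch
{
    public static bool SameDistinctElements<T>(IEnumerable<T> first, IEnumerable<T> second)
    {
        // Materialize once so each sequence is not enumerated twice.
        List<T> a = first.ToList();
        List<T> b = second.ToList();

        // Equal as sets when nothing is left over in either direction.
        return !a.Except(b).Any() && !b.Except(a).Any();
    }
}
```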

小开

Here is an extension-method variant of ohadsc's answer, in case it's useful to someone

static public class EnumerableExtensions
{
static public bool IsEquivalentTo<T>(this IEnumerable<T> first, IEnumerable<T> second)
{
if ((first == null) != (second == null))
return false;


if (!object.ReferenceEquals(first, second) && (first != null))
{
if (first.Count() != second.Count())
return false;


if ((first.Count() != 0) && HaveMismatchedElement<T>(first, second))
return false;
}


return true;
}


private static bool HaveMismatchedElement<T>(IEnumerable<T> first, IEnumerable<T> second)
{
int firstCount;
int secondCount;


var firstElementCounts = GetElementCounts<T>(first, out firstCount);
var secondElementCounts = GetElementCounts<T>(second, out secondCount);


if (firstCount != secondCount)
return true;


foreach (var kvp in firstElementCounts)
{
firstCount = kvp.Value;
secondElementCounts.TryGetValue(kvp.Key, out secondCount);


if (firstCount != secondCount)
return true;
}


return false;
}


private static Dictionary<T, int> GetElementCounts<T>(IEnumerable<T> enumerable, out int nullCount)
{
var dictionary = new Dictionary<T, int>();
nullCount = 0;


foreach (T element in enumerable)
{
if (element == null)
{
nullCount++;
}
else
{
int num;
dictionary.TryGetValue(element, out num);
num++;
dictionary[element] = num;
}
}


return dictionary;
}


static private int GetHashCode<T>(IEnumerable<T> enumerable)
{
int hash = 17;


foreach (T val in enumerable.OrderBy(x => x))
hash = hash * 23 + val.GetHashCode();


return hash;
}
}

小开

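In many cases the only suitable answer is Igor Ostrovsky's; the other answers are based on object hash codes. But when you generate a hash code for an object, you should base it only on its IMMUTABLE fields, such as the object Id field (in the case of a database entity). See "Why is it important to override GetHashCode when Equals method is overridden?"

This means that if you compare two collections, the compare method may return true even though fields of the individual items are not equal. To deep-compare collections, you need to use Igor's method and implement IEquality.

Please read my comments and Mr. Schnider's on his most-voted post.

James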
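To make that concrete, here is a rough sketch under my own assumptions: `Customer` and `CustomerFieldComparer` are hypothetical types, and I am reading "implement IEquality" as supplying an `IEqualityComparer<T>` to `SequenceEqual`. The hash code uses only the immutable Id, while `Equals` checks every field:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity used only for illustration.
public sealed class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Field-by-field equality; the hash code is based on the immutable Id only.
public sealed class CustomerFieldComparer : IEqualityComparer<Customer>
{
    public bool Equals(Customer x, Customer y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Id == y.Id && string.Equals(x.Name, y.Name, StringComparison.Ordinal);
    }

    public int GetHashCode(Customer obj)
    {
        return obj == null ? 0 : obj.Id.GetHashCode();
    }
}

public static class DeepCompareSketch
{
    // Sort both sides the same way, then compare item by item with the
    // field-level comparer (the sort-and-SequenceEqual approach).
    public static bool SameCustomers(IEnumerable<Customer> first, IEnumerable<Customer> second)
    {
        return first.OrderBy(c => c.Id).ThenBy(c => c.Name, StringComparer.Ordinal)
            .SequenceEqual(
                second.OrderBy(c => c.Id).ThenBy(c => c.Name, StringComparer.Ordinal),
                new CustomerFieldComparer());
    }
}
```

Two customers that share an Id but differ in Name have the same hash code here, yet the sequence comparison still reports them as different.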

小开

static bool SetsContainSameElements<T>(IEnumerable<T> set1, IEnumerable<T> set2) {
var setXOR = new HashSet<T>(set1);
setXOR.SymmetricExceptWith(set2);
return (setXOR.Count == 0);
}

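The solution requires .NET 3.5 and the System.Collections.Generic namespace. According to Microsoft, SymmetricExceptWith is an O(n + m) operation, where n is the number of elements in the first set and m is the number of elements in the second. You could always add an equality comparer to this function if necessary.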
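For instance, a variant that takes an equality comparer could be sketched like this. The wrapper class name is made up; the comparer is simply handed to the `HashSet<T>` constructor, which `SymmetricExceptWith` then uses for its lookups:

```csharp
using System.Collections.Generic;

// Hypothetical variant of the function above with a caller-supplied comparer.
public static class SymmetricExceptSketch
{
    public static bool SetsContainSameElements<T>(
        IEnumerable<T> set1,
        IEnumerable<T> set2,
        IEqualityComparer<T> comparer)
    {
        // The set keeps only elements that are in exactly one of the inputs.
        var setXOR = new HashSet<T>(set1, comparer);
        setXOR.SymmetricExceptWith(set2);
        return setXOR.Count == 0;
    }
}
```

Called with `StringComparer.OrdinalIgnoreCase`, `{ "a", "B" }` and `{ "b", "A" }` are treated as the same set.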

小开

Here is a solution which is an improvement over this one.

public static bool HasSameElementsAs<T>(
this IEnumerable<T> first,
IEnumerable<T> second,
IEqualityComparer<T> comparer = null)
{
var firstMap = first
.GroupBy(x => x, comparer)
.ToDictionary(x => x.Key, x => x.Count(), comparer);


var secondMap = second
.GroupBy(x => x, comparer)
.ToDictionary(x => x.Key, x => x.Count(), comparer);


if (firstMap.Keys.Count != secondMap.Keys.Count)
return false;


if (firstMap.Keys.Any(k1 => !secondMap.ContainsKey(k1)))
return false;


return firstMap.Keys.All(x => firstMap[x] == secondMap[x]);
}

小开

If you use Shouldly, you can use ShouldAllBe with Contains.

collection1 = {1, 2, 3, 4};
collection2 = {2, 4, 1, 3};


collection1.ShouldAllBe(item=>collection2.Contains(item)); // true

And finally, you can write an extension.

public static class ShouldlyIEnumerableExtensions
{
public static void ShouldEquivalentTo<T>(this IEnumerable<T> list, IEnumerable<T> equivalent)
{
list.ShouldAllBe(l => equivalent.Contains(l));
}
}

UPDATE

An optional parameter exists on the ShouldBe method.

collection1.ShouldBe(collection2, ignoreOrder: true); // true

小开

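To allow for duplicates in the IEnumerable<T> (if sets are not desirable or possible) and to "ignore order", you should be able to use a .GroupBy().

I'm not an expert on complexity measurements, but my rudimentary understanding is that this should be O(n). I understand O(n^2) as coming from performing an O(n) operation inside another O(n) operation, like ListA.Where(a => ListB.Contains(a)).ToList(). Every item in ListB is evaluated for equality against each item in ListA.

Like I said, my understanding of complexity is limited, so correct me on this if I'm wrong.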

public static bool IsSameAs<T, TKey>(this IEnumerable<T> source, IEnumerable<T> target, Expression<Func<T, TKey>> keySelectorExpression)
{
// check the object
if (source == null && target == null) return true;
if (source == null || target == null) return false;


var sourceList = source.ToList();
var targetList = target.ToList();


// check the list count :: { 1,1,1 } != { 1,1,1,1 }
if (sourceList.Count != targetList.Count) return false;


var keySelector = keySelectorExpression.Compile();
var groupedSourceList = sourceList.GroupBy(keySelector).ToList();
var groupedTargetList = targetList.GroupBy(keySelector).ToList();


// check that the number of groupings match :: { 1,1,2,3,4 } != { 1,1,2,3,4,5 }
var groupCountIsSame = groupedSourceList.Count == groupedTargetList.Count;
if (!groupCountIsSame) return false;


// check that the count of each group in source has the same count in target :: for values { 1,1,2,3,4 } & { 1,1,1,2,3,4 }
// key:count
// { 1:2, 2:1, 3:1, 4:1 } != { 1:3, 2:1, 3:1, 4:1 }
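var countsMissmatch = groupedSourceList.Any(sourceGroup =>
{
// SingleOrDefault avoids an InvalidOperationException when the key is missing from target,
// and object.Equals avoids a NullReferenceException when a key is null
var targetGroup = groupedTargetList.SingleOrDefault(y => object.Equals(y.Key, sourceGroup.Key));
return targetGroup == null || sourceGroup.Count() != targetGroup.Count();
});
return !countsMissmatch;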
}
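To illustrate the complexity point made above, here is a small sketch (names are hypothetical) that contrasts the nested-scan pattern with a hash-based lookup. Both return the items of listA that also occur in listB; only the cost differs:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helpers contrasting the two lookup strategies.
public static class ComplexitySketch
{
    // O(n * m): List<T>.Contains scans listB once for every item in listA.
    public static List<T> CommonItemsQuadratic<T>(List<T> listA, List<T> listB)
    {
        return listA.Where(a => listB.Contains(a)).ToList();
    }

    // Roughly O(n + m), assuming O(1) hash lookups: listB is hashed once,
    // then every item in listA costs a single lookup.
    public static List<T> CommonItemsLinear<T>(List<T> listA, List<T> listB)
    {
        var lookup = new HashSet<T>(listB);
        return listA.Where(a => lookup.Contains(a)).ToList();
    }
}
```

Note that the method above also searches the target groups once per source group, so with many distinct keys it can still behave quadratically in the number of groups; looking the groups up in a dictionary would avoid that.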

小开

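This simple solution forces the IEnumerable's generic type to implement IComparable, because of how OrderBy is defined.

If you don't want to make such an assumption but still want to use this solution, you can use the following piece of code: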

bool equal = collection1.OrderBy(i => i?.GetHashCode())
.SequenceEqual(collection2.OrderBy(i => i?.GetHashCode()));

小开

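If you are comparing for the purpose of unit-testing assertions, it may make sense to throw some efficiency out the window and simply convert each list to a string representation (csv) before doing the comparison. That way, the default test assertion message will display the differences within the error message.

Usage: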

using Microsoft.VisualStudio.TestTools.UnitTesting;


// define collection1, collection2, ...


Assert.AreEqual(collection1.OrderBy(c=>c).ToCsv(), collection2.OrderBy(c=>c).ToCsv());

Helper extension method:

public static string ToCsv<T>(
this IEnumerable<T> values,
Func<T, string> selector = null,
string joinSeparator = ",")
{
if (selector == null)
{
if (typeof(T) == typeof(Int16) ||
typeof(T) == typeof(Int32) ||
typeof(T) == typeof(Int64))
{
selector = (v) => Convert.ToInt64(v).ToStringInvariant();
}
else if (typeof(T) == typeof(decimal))
{
selector = (v) => Convert.ToDecimal(v).ToStringInvariant();
}
else if (typeof(T) == typeof(float) ||
typeof(T) == typeof(double))
{
selector = (v) => Convert.ToDouble(v).ToString(CultureInfo.InvariantCulture);
}
else
{
selector = (v) => v.ToString();
}
}


return String.Join(joinSeparator, values.Select(v => selector(v)));
}

小开

Based on this answer to a duplicate question, the comments below that answer, and @brian-genisio's answer, I came up with these:

public static bool AreEquivalentIgnoringDuplicates<T>(this IEnumerable<T> items, IEnumerable<T> otherItems)
{
var itemList = items.ToList();
var otherItemList = otherItems.ToList();
var except = itemList.Except(otherItemList);
return itemList.Count == otherItemList.Count && except.IsEmpty();
}


public static bool AreEquivalent<T>(this IEnumerable<T> items, IEnumerable<T> otherItems)
{
var itemList = items.ToList();
var otherItemList = otherItems.ToList();
var except = itemList.Except(otherItemList);
return itemList.Distinct().Count() == otherItemList.Count && except.IsEmpty();
}

Tests for these two:

[Test]
public void collection_with_duplicates_are_equivalent()
{
var a = new[] {1, 5, 5};
var b = new[] {1, 1, 5};


a.AreEquivalentIgnoringDuplicates(b).ShouldBe(true);
}


[Test]
public void collection_with_duplicates_are_not_equivalent()
{
var a = new[] {1, 5, 5};
var b = new[] {1, 1, 5};


a.AreEquivalent(b).ShouldBe(false);
}

小开

Here is my stab at the problem. It is based on this strategy, but also borrows some ideas from the accepted answer.

public static class EnumerableExtensions
{
public static bool SequenceEqualUnordered<TSource>(this IEnumerable<TSource> source, IEnumerable<TSource> second)
{
return SequenceEqualUnordered(source, second, EqualityComparer<TSource>.Default);
}
   

public static bool SequenceEqualUnordered<TSource>(this IEnumerable<TSource> source, IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
{
if (source == null)
throw new ArgumentNullException(nameof(source));


if (second == null)
throw new ArgumentNullException(nameof(second));


if (source.TryGetCount(out int firstCount) && second.TryGetCount(out int secondCount))
{
if (firstCount != secondCount)
return false;


if (firstCount == 0)
return true;
}


IEqualityComparer<ValueTuple<TSource>> wrapperComparer = comparer != null ? new WrappedItemComparer<TSource>(comparer) : null;


Dictionary<ValueTuple<TSource>, int> counters;
ValueTuple<TSource> key;
int counter;


using (IEnumerator<TSource> enumerator = source.GetEnumerator())
{
if (!enumerator.MoveNext())
return !second.Any();


counters = new Dictionary<ValueTuple<TSource>, int>(wrapperComparer);


do
{
key = new ValueTuple<TSource>(enumerator.Current);


if (counters.TryGetValue(key, out counter))
counters[key] = counter + 1;
else
counters.Add(key, 1);
}
while (enumerator.MoveNext());
}


foreach (TSource item in second)
{
key = new ValueTuple<TSource>(item);


if (counters.TryGetValue(key, out counter))
{
if (counter <= 0)
return false;


counters[key] = counter - 1;
}
else
return false;
}


return counters.Values.All(cnt => cnt == 0);
}


private static bool TryGetCount<TSource>(this IEnumerable<TSource> source, out int count)
{
switch (source)
{
case ICollection<TSource> collection:
count = collection.Count;
return true;
case IReadOnlyCollection<TSource> readOnlyCollection:
count = readOnlyCollection.Count;
return true;
case ICollection nonGenericCollection:
count = nonGenericCollection.Count;
return true;
default:
count = default;
return false;
}
}


private sealed class WrappedItemComparer<TSource> : IEqualityComparer<ValueTuple<TSource>>
{
private readonly IEqualityComparer<TSource> _comparer;


public WrappedItemComparer(IEqualityComparer<TSource> comparer)
{
_comparer = comparer;
}


public bool Equals(ValueTuple<TSource> x, ValueTuple<TSource> y) => _comparer.Equals(x.Item1, y.Item1);


public int GetHashCode(ValueTuple<TSource> obj) => _comparer.GetHashCode(obj.Item1);
}
}

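Improvements over the MS solution:

Doesn't take the ReferenceEquals(first, second) shortcut, because it is debatable. For example, consider a custom IEnumerable<T> with an implementation like this: public IEnumerator<T> GetEnumerator() => Enumerable.Repeat(default(T), new Random().Next(10)).GetEnumerator().
Takes the possible shortcut when both enumerables are collections, but checks not only for ICollection<T> but also for other collection interfaces.
Handles null values properly. Counting the null values separately from the other (non-null) values doesn't look 100% fail-safe either. Consider a custom equality comparer that handles null values in a non-standard way.

This solution is also available in my utility NuGet package.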