C# LINQ: finding duplicates in a List

Using LINQ, from a List<int>, how can I retrieve a list that contains the entries repeated more than once, together with their values?


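The easiest way to solve the problem is to group the elements by their value and then pick one representative of each group that has more than one element. In LINQ, this translates to: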

var query = lst.GroupBy(x => x)
.Where(g => g.Count() > 1)
.Select(y => y.Key)
.ToList();

If you want to know how many times each element is repeated, you can use:

var query = lst.GroupBy(x => x)
.Where(g => g.Count() > 1)
.Select(y => new { Element = y.Key, Counter = y.Count() })
.ToList();

This returns a List of an anonymous type; each element has the properties Element and Counter, from which you can retrieve the information you need.

Lastly, if it is a dictionary you are looking for, you can use:

var query = lst.GroupBy(x => x)
.Where(g => g.Count() > 1)
.ToDictionary(x => x.Key, y => y.Count());

This returns a dictionary with your element as the key and the number of times it is repeated as the value.
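As a minimal self-contained sketch of the grouping queries above (the names `GroupingDemo`, `DuplicateValues` and `DuplicateCounts` are made up for illustration and are not part of the original answer):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class GroupingDemo
{
    // Each value that occurs more than once, reported once.
    public static List<int> DuplicateValues(IEnumerable<int> lst) =>
        lst.GroupBy(x => x)
           .Where(g => g.Count() > 1)
           .Select(g => g.Key)
           .ToList();

    // Each duplicated value mapped to the number of times it occurs.
    public static Dictionary<int, int> DuplicateCounts(IEnumerable<int> lst) =>
        lst.GroupBy(x => x)
           .Where(g => g.Count() > 1)
           .ToDictionary(g => g.Key, g => g.Count());
}
```

For the input `new[] { 1, 2, 3, 1, 4, 2 }`, `DuplicateValues` should give 1 and 2, and `DuplicateCounts` should map both 1 and 2 to a count of 2.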

You can do this:

var list = new[] {1,2,3,1,4,2};
var duplicateItems = list.Duplicates();

Using these extension methods:

public static class Extensions
{
public static IEnumerable<TSource> Duplicates<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> selector)
{
var grouped = source.GroupBy(selector);
var moreThan1 = grouped.Where(i => i.IsMultiple());
return moreThan1.SelectMany(i => i);
}


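// Only TSource is declared here: an unused TKey type parameter could not be inferred at the call site list.Duplicates().
public static IEnumerable<TSource> Duplicates<TSource>(this IEnumerable<TSource> source)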
{
return source.Duplicates(i => i);
}


public static bool IsMultiple<T>(this IEnumerable<T> source)
{
// Dispose the enumerator once we know whether a second element exists.
using (var enumerator = source.GetEnumerator())
{
return enumerator.MoveNext() && enumerator.MoveNext();
}
}
}

Using IsMultiple() in the Duplicates method is faster than Count(), because it does not iterate over the whole collection.
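For a lazily produced sequence, the early exit can be made visible by counting how many items are actually pulled. This is only a sketch: `IsMultipleDemo` and the `Counted` helper are hypothetical names introduced here, and the difference applies to sequences that have to be enumerated in order to be counted.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class IsMultipleDemo
{
    // Same idea as IsMultiple above: stop as soon as a second element is found.
    public static bool IsMultiple<T>(IEnumerable<T> source)
    {
        using (var enumerator = source.GetEnumerator())
        {
            return enumerator.MoveNext() && enumerator.MoveNext();
        }
    }

    // Hypothetical helper: yields 0..n-1 and records in pulled[0]
    // how many items were produced (iterators cannot take ref parameters).
    public static IEnumerable<int> Counted(int n, int[] pulled)
    {
        for (var i = 0; i < n; i++)
        {
            pulled[0]++;
            yield return i;
        }
    }
}
```

With `var pulled = new int[1];`, calling `IsMultipleDemo.IsMultiple(IsMultipleDemo.Counted(1000, pulled))` pulls only two items, whereas `IsMultipleDemo.Counted(1000, pulled).Count()` walks all 1000.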

Another way is to use a HashSet:

var hash = new HashSet<int>();
var duplicates = list.Where(i => !hash.Add(i));

If you want only unique values in your list of duplicates:

var myhash = new HashSet<int>();
var mylist = new List<int>(){1,1,2,2,3,3,3,4,4,4};
var duplicates = mylist.Where(item => !myhash.Add(item)).Distinct().ToList();
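One thing worth keeping in mind with the HashSet queries above: `Where` is lazy and the lambda mutates the set, so the result depends on when and how often the query is enumerated. The sketch below illustrates this under made-up names (`HashSetDemo`, `RepeatedOccurrences`, `EnumerateTwice`); materializing once with `ToList()` avoids the surprise.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class HashSetDemo
{
    // Every occurrence after the first, in source order, materialized once.
    public static List<int> RepeatedOccurrences(IEnumerable<int> list)
    {
        var seen = new HashSet<int>();
        // HashSet<T>.Add returns false when the value is already present.
        return list.Where(i => !seen.Add(i)).ToList();
    }

    // Enumerates the same lazy query twice and returns both counts.
    public static int[] EnumerateTwice(IEnumerable<int> list)
    {
        var seen = new HashSet<int>();
        var lazy = list.Where(i => !seen.Add(i));
        var first = lazy.Count();  // only the repeated occurrences
        var second = lazy.Count(); // every element, since all values are already in 'seen'
        return new[] { first, second };
    }
}
```

For `new[] { 1, 2, 3, 1, 4, 2, 1 }`, `RepeatedOccurrences` should return 1, 2, 1, while `EnumerateTwice` should return 3 and then 7.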

Here is the same solution as generic extension methods:

public static class Extensions
{
public static IEnumerable<TSource> GetDuplicates<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> selector, IEqualityComparer<TKey> comparer)
{
var hash = new HashSet<TKey>(comparer);
return source.Where(item => !hash.Add(selector(item))).ToList();
}


public static IEnumerable<TSource> GetDuplicates<TSource>(this IEnumerable<TSource> source, IEqualityComparer<TSource> comparer)
{
return source.GetDuplicates(x => x, comparer);
}


public static IEnumerable<TSource> GetDuplicates<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> selector)
{
return source.GetDuplicates(selector, null);
}


public static IEnumerable<TSource> GetDuplicates<TSource>(this IEnumerable<TSource> source)
{
return source.GetDuplicates(x => x, null);
}
}

To find out whether an enumerable contains any duplicates:

var anyDuplicate = enumerable.GroupBy(x => x.Key).Any(g => g.Count() > 1);

To find out whether all values in an enumerable are unique:

var allUnique = enumerable.GroupBy(x => x.Key).All(g => g.Count() == 1);
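The two one-liners above assume that each element exposes a `Key` property. A minimal sketch that takes the key selector as a parameter instead (the names `UniquenessDemo`, `AnyDuplicate` and `AllUnique` are assumptions made for this example):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UniquenessDemo
{
    // True if at least two elements share the same key.
    public static bool AnyDuplicate<T, TKey>(IEnumerable<T> source, Func<T, TKey> keySelector) =>
        source.GroupBy(keySelector).Any(g => g.Count() > 1);

    // True if every key occurs exactly once (also true for an empty sequence).
    public static bool AllUnique<T, TKey>(IEnumerable<T> source, Func<T, TKey> keySelector) =>
        source.GroupBy(keySelector).All(g => g.Count() == 1);
}
```

For example, with a sequence of `KeyValuePair<int, string>` you could call `UniquenessDemo.AnyDuplicate(pairs, p => p.Key)`.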

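I created an extension in response to this that you can include in your projects. I think it covers most cases where you search for duplicates in a List or with LINQ.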

Example:

//Dummy class to compare in list
public class Person
{
public int Id { get; set; }
public string Name { get; set; }
public string Surname { get; set; }
public Person(int id, string name, string surname)
{
this.Id = id;
this.Name = name;
this.Surname = surname;
}
}




//The extension static class
public static class Extention
{
public static IEnumerable<T> getMoreThanOnceRepeated<T>(this IEnumerable<T> extList, Func<T, object> groupProps) where T : class
{ // Return only the second and later occurrences of each repeated item
return extList
.GroupBy(groupProps)
.SelectMany(z => z.Skip(1)); // Skip the first occurrence in each group and return the rest
}
public static IEnumerable<T> getAllRepeated<T>(this IEnumerable<T> extList, Func<T, object> groupProps) where T : class
{
// Get all the items that are repeated, including the first occurrence
return extList
.GroupBy(groupProps)
.Where(z => z.Count() > 1) // Keep only the groups with more than one item
.SelectMany(z => z); // Flatten the remaining groups back into items
}
}


//how to use it:
void DuplicateExample()
{
//Populate List
List<Person> PersonsLst = new List<Person>(){
new Person(1,"Ricardo","Figueiredo"), //first duplicate in the example
new Person(2,"Ana","Figueiredo"),
new Person(3,"Ricardo","Figueiredo"),//second duplicate in the example
new Person(4,"Margarida","Figueiredo"),
new Person(5,"Ricardo","Figueiredo")//third duplicate in the example
};


Console.WriteLine("All:");
PersonsLst.ForEach(z => Console.WriteLine("{0} -> {1} {2}", z.Id, z.Name, z.Surname));
/* OUTPUT:
All:
1 -> Ricardo Figueiredo
2 -> Ana Figueiredo
3 -> Ricardo Figueiredo
4 -> Margarida Figueiredo
5 -> Ricardo Figueiredo
*/


Console.WriteLine("All lines with repeated data");
PersonsLst.getAllRepeated(z => new { z.Name, z.Surname })
.ToList()
.ForEach(z => Console.WriteLine("{0} -> {1} {2}", z.Id, z.Name, z.Surname));
/* OUTPUT:
All lines with repeated data
1 -> Ricardo Figueiredo
3 -> Ricardo Figueiredo
5 -> Ricardo Figueiredo
*/
Console.WriteLine("Only Repeated more than once");
PersonsLst.getMoreThanOnceRepeated(z => new { z.Name, z.Surname })
.ToList()
.ForEach(z => Console.WriteLine("{0} -> {1} {2}", z.Id, z.Name, z.Surname));
/* OUTPUT:
Only Repeated more than once
3 -> Ricardo Figueiredo
5 -> Ricardo Figueiredo
*/
}

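A complete set of LINQ to SQL extensions for duplicate checks against MS SQL Server. They do not use .ToList() or IEnumerable. The queries execute in SQL Server rather than in memory; only the results are returned to memory.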

public static class Linq2SqlExtensions {


public class CountOfT<T> {
public T Key { get; set; }
public int Count { get; set; }
}


public static IQueryable<TKey> Duplicates<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> groupBy)
=> source.GroupBy(groupBy).Where(w => w.Count() > 1).Select(s => s.Key);


public static IQueryable<TSource> GetDuplicates<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> groupBy)
=> source.GroupBy(groupBy).Where(w => w.Count() > 1).SelectMany(s => s);


public static IQueryable<CountOfT<TKey>> DuplicatesCounts<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> groupBy)
=> source.GroupBy(groupBy).Where(w => w.Count() > 1).Select(y => new CountOfT<TKey> { Key = y.Key, Count = y.Count() });


public static IQueryable<Tuple<TKey, int>> DuplicatesCountsAsTuble<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> groupBy)
=> source.GroupBy(groupBy).Where(w => w.Count() > 1).Select(s => Tuple.Create(s.Key, s.Count()));
}

To find only the duplicate values:

var duplicates = list.GroupBy(x => x).Where(g => g.Count() > 1);

For example:

var list = new[] {1,2,3,1,4,2};

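GroupBy groups the numbers by key and keeps a count (the number of repetitions) with each group. After that, we simply check for the values that are repeated more than once.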

To find the unique values:

var unique = list.GroupBy(x => x).Where(g => g.Count() == 1);

For example:

var list = new[] {1,2,3,1,4,2};

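GroupBy groups the numbers by key and keeps a count (the number of repetitions) with each group. After that, we simply check for the values that occur only once, i.e. the unique ones.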

There is an answer, but I don't understand why it doesn't work:

var anyDuplicate = enumerable.GroupBy(x => x.Key).Any(g => g.Count() > 1);

My solution is this:

var duplicates = model.list
.GroupBy(s => s.SAME_ID)
.Where(g => g.Count() > 1).Count() > 0;
if(duplicates) {
doSomething();
}

LINQ query:

var query = from s2 in (from s in someList group s by new { s.Column1, s.Column2 } into sg select sg) where s2.Count() > 1 select s2;

Remove duplicates by key:

myTupleList = myTupleList.GroupBy(tuple => tuple.Item1).Select(group => group.First()).ToList();
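To make the effect concrete, here is a small sketch of the same GroupBy/First pattern wrapped in a method. The names `DedupDemo` and `KeepFirstPerKey`, and the `Tuple<int, string>` element type, are assumptions for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DedupDemo
{
    // Keeps the first tuple seen for each Item1 value and drops the later ones.
    public static List<Tuple<int, string>> KeepFirstPerKey(IEnumerable<Tuple<int, string>> items) =>
        items.GroupBy(t => t.Item1)
             .Select(g => g.First())
             .ToList();
}
```

Given the tuples (1, "a"), (2, "b"), (1, "c"), this should keep (1, "a") and (2, "b") and drop (1, "c").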

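This simpler approach does not use grouping: get the Distinct elements, then iterate over them and check each one's count in the list. If the count is > 1, the element appears more than once, so add it to the list of duplicates.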

var mylist = new List<int>() { 1, 1, 2, 3, 3, 3, 4, 4, 4 };
var distList=  mylist.Distinct().ToList();
var Repeteditemlist = new List<int>();
foreach (var item in distList)
{
if(mylist.Count(e => e == item) > 1)
{
Repeteditemlist.Add(item);
}
}
foreach (var item in Repeteditemlist)
{
Console.WriteLine(item);
}

Expected output:

1
3
4

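All the GroupBy answers are the simplest, but not the most efficient. They are particularly bad for memory performance, because building large internal collections has an allocation cost.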

A decent alternative is HuBeZa's HashSet.Add-based approach. It performs better.

If you don't care about nulls, something like this is the most efficient (both CPU and memory) as far as I can tell:

public static IEnumerable<TProperty> Duplicates<TSource, TProperty>(
this IEnumerable<TSource> source,
Func<TSource, TProperty> duplicateSelector,
IEqualityComparer<TProperty> comparer = null)
{
comparer ??= EqualityComparer<TProperty>.Default;


Dictionary<TProperty, int> counts = new Dictionary<TProperty, int>(comparer);


foreach (var item in source)
{
TProperty property = duplicateSelector(item);
counts.TryGetValue(property, out int count);


switch (count)
{
case 0:
counts[property] = ++count;
break;


case 1:
counts[property] = ++count;
yield return property;
break;
}
}
}

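The trick here is to avoid additional dictionary writes once a value has already been reported as a duplicate. Of course, if you also want the number of occurrences of each item, you can keep updating the dictionary with the count. For nulls you just need some additional handling, that's all.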
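A rough sketch of the counting variant mentioned above, which keeps updating the dictionary and then filters it. `CountingDemo` and `DuplicateCounts` are hypothetical names, and like the method above this version does not handle null keys (Dictionary throws on a null key).

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CountingDemo
{
    // Counts every item in one pass, then keeps only those seen more than once.
    public static Dictionary<T, int> DuplicateCounts<T>(IEnumerable<T> source)
    {
        var counts = new Dictionary<T, int>();
        foreach (var item in source)
        {
            // count stays 0 when the key is not in the dictionary yet
            counts.TryGetValue(item, out var count);
            counts[item] = count + 1;
        }
        return counts.Where(kv => kv.Value > 1)
                     .ToDictionary(kv => kv.Key, kv => kv.Value);
    }
}
```

For `new[] { "a", "b", "a", "c", "a", "b" }` this should map "a" to 3 and "b" to 2. The trade-off is that nothing can be yielded until the whole source has been read.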

Here is another way:

For HasDuplicate:

bool hasAnyDuplicate = list.Count > list.Distinct().Count();

For the duplicate values:

List<string> duplicates = new List<string>();
duplicates.AddRange(list);
list.Distinct().ToList().ForEach(x => duplicates.Remove(x));


// for unique duplicate values:
duplicates.Distinct();