Java8,Streams 来查找重复的元素

我试图在整数列表中列出重复的元素,例如,

List<Integer> numbers = Arrays.asList(new Integer[]{1,2,1,3,4,4});

使用 JDK8流。有没有人试过。要删除重复项,我们可以使用独特的() api。但是找到重复的元素呢?有人能帮我吗?

216458 次浏览

You can get the duplicated like this :

List<Integer> numbers = Arrays.asList(1, 2, 1, 3, 4, 4);
Set<Integer> duplicated = numbers
.stream()
.filter(n -> numbers
.stream()
.filter(x -> x == n)
.count() > 1)
.collect(Collectors.toSet());

You need a set (allItems below) to hold the entire array contents, but this is O(n):

Integer[] numbers = new Integer[] { 1, 2, 1, 3, 4, 4 };
Set<Integer> allItems = new HashSet<>();
Set<Integer> duplicates = Arrays.stream(numbers)
.filter(n -> !allItems.add(n)) //Set.add() returns false if the item was already in the set.
.collect(Collectors.toSet());
System.out.println(duplicates); // [1, 4]

Basic example. First-half builds the frequency-map, second-half reduces it to a filtered list. Probably not as efficient as Dave's answer, but more versatile (like if you want to detect exactly two etc.)

List<Integer> duplicates = IntStream.of( 1, 2, 3, 2, 1, 2, 3, 4, 2, 2, 2 )
.boxed()
.collect( Collectors.groupingBy( Function.identity(), Collectors.counting() ) )
.entrySet()
.stream()
.filter( p -> p.getValue() > 1 )
.map( Map.Entry::getKey )
.collect( Collectors.toList() );

An O(n) way would be as below:

List<Integer> numbers = Arrays.asList(1, 2, 1, 3, 4, 4);
Set<Integer> duplicatedNumbersRemovedSet = new HashSet<>();
Set<Integer> duplicatedNumbersSet = numbers.stream().filter(n -> !duplicatedNumbersRemovedSet.add(n)).collect(Collectors.toSet());

The space complexity would go double in this approach, but that space is not a waste; in-fact, we now have the duplicated alone only as a Set as well as another Set with all the duplicates removed too.

My StreamEx library which enhances the Java 8 streams provides a special operation distinct(atLeast) which can retain only elements appearing at least the specified number of times. So your problem can be solved like this:

List<Integer> repeatingNumbers = StreamEx.of(numbers).distinct(2).toList();

Internally it's similar to @Dave solution, it counts objects, to support other wanted quantities and it's parallel-friendly (it uses ConcurrentHashMap for parallelized stream, but HashMap for sequential). For big amounts of data you can get a speed-up using .parallel().distinct(2).

You can use Collections.frequency:

numbers.stream().filter(i -> Collections.frequency(numbers, i) >1)
.collect(Collectors.toSet()).forEach(System.out::println);

I think I have good solution how to fix problem like this - List => List with grouping by Something.a & Something.b. There is extended definition:

public class Test {


public static void test() {


class A {
private int a;
private int b;
private float c;
private float d;


public A(int a, int b, float c, float d) {
this.a = a;
this.b = b;
this.c = c;
this.d = d;
}
}




List<A> list1 = new ArrayList<A>();


list1.addAll(Arrays.asList(new A(1, 2, 3, 4),
new A(2, 3, 4, 5),
new A(1, 2, 3, 4),
new A(2, 3, 4, 5),
new A(1, 2, 3, 4)));


Map<Integer, A> map = list1.stream()
.collect(HashMap::new, (m, v) -> m.put(
Objects.hash(v.a, v.b, v.c, v.d), v),
HashMap::putAll);


list1.clear();
list1.addAll(map.values());


System.out.println(list1);
}


}

class A, list1 it's just incoming data - magic is in the Objects.hash(...) :)

Do you have to use the java 8 idioms (steams)? Perphaps a simple solution would be to move the complexity to a map alike data structure that holds numbers as key (without repeating) and the times it ocurrs as a value. You could them iterate that map an only do something with those numbers that are ocurrs > 1.

import java.lang.Math;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.HashMap;
import java.util.Iterator;


public class RemoveDuplicates
{
public static void main(String[] args)
{
List<Integer> numbers = Arrays.asList(new Integer[]{1,2,1,3,4,4});
Map<Integer,Integer> countByNumber = new HashMap<Integer,Integer>();
for(Integer n:numbers)
{
Integer count = countByNumber.get(n);
if (count != null) {
countByNumber.put(n,count + 1);
} else {
countByNumber.put(n,1);
}
}
System.out.println(countByNumber);
Iterator it = countByNumber.entrySet().iterator();
while (it.hasNext()) {
Map.Entry pair = (Map.Entry)it.next();
System.out.println(pair.getKey() + " = " + pair.getValue());
}
}
}

Try this solution:

public class Anagramm {


public static boolean isAnagramLetters(String word, String anagramm) {
if (anagramm.isEmpty()) {
return false;
}


Map<Character, Integer> mapExistString = CharCountMap(word);
Map<Character, Integer> mapCheckString = CharCountMap(anagramm);
return enoughLetters(mapExistString, mapCheckString);
}


private static Map<Character, Integer> CharCountMap(String chars) {
HashMap<Character, Integer> charCountMap = new HashMap<Character, Integer>();
for (char c : chars.toCharArray()) {
if (charCountMap.containsKey(c)) {
charCountMap.put(c, charCountMap.get(c) + 1);
} else {
charCountMap.put(c, 1);
}
}
return charCountMap;
}


static boolean enoughLetters(Map<Character, Integer> mapExistString, Map<Character,Integer> mapCheckString) {
for( Entry<Character, Integer> e : mapCheckString.entrySet() ) {
Character letter = e.getKey();
Integer available = mapExistString.get(letter);
if (available == null || e.getValue() > available) return false;
}
return true;
}


}

I think basic solutions to the question should be as below:

Supplier supplier=HashSet::new;
HashSet has=ls.stream().collect(Collectors.toCollection(supplier));


List lst = (List) ls.stream().filter(e->Collections.frequency(ls,e)>1).distinct().collect(Collectors.toList());

well, it is not recommended to perform a filter operation, but for better understanding, i have used it, moreover, there should be some custom filtration in future versions.

A multiset is a structure maintaining the number of occurrences for each element. Using Guava implementation:

Set<Integer> duplicated =
ImmutableMultiset.copyOf(numbers).entrySet().stream()
.filter(entry -> entry.getCount() > 1)
.map(Multiset.Entry::getElement)
.collect(Collectors.toSet());

What about checking of indexes?

        numbers.stream()
.filter(integer -> numbers.indexOf(integer) != numbers.lastIndexOf(integer))
.collect(Collectors.toSet())
.forEach(System.out::println);

If you only need to detect the presence of duplicates (instead of listing them, which is what the OP wanted), just convert them into both a List and Set, then compare the sizes:

    List<Integer> list = ...;
Set<Integer> set = new HashSet<>(list);
if (list.size() != set.size()) {
// duplicates detected
}

I like this approach because it has less places for mistakes.

the creating of an additional map or stream is time- and space-consuming…

Set<Integer> duplicates = numbers.stream().collect( Collectors.collectingAndThen(
Collectors.groupingBy( Function.identity(), Collectors.counting() ),
map -> {
map.values().removeIf( cnt -> cnt < 2 );
return( map.keySet() );
} ) );  // [1, 4]


…and for the question of which is claimed to be a [duplicate]

public static int[] getDuplicatesStreamsToArray( int[] input ) {
return( IntStream.of( input ).boxed().collect( Collectors.collectingAndThen(
Collectors.groupingBy( Function.identity(), Collectors.counting() ),
map -> {
map.values().removeIf( cnt -> cnt < 2 );
return( map.keySet() );
} ) ).stream().mapToInt( i -> i ).toArray() );
}

Set.add() is faster if you're looking for performance.

public class FindDuplicatedBySet {


public static void main(String[] args) {
List<Integer> list = Arrays.asList(5, 3, 4, 1, 3, 7, 2,3,1, 9, 9, 4,1);
Set<Integer> result = findDuplicatedBySetAdd(list);
result.forEach(System.out::println);
}


public static <T> Set<T> findDuplicatedBySetAdd(List<T> list) {
Set<T> items = new HashSet<>();
return list.stream()
.filter(n -> !items.add(n))
.collect(Collectors.toSet());
}
}

Using stream Set<Integer> set = new HashSet<>(); list.stream().filter(data -> !set.add(data)).forEach(data -> System.out.println("duplicates "+data));

Using distinct on stream would filter duplicates, you can either collect as set or List.

 numbers.stream().distinct().collect(Collectors.toSet())