Convert a List of tuples to a Map (and handle duplicate keys?)

I'm thinking about a nice way to convert a List of tuples with duplicate keys, [("a","b"),("c","d"),("a","f")], into a Map ("a" -> ["b", "f"], "c" -> ["d"]). Normally (in Python), I would create an empty map and for-loop over the list, checking for duplicate keys. But I'm looking for a more Scala-ish, clever solution here.
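In Scala terms, the naive version I have in mind would look roughly like this sketch (with made-up sample data), i.e. exactly the mutable style I'd like to avoid:

import scala.collection.mutable

val pairs = List(("a", "b"), ("c", "d"), ("a", "f"))

// Imperative baseline: loop over the pairs and append to each key's group
val acc = mutable.Map.empty[String, List[String]]
for ((k, v) <- pairs)
  acc(k) = acc.getOrElse(k, Nil) :+ v

acc.toMap  // Map(a -> List(b, f), c -> List(d))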

By the way, the actual key/value types I'm using here are (Int, Node), and I want to turn them into a Map of (Int -> NodeSeq).


Group and then project:

scala> val x = List("a" -> "b", "c" -> "d", "a" -> "f")
x: List[(java.lang.String, java.lang.String)] = List((a,b), (c,d), (a,f))

scala> x.groupBy(_._1).map { case (k, v) => (k, v.map(_._2)) }
res1: scala.collection.immutable.Map[java.lang.String,List[java.lang.String]] = Map(c -> List(d), a -> List(b, f))

A more Scala-ish way is to use a fold and build the map in a single pass, skipping the separate map step.
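For instance, a one-pass fold might look like this sketch (it prepends to each key's group for efficiency and restores order at the end):

val x = List("a" -> "b", "c" -> "d", "a" -> "f")

x.foldLeft(Map.empty[String, List[String]]) { case (acc, (k, v)) =>
  // Prepend each value to its key's group (cheap for List)
  acc.updated(k, v :: acc.getOrElse(k, Nil))
}.map { case (k, vs) => k -> vs.reverse }  // undo the prepending
// Map(a -> List(b, f), c -> List(d))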

Here's another alternative:

x.groupBy(_._1).mapValues(_.map(_._2))
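Note that as of Scala 2.13, mapValues on a Map is deprecated in favour of going through a view; the strict equivalent is:

x.groupBy(_._1).view.mapValues(_.map(_._2)).toMap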

For Googlers who don't expect duplicates or are fine with the default duplicate-handling policy:

List("a" -> 1, "b" -> 2, "a" -> 3).toMap
// Result: Map(a -> 3, b -> 2)

As of 2.12, the default policy reads:

Duplicate keys will be overwritten by later keys: if this is an unordered collection, which key is in the resulting map is undefined.

For Googlers who do care about duplicates:

implicit class Pairs[A, B](p: List[(A, B)]) {
  def toMultiMap: Map[A, List[B]] = p.groupBy(_._1).mapValues(_.map(_._2))
}


> List("a" -> "b", "a" -> "c", "d" -> "e").toMultiMap
> Map("a" -> List("b", "c"), "d" -> List("e"))

Here is a more idiomatic Scala way to convert a list of tuples to a map while handling duplicate keys: use a fold.

val x = List("a" -> "b", "c" -> "d", "a" -> "f")


x.foldLeft(Map.empty[String, Seq[String]]) { case (acc, (k, v)) =>
  acc.updated(k, acc.getOrElse(k, Seq.empty[String]) ++ Seq(v))
}


res0: scala.collection.immutable.Map[String,Seq[String]] = Map(a -> List(b, f), c -> List(d))

You can try this

scala> val b = Array(1, 2, 3)
b: Array[Int] = Array(1, 2, 3)

scala> val c = b.map(x => x -> x * 2)
c: Array[(Int, Int)] = Array((1,2), (2,4), (3,6))

scala> val d = Map(c: _*)
d: scala.collection.immutable.Map[Int,Int] = Map(1 -> 2, 2 -> 4, 3 -> 6)
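Note that Map(c: _*) is equivalent to c.toMap here, and like toMap it silently keeps only the last value for any duplicated key:

Map("a" -> 1, "a" -> 2)  // Map(a -> 2): the last key wins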

Below you can find a few solutions: groupBy, foldLeft, aggregate, and Spark.

val list: List[(String, String)] = List(("a","b"),("c","d"),("a","f"))

groupBy variation

list.groupBy(_._1).map(v => (v._1, v._2.map(_._2)))

foldLeft variation

list.foldLeft[Map[String, List[String]]](Map())((acc, value) => {
  acc.get(value._1).fold(acc ++ Map(value._1 -> List(value._2))) { v =>
    acc ++ Map(value._1 -> (value._2 :: v))
  }
})

aggregate variation (similar to foldLeft)

list.aggregate[Map[String, List[String]]](Map())(
  (acc, value) => acc.get(value._1).fold(acc ++ Map(value._1 -> List(value._2))) { v =>
    acc ++ Map(value._1 -> (value._2 :: v))
  },
  (l, r) => l ++ r
)

Spark variation - for big data sets (convert the list to an RDD, then collect back to a plain Map)

import org.apache.spark.rdd._
import org.apache.spark.{SparkContext, SparkConf}

val conf: SparkConf = new SparkConf().setAppName("Spark").setMaster("local")
val sc: SparkContext = new SparkContext(conf)

// This gives you an RDD with the same result
val rdd: RDD[(String, List[String])] = sc.parallelize(list).combineByKey(
  (value: String) => List(value),
  (acc: List[String], value: String) => value :: acc,
  (accLeft: List[String], accRight: List[String]) => accLeft ::: accRight
)

// To convert this RDD back to a Map[String, List[String]] you can do the following
rdd.collect().toMap
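For comparison, a shorter variant under the same setup uses groupByKey (a sketch; generally less efficient than combineByKey because every value is shuffled without being pre-combined per partition):

// groupByKey gives RDD[(String, Iterable[String])]; convert each group to a List
sc.parallelize(list).groupByKey().mapValues(_.toList).collect().toMap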

Starting with Scala 2.13, most collections provide the groupMap method, which (as its name suggests) is an equivalent, but more efficient, way of doing a groupBy followed by mapValues:

List("a" -> "b", "c" -> "d", "a" -> "f").groupMap(_._1)(_._2)
// Map[String,List[String]] = Map(a -> List(b, f), c -> List(d))

This:

  • groups elements based on the first part of tuples (group part of groupMap)

  • maps grouped values by taking their second tuple part (map part of groupMap)

This is equivalent to list.groupBy(_._1).mapValues(_.map(_._2)), but performed in one pass through the List.
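Applied to the question's actual types, a sketch (assuming the scala-xml library for Node and NodeSeq, with made-up sample data) would be:

import scala.xml.{Node, NodeSeq}

val pairs: List[(Int, Node)] = List(1 -> <a/>, 2 -> <b/>, 1 -> <c/>)

// groupMap groups by the Int key and keeps the Node values;
// NodeSeq.fromSeq turns each List[Node] into a NodeSeq
val byKey: Map[Int, NodeSeq] =
  pairs.groupMap(_._1)(_._2).view.mapValues(NodeSeq.fromSeq).toMap
// byKey(1) contains <a/> and <c/>; byKey(2) contains <b/>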