Scala capturing groups using regex

Let's say I have this code:

val string = "one493two483three"
val pattern = """two(\d+)three""".r
pattern.findAllIn(string).foreach(println)

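I expected findAllIn to return only 483, but instead it returned two483three. I know I could use unapply to extract only that part, but then I'd need a pattern for the entire string, something like: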

val pattern = """one.*two(\d+)three""".r
val pattern(aMatch) = string
println(aMatch) // prints 483

Is there another way of achieving this, without using the classes from java.util directly and without using unapply?


You want to look at group(1); you're currently looking at group(0), which is "the entire matched string".

See this regex tutorial.

Here's an example of how you can access group(1) of each match:

val string = "one493two483three"
val pattern = """two(\d+)three""".r
pattern.findAllIn(string).matchData foreach {
  m => println(m.group(1))
}

This prints "483" (as seen on ideone.com).
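
If you'd rather collect the captured values than print them, the same idea can be sketched with findAllMatchIn and findFirstMatchIn, which hand you Match objects directly. The object and method names below (CaptureGroups, allCaptured, firstCaptured) are made up for illustration and aren't part of any library.

```scala
import scala.util.matching.Regex

object CaptureGroups {
  // Same pattern as above: group(1) is the run of digits between "two" and "three".
  val pattern: Regex = """two(\d+)three""".r

  // Hypothetical helper: group(1) of every match, in order of appearance.
  def allCaptured(input: String): List[String] =
    pattern.findAllMatchIn(input).map(_.group(1)).toList

  // Hypothetical helper: group(1) of the first match, or None if nothing matches.
  def firstCaptured(input: String): Option[String] =
    pattern.findFirstMatchIn(input).map(_.group(1))

  def main(args: Array[String]): Unit = {
    println(allCaptured("one493two483three")) // List(483)
    println(firstCaptured("no digits here"))  // None
  }
}
```

Getting an Option back from findFirstMatchIn means the "no match" case has to be handled explicitly instead of blowing up at runtime.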


The lookaround option

Depending on the complexity of the pattern, you can also use lookarounds to only match the portion you want. It'll look something like this:

val string = "one493two483three"
val pattern = """(?<=two)\d+(?=three)""".r
pattern.findAllIn(string).foreach(println)

The above also prints "483" (as seen on ideone.com).
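
When only the first occurrence matters, the lookaround pattern can be combined with findFirstIn, which returns an Option[String]. This is just a small sketch; LookaroundExample and firstDigits are hypothetical names chosen for the example.

```scala
object LookaroundExample {
  // The lookbehind (?<=two) and lookahead (?=three) check the surrounding text
  // without consuming it, so the whole match (group 0) is just the digits.
  private val digitsBetween = """(?<=two)\d+(?=three)""".r

  // Hypothetical helper: the first run of digits between "two" and "three", if any.
  def firstDigits(input: String): Option[String] =
    digitsBetween.findFirstIn(input)

  def main(args: Array[String]): Unit = {
    println(firstDigits("one493two483three")) // Some(483)
    println(firstDigits("one493two483"))      // None
  }
}
```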

val string = "one493two483three"
val pattern = """.*two(\d+)three.*""".r

string match {
  case pattern(a483) => println(a483) // matched group(1) is bound to the variable a483
  case _ => // no match
}
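
The leading and trailing .* are there because a Regex used as an extractor has to match the whole input by default. One possible way around that, sketched here, is to call unanchored on the regex so that the extractor succeeds when the pattern is found anywhere in the string. UnanchoredExample and extract are hypothetical names used only for this example.

```scala
object UnanchoredExample {
  // .unanchored relaxes the extractor: the pattern may match anywhere in the
  // input instead of having to cover the entire string.
  private val pattern = """two(\d+)three""".r.unanchored

  // Hypothetical helper: the digits between "two" and "three", if present.
  def extract(input: String): Option[String] = input match {
    case pattern(digits) => Some(digits)
    case _               => None
  }

  def main(args: Array[String]): Unit = {
    println(extract("one493two483three")) // Some(483)
    println(extract("one493"))            // None
  }
}
```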

def extractFileNameFromHttpFilePathExpression(expr: String): String = {
  // Group 1 captures the file name with its extension; the dot is escaped so it
  // only matches a literal "." rather than any character.
  val regex = "http4.*\\/(\\w+\\.(xlsx|xls|zip))$".r
  // findFirstMatchIn returns Option[Match] (findAllMatchIn returns Iterator[Match]);
  // Match has methods to access capture groups.
  regex.findFirstMatchIn(expr) match {
    case Some(m) => m.group(1)
    case None    => "regex_error"
  }
}

extractFileNameFromHttpFilePathExpression(
  "http4://testing.bbmkl.com/document/sth1234.zip") // returns "sth1234.zip"

Starting with Scala 2.13, as an alternative to regex solutions, it's also possible to pattern match a String by unapplying a string interpolator:

"one493two483three" match { case s"${x}two${y}three" => y }
// String = "483"

Or even:

val s"${x}two${y}three" = "one493two483three"
// x: String = one493
// y: String = 483

If you expect non-matching input, you can add a default case:

"one493deux483three" match {
case s"${x}two${y}three" => y
case _                   => "no match"
}
// String = "no match"
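
Note that the interpolator pattern doesn't check that the captured part consists of digits the way \d+ does. A minimal sketch of one way to handle that, assuming Scala 2.13+ (for both the s extractor and toIntOption), is to convert the captured text and treat a failed conversion as "no match". InterpolatorExample and numberBetween are hypothetical names for illustration.

```scala
object InterpolatorExample {
  // Hypothetical helper: the number between "two" and "three", or None when the
  // input doesn't have that shape or the captured text isn't a valid Int.
  def numberBetween(input: String): Option[Int] = input match {
    case s"${prefix}two${digits}three" => digits.toIntOption
    case _                             => None
  }

  def main(args: Array[String]): Unit = {
    println(numberBetween("one493two483three"))  // Some(483)
    println(numberBetween("one493deux483three")) // None
    println(numberBetween("onetwoabcthree"))     // None
  }
}
```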