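How do I list all files in a subdirectory in Scala?

Is there a good "scala-esque" (I guess I mean functional) way of recursively listing files in a directory? What about matching a particular pattern?

For example, recursively all files matching "a*.foo" in c:\temp.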


Take a look at scala.tools.nsc.io

There are some very useful utilities there including deep listing functionality on the Directory class.

If I remember correctly, this was highlighted (possibly contributed) by retronym and was seen as a stopgap until io gets a fresh and more complete implementation in the standard library.

Scala code typically uses Java classes for dealing with I/O, including reading directories. So you have to do something like:

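import java.io.File
def recursiveListFiles(f: File): Array[File] = {
  // listFiles returns null if f is not a directory or cannot be read
  val these = Option(f.listFiles).getOrElse(Array.empty[File])
  these ++ these.filter(_.isDirectory).flatMap(recursiveListFiles)
}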

You could collect all the files and then filter using a regex:

myBigFileArray.filter(f => """.*\.html$""".r.findFirstIn(f.getName).isDefined)

Or you could incorporate the regex into the recursive search:

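import java.io.File
import scala.util.matching.Regex
def recursiveListFiles(f: File, r: Regex): Array[File] = {
  // listFiles returns null if f is not a directory or cannot be read
  val these = Option(f.listFiles).getOrElse(Array.empty[File])
  val good = these.filter(f => r.findFirstIn(f.getName).isDefined)
  good ++ these.filter(_.isDirectory).flatMap(recursiveListFiles(_, r))
}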
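
The question asks for a glob ("a*.foo") rather than a regex. As a rough sketch of how the same recursion could take a glob, the JDK's PathMatcher can do the matching on each file name. The helper name listByGlob is made up for this illustration and is not part of any library:

```scala
import java.io.File
import java.nio.file.{FileSystems, Paths}

// Hypothetical helper: recursively list files whose *name* matches a glob such as "a*.foo".
def listByGlob(root: File, glob: String): List[File] = {
  // The "glob:" prefix selects the JDK's glob syntax for the matcher.
  val matcher = FileSystems.getDefault.getPathMatcher("glob:" + glob)

  def loop(dir: File): List[File] = {
    // listFiles returns null for non-directories and on I/O errors.
    val children = Option(dir.listFiles).map(_.toList).getOrElse(Nil)
    val (dirs, files) = children.partition(_.isDirectory)
    // Match against the bare file name, not the full path.
    files.filter(f => matcher.matches(Paths.get(f.getName))) ++ dirs.flatMap(loop)
  }

  loop(root)
}
```

For the example in the question this would be called as listByGlob(new File("c:\\temp"), "a*.foo"). Directories themselves are never returned, only the files inside them.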

Scala is a multi-paradigm language. A good "scala-esque" way of iterating a directory would be to reuse existing code!

I'd consider using commons-io a perfectly scala-esque way of iterating a directory. You can use some implicit conversions to make it easier, for example:

import java.io.File
import org.apache.commons.io.filefilter.IOFileFilter
// Lets a plain File => Boolean function be used where commons-io expects an IOFileFilter
implicit def newIOFileFilter (filter: File=>Boolean) = new IOFileFilter {
  def accept (file: File) = filter (file)
  def accept (dir: File, name: String) = filter (new java.io.File (dir, name))
}
for (file <- new File("c:\\").listFiles) { processFile(file) }

http://langref.org/scala+java/files

I would prefer a solution with Streams, because a Stream is a lazily evaluated collection, so you can iterate over even a huge file system without building the whole listing up front:

import java.io.File

def getFileTree(f: File): Stream[File] =
  f #:: (if (f.isDirectory) f.listFiles().toStream.flatMap(getFileTree)
         else Stream.empty)

Example for searching

getFileTree(new File("c:\\main_dir")).filter(_.getName.endsWith(".scala")).foreach(println)

I like yura's stream solution, but it (and the others) recurses into hidden directories. We can also simplify by making use of the fact that listFiles returns null for a non-directory.

def tree(root: File, skipHidden: Boolean = false): Stream[File] =
  if (!root.exists || (skipHidden && root.isHidden)) Stream.empty
  else root #:: (
    root.listFiles match {
      case null => Stream.empty
      case files => files.toStream.flatMap(tree(_, skipHidden))
    })

Now we can list files

tree(new File(".")).filter(f => f.isFile && f.getName.endsWith(".html")).foreach(println)

or realise the whole stream for later processing

tree(new File("dir"), true).toArray
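
Note that Stream has been deprecated since Scala 2.13 in favour of LazyList. A minimal sketch of the same idea on LazyList might look like the following; it assumes Scala 2.13 or later, and the name lazyTree is only chosen here to avoid clashing with tree above:

```scala
import java.io.File

// Assumes Scala 2.13+, where LazyList replaces the deprecated Stream.
// Hypothetical name; mirrors the Stream-based tree function.
def lazyTree(root: File, skipHidden: Boolean = false): LazyList[File] =
  if (!root.exists || (skipHidden && root.isHidden)) LazyList.empty
  else root #:: (
    // listFiles returns null for a non-directory, which Option turns into None.
    Option(root.listFiles)
      .map(files => files.iterator.to(LazyList).flatMap(lazyTree(_, skipHidden)))
      .getOrElse(LazyList.empty)
  )
```

Because both the head and the tail of a LazyList are lazy, something like lazyTree(new File(".")).filter(_.getName.endsWith(".html")).take(5).foreach(println) should only walk as much of the tree as it needs to find five matches.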

Here's a similar solution to Rex Kerr's, but incorporating a file filter:

import java.io.File
def findFiles(fileFilter: (File) => Boolean = (f) => true)(f: File): List[File] = {
  val ss = f.list()
  val list = if (ss == null) {
    Nil
  } else {
    ss.toList.sorted
  }
  val visible = list.filter(_.charAt(0) != '.')
  val these = visible.map(new File(f, _))
  these.filter(fileFilter) ++ these.filter(_.isDirectory).flatMap(findFiles(fileFilter))
}

The method returns a List[File], which is slightly more convenient than Array[File]. It also ignores all hidden files and directories (i.e. those whose names begin with '.').

It's partially applied using a file filter of your choosing, for example:

val srcDir = new File( ... )
val htmlFiles = findFiles( _.getName endsWith ".html" )( srcDir )

And here's a mixture of the stream solution from @DuncanMcGregor with the filter from @Rick-777:

def tree( root: File, descendCheck: File => Boolean = { _ => true } ): Stream[File] = {
  require(root != null)
  def directoryEntries(f: File) = for {
    direntries <- Option(f.list).toStream
    d <- direntries
  } yield new File(f, d)
  val shouldDescend = root.isDirectory && descendCheck(root)
  ( root.exists, shouldDescend ) match {
    case ( false, _) => Stream.Empty
    case ( true, true ) => root #:: ( directoryEntries(root) flatMap { tree( _, descendCheck ) } )
    case ( true, false) => Stream( root )
  }
}


def treeIgnoringHiddenFilesAndDirectories( root: File ) = tree( root, { !_.isHidden } ) filter { !_.isHidden }

This gives you a Stream[File] instead of a (potentially huge and very slow) List[File] while letting you decide which sorts of directories to recurse into with the descendCheck() function.
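
If you don't need the memoization that Stream does, the same "decide where to descend" idea can be sketched with a plain Iterator instead. This is a different technique from the Stream version above, and the names walk and descend are made up for this illustration:

```scala
import java.io.File

// Hypothetical helper: depth-first walk as an Iterator, pruning directories
// for which `descend` returns false. A pruned directory is still emitted
// itself; only its contents are skipped.
def walk(root: File, descend: File => Boolean = _ => true): Iterator[File] =
  if (!root.exists) Iterator.empty
  else Iterator.single(root) ++ {
    // The right-hand side of ++ is evaluated lazily, so children are only
    // listed once the iterator is advanced past `root`.
    if (root.isDirectory && descend(root))
      Option(root.listFiles).iterator.flatMap(_.iterator).flatMap(walk(_, descend))
    else Iterator.empty
  }
```

For example, walk(new File("."), !_.isHidden).filter(_.isFile) would list files without descending into hidden directories. An Iterator can only be traversed once, which is the trade-off against Stream.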

This incantation works for me:

def findFiles(dir: File, criterion: (File) => Boolean): Seq[File] = {
  if (dir.isFile) Seq()
  else {
    val (files, dirs) = dir.listFiles.partition(_.isFile)
    files.filter(criterion) ++ dirs.toSeq.map(findFiles(_, criterion)).foldLeft(Seq[File]())(_ ++ _)
  }
}

Apache Commons IO's FileUtils fits on one line, and is quite readable:

import java.io.File
import scala.collection.JavaConversions._ // important for 'foreach'
import org.apache.commons.io.FileUtils

// Note the escaped backslash: "c:\temp" would contain a tab character
FileUtils.listFiles(new File("c:\\temp"), Array("foo"), true).foreach { f =>
  // process f here
}

How about

def allFiles(path: File): List[File] = {
  val parts = path.listFiles.toList.partition(_.isDirectory)
  parts._2 ::: parts._1.flatMap(allFiles)
}

Why are you using Java's File instead of Scala's AbstractFile?

With Scala's AbstractFile, the iterator support allows writing a more concise version of James Moore's solution:

import scala.reflect.io.AbstractFile
def tree(root: AbstractFile, descendCheck: AbstractFile => Boolean = {_=>true}): Stream[AbstractFile] =
  if (root == null || !root.exists) Stream.empty
  else
    (root.exists, root.isDirectory && descendCheck(root)) match {
      case (false, _) => Stream.empty
      case (true, true) => root #:: root.iterator.flatMap { tree(_, descendCheck) }.toStream
      case (true, false) => Stream(root)
    }

Scala has the library 'scala.reflect.io', which is considered experimental but does the job:

import scala.reflect.io.Path
Path(path) walkFilter { p =>
  // "a*.foo" is a glob; the equivalent regex is ^a.*\.foo$
  p.isDirectory || """^a.*\.foo$""".r.findFirstIn(p.name).isDefined
}

I personally like the elegance and simplicity of @Rex Kerr's proposed solution. But here is what a tail recursive version might look like:

import java.io.File
import scala.annotation.tailrec

def listFiles(file: File): List[File] = {
  @tailrec
  def listFiles(files: List[File], result: List[File]): List[File] = files match {
    case Nil => result
    case head :: tail if head.isDirectory =>
      listFiles(Option(head.listFiles).map(_.toList ::: tail).getOrElse(tail), result)
    case head :: tail if head.isFile =>
      listFiles(tail, head :: result)
    case _ :: tail => // neither a directory nor a regular file (e.g. it vanished): skip it
      listFiles(tail, result)
  }
  listFiles(List(file), Nil)
}

The simplest Scala-only solution (if you don't mind requiring the Scala compiler library):

val path = scala.reflect.io.Path(dir)
scala.tools.nsc.io.Path.onlyFiles(path.walk).foreach(println)

Otherwise, @Renaud's solution is short and sweet (if you don't mind pulling in Apache Commons FileUtils):

import scala.collection.JavaConversions._  // enables foreach
import org.apache.commons.io.FileUtils
FileUtils.listFiles(dir, null, true).foreach(println)

Where dir is a java.io.File:

new File("path/to/dir")

It seems nobody mentions the scala-io library from scala-incubator...

import scalax.file.Path

Path.fromString("c:\\temp") ** "a*.foo"

Or with implicit

import scalax.file.ImplicitConversions.string2path

"c:\\temp" ** "a*.foo"

Or if you want implicit explicitly...

import scalax.file.Path
import scalax.file.ImplicitConversions.string2path

val dir: Path = "c:\\temp"
dir ** "a*.foo"

Documentation is available here: http://jesseeichar.github.io/scala-io-doc/0.4.3/index.html#!/file/glob_based_path_sets

As of Java 1.7 you should all be using java.nio. It offers close-to-native performance (java.io is very slow) and has some useful helpers.

But Java 1.8 introduces exactly what you are looking for:

import java.nio.file.{FileSystems, Files}
import scala.collection.JavaConverters._
val dir = FileSystems.getDefault.getPath("/some/path/here")

Files.walk(dir).iterator().asScala.filter(Files.isRegularFile(_)).foreach(println)

You also asked for file matching. Try java.nio.file.Files.find and also java.nio.file.Files.newDirectoryStream

See documentation here: http://docs.oracle.com/javase/tutorial/essential/io/walk.html
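
As a rough sketch of the Files.find suggestion, the following combines it with a glob PathMatcher so that the "a*.foo" pattern from the question can be used directly. The helper name findByGlob is made up for this example, and the import assumes Scala 2.13 or later (on 2.12, scala.collection.JavaConverters._ plays the same role):

```scala
import java.nio.file.{FileSystems, Files, Path}
import java.nio.file.attribute.BasicFileAttributes
import scala.jdk.CollectionConverters._ // Scala 2.13+ (assumption)

// Hypothetical helper: find regular files under `root` whose file name matches `glob`.
def findByGlob(root: Path, glob: String): List[Path] = {
  val matcher = FileSystems.getDefault.getPathMatcher("glob:" + glob)
  val stream = Files.find(
    root,
    Int.MaxValue, // maximum depth: effectively unlimited
    (p: Path, attrs: BasicFileAttributes) =>
      // Match on the last path segment only, and skip directories.
      attrs.isRegularFile && matcher.matches(p.getFileName)
  )
  // The stream keeps directory handles open, so close it once it has been consumed.
  try stream.iterator().asScala.toList
  finally stream.close()
}
```

A call such as findByGlob(java.nio.file.Paths.get("c:\\temp"), "a*.foo") would then cover the case in the question. The result is materialised into a List here so that the stream can be closed safely; for very large trees you may prefer to process entries inside the try block instead.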

No-one has mentioned yet https://github.com/pathikrit/better-files

val dir = "src"/"test"
val matches: Iterator[File] = dir.glob("**/*.{java,scala}")
// above code is equivalent to:
dir.listRecursively.filter(f => f.extension == Some(".java") || f.extension == Some(".scala"))

You can use tail recursion for it:

object DirectoryTraversal {
  import java.io._

  def main(args: Array[String]): Unit = {
    val dir = new File("C:/Windows")
    val files = scan(dir)

    val out = new PrintWriter(new File("out.txt"))
    files foreach { file =>
      out.println(file)
    }
    out.flush()
    out.close()
  }

  def scan(file: File): List[File] = {
    @scala.annotation.tailrec
    def sc(acc: List[File], files: List[File]): List[File] = {
      files match {
        case Nil => acc
        case x :: xs => {
          x.isDirectory match {
            case false => sc(x :: acc, xs)
            // listFiles returns null for unreadable directories, so guard against it
            case true => sc(acc, xs ::: Option(x.listFiles).map(_.toList).getOrElse(Nil))
          }
        }
      }
    }

    sc(List(), List(file))
  }
}

os-lib is the easiest way to recursively list files in Scala.

os.walk(os.pwd/"countries").filter(os.isFile(_))

Here's how to recursively list all the files that match the "a*.foo" pattern specified in the question:

os.walk(os.pwd/"countries").filter(_.segments.toList.last matches "a.*\\.foo")

os-lib is way more elegant and powerful than other alternatives. It returns os objects that you can easily move, rename, whatever. You don't need to suffer with the clunky Java libraries anymore.

Here's a code snippet you can run if you'd like to experiment with this library on your local machine:

os.makeDir(os.pwd/"countries")
os.makeDir(os.pwd/"countries"/"colombia")
os.write(os.pwd/"countries"/"colombia"/"medellin.txt", "q mas pues")
os.write(os.pwd/"countries"/"colombia"/"a_something.foo", "soy un rolo")
os.makeDir(os.pwd/"countries"/"brasil")
os.write(os.pwd/"countries"/"brasil"/"a_whatever.foo", "carnaval")
os.write(os.pwd/"countries"/"brasil"/"a_city.txt", "carnaval")

println(os.walk(os.pwd/"countries").filter(os.isFile(_))) will return this:

ArraySeq(
/.../countries/brasil/a_whatever.foo,
/.../countries/brasil/a_city.txt,
/.../countries/colombia/a_something.foo,
/.../countries/colombia/medellin.txt)

os.walk(os.pwd/"countries").filter(_.segments.toList.last matches "a.*\\.foo") will return this:

ArraySeq(
/.../countries/brasil/a_whatever.foo,
/.../countries/colombia/a_something.foo)

See here for more details on how to use the os-lib.

The deepFiles method of scala.reflect.io.Directory provides a pretty nice way of recursively getting all the files in a directory:

import scala.reflect.io.Directory
new Directory(f).deepFiles.filter(x => x.name.startsWith("a") && x.name.endsWith(".foo"))

deepFiles returns an iterator, so you can convert it to some other collection type if you don't need/want lazy evaluation.

Minor improvement to the accepted answer.
By partitioning on _.isDirectory, this function returns a list of files only.
(Directories are excluded.)

import java.io.File
def recursiveListFiles(f: File): Array[File] = {
  val (dir, files) = f.listFiles.partition(_.isDirectory)
  files ++ dir.flatMap(recursiveListFiles)
}

Get all files under a path, excluding directories:

import java.io.File
import scala.collection.mutable.ArrayBuffer

object pojo2pojo {

  def main(args: Array[String]): Unit = {
    val file = new File("D:\\tmp\\tmp")
    val files = recursiveListFiles(file)
    println(files.toList)
    // List(D:\tmp\tmp\1.txt, D:\tmp\tmp\a\2.txt)
  }

  def recursiveListFiles(f: File): ArrayBuffer[File] = {
    val all = collection.mutable.ArrayBuffer(f.listFiles: _*)
    val files = all.filter(_.isFile)
    val dirs = all.filter(_.isDirectory)
    files ++ dirs.flatMap(recursiveListFiles)
  }

}