Read a String line by line

Given a string that isn't too long, what is the best way to read it line by line?

I know you can do:

BufferedReader reader = new BufferedReader(new StringReader(<string>));
reader.readLine();

Another approach is to take substrings at the eol:

final String eol = System.getProperty("line.separator");
output = output.substring(output.indexOf(eol) + eol.length());
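Taking the substring idea all the way through a string could look roughly like the sketch below: loop with indexOf(String, int) and skip the whole separator each time, so a two-character "\r\n" is handled correctly. SubstringLines and splitOn are hypothetical names, and the sketch assumes the text uses one known separator such as System.lineSeparator().

```java
import java.util.ArrayList;
import java.util.List;

public class SubstringLines {
    /** Splits text on a fixed separator using indexOf/substring only (no regex). */
    static List<String> splitOn(String text, String eol) {
        List<String> lines = new ArrayList<>();
        int start = 0;
        int end;
        // Find each separator and cut out the text in front of it
        while ((end = text.indexOf(eol, start)) != -1) {
            lines.add(text.substring(start, end));
            start = end + eol.length(); // skip the whole separator, not just one char
        }
        // Text after the last separator is the final line, if there is any
        if (start < text.length()) {
            lines.add(text.substring(start));
        }
        return lines;
    }

    public static void main(String[] args) {
        String eol = System.lineSeparator();
        String text = "first" + eol + "second" + eol + "third";
        for (String line : splitOn(text, eol)) {
            System.out.println(line);
        }
    }
}
```

Like BufferedReader.readLine, this does not produce an empty last line when the text ends with a separator.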

Is there any simpler way? I have no problem with the approaches above; I'm just wondering whether any of you know of something simpler and more efficient.


There is also Scanner, which you can use just like a BufferedReader:

Scanner scanner = new Scanner(myString);
while (scanner.hasNextLine()) {
    String line = scanner.nextLine();
    // process the line
}
scanner.close();

I think this is a cleaner approach than either of the two suggestions above.

You can also use String's split method:

String[] lines = myString.split(System.getProperty("line.separator"));

This gives you all the lines in a handy array.

I don't know how split performs, since it uses regular expressions.
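If you split many strings with the same separator, one common tweak (sketched here, not measured) is to compile the pattern once. String.split(regex) works on its regex argument on every call (OpenJDK has a fast path for some single-character separators), whereas a precompiled Pattern is reused. PrecompiledSplit and lines are hypothetical names, and the sketch assumes the text uses \n or \r\n line endings.

```java
import java.util.regex.Pattern;

public class PrecompiledSplit {
    // Compiled once and reused; Pattern objects are immutable and safe to share between threads
    private static final Pattern LINE_BREAK = Pattern.compile("\r?\n");

    static String[] lines(String text) {
        return LINE_BREAK.split(text);
    }

    public static void main(String[] args) {
        for (String line : lines("alpha\r\nbeta\ngamma")) {
            System.out.println(line);
        }
    }
}
```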

With Apache Commons IOUtils you can do this nicely via

List<String> lines = IOUtils.readLines(new StringReader(string));

It doesn't do anything clever, but it's nice and compact. It also works with streams, and you can get a LineIterator instead if you prefer.

You can also use:

String[] lines = someString.split("\n");

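If that doesn't work (for example, text with Windows line endings leaves a trailing \r on each line), try \r\n instead of \n.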
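Why a single fixed separator can go wrong is easiest to see in a tiny sketch (SplitPitfall and its methods are hypothetical names): splitting Windows text on "\n" leaves a '\r' behind, and splitting Unix text on "\r\n" finds nothing to split on.

```java
public class SplitPitfall {
    // Splits on a bare line feed only
    static String[] splitOnLf(String s) {
        return s.split("\n");
    }

    // Splits on the Windows CR LF pair only
    static String[] splitOnCrLf(String s) {
        return s.split("\r\n");
    }

    public static void main(String[] args) {
        // Windows text split on "\n": every line except the last keeps a trailing '\r'
        System.out.println(splitOnLf("first\r\nsecond")[0].endsWith("\r")); // true

        // Unix text split on "\r\n": no separator is found, so the whole string is one element
        System.out.println(splitOnCrLf("first\nsecond").length); // 1
    }
}
```

A pattern such as "\\r?\\n" handles both cases.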

Since I was particularly interested in the efficiency angle, I wrote a small test class (below):

Comparing line breaking performance of different solutions
Testing 5000000 lines
Split (all): 14665 ms
Split (LF only): 3752 ms
Scanner: 10005 ms
Reader: 2060 ms

As always, the exact times may vary, but no matter how often I run it, the ratios stay the same.

Conclusion: the OP's requirements of "simpler" and "more efficient" cannot both be met. The split solution (in either variant) is simpler, but the Reader implementation beats the others.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

/**
 * Test class for splitting a string into lines at line breaks
 */
public class LineBreakTest {
    /** Main method: pass in desired line count as first parameter (default = 10000). */
    public static void main(String[] args) {
        int lineCount = args.length == 0 ? 10000 : Integer.parseInt(args[0]);
        System.out.println("Comparing line breaking performance of different solutions");
        System.out.printf("Testing %d lines%n", lineCount);
        String text = createText(lineCount);
        testSplitAllPlatforms(text);
        testSplitLfOnly(text);
        testScanner(text);
        testReader(text);
    }

    private static void testSplitAllPlatforms(String text) {
        long start = System.currentTimeMillis();
        // Windows (\r\n), old Mac (\r) and Unix (\n) breaks; \r\n must come first to match as one break
        text.split("\r\n|\r|\n");
        System.out.printf("Split (all): %d ms%n", System.currentTimeMillis() - start);
    }

    private static void testSplitLfOnly(String text) {
        long start = System.currentTimeMillis();
        // Unix line feeds only
        text.split("\n");
        System.out.printf("Split (LF only): %d ms%n", System.currentTimeMillis() - start);
    }

    private static void testScanner(String text) {
        long start = System.currentTimeMillis();
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNextLine()) {
                result.add(scanner.nextLine());
            }
        }
        System.out.printf("Scanner: %d ms%n", System.currentTimeMillis() - start);
    }

    private static void testReader(String text) {
        long start = System.currentTimeMillis();
        List<String> result = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line = reader.readLine();
            while (line != null) {
                result.add(line);
                line = reader.readLine();
            }
        } catch (IOException exc) {
            // not expected with a StringReader; just stop this test
        }
        System.out.printf("Reader: %d ms%n", System.currentTimeMillis() - start);
    }

    /** Builds lineCount lines of 20 "word " tokens, each terminated by "\n". */
    private static String createText(int lineCount) {
        StringBuilder result = new StringBuilder();
        StringBuilder lineBuilder = new StringBuilder();
        for (int i = 0; i < 20; i++) {
            lineBuilder.append("word ");
        }
        String line = lineBuilder.toString();
        for (int i = 0; i < lineCount; i++) {
            result.append(line);
            result.append("\n");
        }
        return result.toString();
    }
}

Since Java 11, there is a new method, String.lines:

/**
* Returns a stream of lines extracted from this string,
* separated by line terminators.
* ...
*/
public Stream<String> lines() { ... }

Usage:

"line1\nline2\nline3"
    .lines()
    .forEach(System.out::println);
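To get the lines as a List instead of printing them, the stream can be collected. This sketch (Java 11+; StringLinesDemo and toLines are hypothetical names) also shows that mixed terminators are handled and that a trailing terminator does not produce an extra empty line, as described in the String.lines() Javadoc.

```java
import java.util.List;
import java.util.stream.Collectors;

public class StringLinesDemo {
    // Java 11+: String.lines() treats "\n", "\r" and "\r\n" as line terminators
    static List<String> toLines(String text) {
        return text.lines().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Mixed terminators; the final "\n" does not create an empty fourth line
        List<String> lines = toLines("one\r\ntwo\rthree\n");
        System.out.println(lines);        // [one, two, three]
        System.out.println(lines.size()); // 3
    }
}
```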

Or combine the newer try-with-resources statement with Scanner:

try (Scanner scanner = new Scanner(value)) {
    while (scanner.hasNextLine()) {
        String line = scanner.nextLine();
        // process the line
    }
}

You can use the stream API with a StringReader wrapped in a BufferedReader, which gained a lines() stream method in Java 8:

import java.util.stream.*;
import java.io.*;

class Test {
    public static void main(String... a) {
        String s = "this is a \nmultiline\rstring\r\nusing different newline styles";

        new BufferedReader(new StringReader(s)).lines().forEach(
            (line) -> System.out.println("one line of the string: " + line)
        );
    }
}

Output:

one line of the string: this is a
one line of the string: multiline
one line of the string: string
one line of the string: using different newline styles

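As with BufferedReader's readLine, the line terminators themselves are not included. All three terminator styles (\n, \r and \r\n) are supported, even mixed within the same string.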
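On Java 8, where String.lines() is not available yet, the same BufferedReader approach can be wrapped in a helper that returns a List. This is a sketch with hypothetical names (ReaderLines, toLines); the try-with-resources and catch are there only because BufferedReader.close() declares IOException.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.stream.Collectors;

public class ReaderLines {
    // Java 8 compatible: BufferedReader.lines() splits on "\n", "\r" or "\r\n"
    static List<String> toLines(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.lines().collect(Collectors.toList());
        } catch (IOException e) {
            // Only close() declares IOException here; a StringReader does not fail in practice
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String s = "this is a \nmultiline\rstring\r\nusing different newline styles";
        for (String line : toLines(s)) {
            System.out.println(line);
        }
    }
}
```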

A solution using Java 8 features such as the Stream API and method references:

new BufferedReader(new StringReader(myString))
        .lines().forEach(System.out::println);

or

public void someMethod(String myLongString) {
    new BufferedReader(new StringReader(myLongString))
            .lines().forEach(this::parseString);
}

private void parseString(String data) {
    // do something
}

You can try the following regular expression:

\r?\n

Code:

String input = "\nab\n\n    \n\ncd\nef\n\n\n\n\n";
String[] lines = input.split("\\r?\\n", -1);
int n = 1;
for (String line : lines) {
    System.out.printf("\tLine %02d \"%s\"%n", n++, line);
}

Output:

Line 01 ""
Line 02 "ab"
Line 03 ""
Line 04 "    "
Line 05 ""
Line 06 "cd"
Line 07 "ef"
Line 08 ""
Line 09 ""
Line 10 ""
Line 11 ""
Line 12 ""
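The -1 limit is what keeps the blank lines at the end: with the default limit of 0, String.split drops trailing empty strings. A small sketch comparing the two (SplitLimitDemo and its methods are hypothetical names):

```java
public class SplitLimitDemo {
    // Default limit (0): trailing empty strings are removed from the result
    static String[] splitDefault(String s) {
        return s.split("\\r?\\n");
    }

    // Negative limit: every element is kept, including trailing empty strings
    static String[] splitKeepTrailing(String s) {
        return s.split("\\r?\\n", -1);
    }

    public static void main(String[] args) {
        String input = "a\n\nb\n\n";
        System.out.println(splitDefault(input).length);      // 3: "a", "", "b"
        System.out.println(splitKeepTrailing(input).length); // 5: "a", "", "b", "", ""
    }
}
```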

The simplest and most general approach is to use the regex linebreak matcher \R (available since Java 8), which matches any Unicode linebreak sequence:

Pattern NEWLINE = Pattern.compile("\\R");
String[] lines = NEWLINE.split(input);

See: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/regex/Pattern.html
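A slightly fuller sketch of the \R approach (Java 8+; AnyLineBreak is a hypothetical class name). According to the Pattern Javadoc, \R is equivalent to \u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029], so a \r\n pair counts as one break rather than leaving an empty line between the \r and the \n.

```java
import java.util.regex.Pattern;

public class AnyLineBreak {
    // \R matches CR LF as a single break, plus LF, CR, VT, FF, NEL, U+2028 and U+2029
    private static final Pattern NEWLINE = Pattern.compile("\\R");

    static String[] lines(String input) {
        return NEWLINE.split(input);
    }

    public static void main(String[] args) {
        // Unix, Windows, old Mac and Unicode LINE SEPARATOR breaks in one string
        String input = "unix\nwindows\r\nold-mac\runicode\u2028end";
        for (String line : lines(input)) {
            System.out.println(line);
        }
    }
}
```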