Read InputStream as UTF-8

I'm trying to read a text/plain file over the internet, line by line. The code I have right now is:

URL url = new URL("http://kuehldesign.net/test.txt");
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
LinkedList<String> lines = new LinkedList<>();
String readLine;

while ((readLine = in.readLine()) != null) {
    lines.add(readLine);
}

for (String line : lines) {
    out.println("> " + line);
}

The file test.txt contains ¡Hélló!, which I am using to test the encoding.

When I look at the OutputStream (out), I see > ¬°H√©ll√≥!. I don't believe this is a problem with the OutputStream, since I can do out.println("é"); without any problem.

Any ideas for reading the InputStream as UTF-8? Thanks!


Solved my own problem. This line:

BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));

needs to be:

BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));

or since Java 7:

BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), StandardCharsets.UTF_8));
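To see why the explicit charset matters, here is a minimal sketch that can be run without a network connection. It assumes an in-memory byte array as a stand-in for url.openStream(), and the class and method names (ExplicitCharsetDemo, firstLine) are made up for illustration. The one-argument InputStreamReader constructor uses the platform default charset, so the result depends on the machine; passing the charset removes that dependency.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExplicitCharsetDemo {

    // Decodes the first line of the given bytes with the given charset,
    // using the same InputStreamReader/BufferedReader chain as the answer.
    static String firstLine(byte[] bytes, Charset charset) {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), charset))) {
            return in.readLine();
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The test string from the question, written with Unicode escapes
        // so the encoding of this source file does not matter.
        String text = "\u00A1H\u00E9ll\u00F3!";
        // Assumption: these bytes stand in for what url.openStream() would deliver.
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);

        String decodedAsUtf8 = firstLine(utf8Bytes, StandardCharsets.UTF_8);
        String decodedAsLatin1 = firstLine(utf8Bytes, StandardCharsets.ISO_8859_1);

        // A matching charset round-trips the text; a mismatched one does not.
        System.out.println("UTF-8 matches original:      " + decodedAsUtf8.equals(text));
        System.out.println("ISO-8859-1 matches original: " + decodedAsLatin1.equals(text));
    }
}
```

With a mismatched single-byte charset, each two-byte UTF-8 sequence turns into two separate characters, which is the same kind of garbling shown in the question.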
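
String file = "";

// try-with-resources closes the reader (and the underlying stream) automatically
try (BufferedReader br = new BufferedReader(
        new InputStreamReader(new FileInputStream(filename), StandardCharsets.UTF_8), 8192)) {
    StringBuilder sb = new StringBuilder();
    String str;
    // readLine() strips line terminators, so the lines are joined without separators
    while ((str = br.readLine()) != null) {
        sb.append(str);
    }
    file = sb.toString();
} catch (IOException e) {
    e.printStackTrace(); // do not swallow I/O errors silently
}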

Try this. :-)
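If the goal is the whole file as one string, a shorter variant is possible with java.nio.file.Files. This is only a sketch of an alternative, not what the answer above does: it uses Files.readAllLines with an explicit charset and joins the lines with "\n" so line breaks are kept. The names ReadFileAsUtf8Demo, readUtf8 and roundTrip are hypothetical, and the temporary file exists only to make the example self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ReadFileAsUtf8Demo {

    // Reads the whole file as UTF-8 and joins the lines with "\n",
    // so line breaks survive (normalised to "\n", trailing newline dropped).
    static String readUtf8(Path path) {
        try {
            return String.join("\n", Files.readAllLines(path, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper for the demo: writes text to a temp file as UTF-8 and reads it back.
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("utf8-demo", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return readUtf8(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file plain ASCII.
        String text = "\u00A1H\u00E9ll\u00F3!\nsecond line";
        System.out.println("round trip preserved text: " + roundTrip(text).equals(text));
    }
}
```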

I ran into the same problem: every time the reader finds a special character, it marks it as ��. To solve this, I tried using the ISO-8859-1 encoding:

BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream("txtPath"), "ISO-8859-1"));
String line;

while ((line = br.readLine()) != null) {
    // process each decoded line here
}

I hope this can help anyone who sees this post.
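A caveat worth spelling out: switching to ISO-8859-1 helps only when the bytes really are ISO-8859-1. The sketch below shows where the �� comes from. It assumes an in-memory byte array instead of a file, and it uses new String(bytes, charset) as a shortcut; like the InputStreamReader above, that constructor replaces undecodable input rather than failing. WrongCharsetDemo and decode are illustrative names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class WrongCharsetDemo {

    // new String(bytes, charset) replaces malformed input with the
    // charset's replacement string instead of throwing.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "Hello!" with accented e and o, written as Unicode escapes.
        String text = "H\u00E9ll\u00F3!";
        // Assumption: the file on disk was saved as ISO-8859-1.
        byte[] latin1Bytes = text.getBytes(StandardCharsets.ISO_8859_1);

        String asLatin1 = decode(latin1Bytes, StandardCharsets.ISO_8859_1);
        String asUtf8 = decode(latin1Bytes, StandardCharsets.UTF_8);

        // Decoding with the charset the bytes were written in restores the text.
        System.out.println("ISO-8859-1 restores text: " + asLatin1.equals(text));
        // Decoding the same bytes as UTF-8 yields U+FFFD replacement characters.
        System.out.println("UTF-8 gives U+FFFD:       " + (asUtf8.indexOf('\uFFFD') >= 0));
    }
}
```

The reverse also holds: reading a UTF-8 file as ISO-8859-1 never produces U+FFFD, but every multi-byte character comes out as two or more wrong characters.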

If you use the constructor InputStreamReader(InputStream in, Charset cs), malformed or unmappable input is silently replaced. To change this behaviour, use a CharsetDecoder:

public static Reader newReader(InputStream is) {
    // REPORT makes the decoder throw instead of substituting a replacement character
    return new InputStreamReader(is,
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
    );
}

Then catch java.nio.charset.CharacterCodingException.
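As a rough illustration of the catch side, here is a self-contained sketch that wraps such a strict reader in a validity check. The class name StrictUtf8Demo and the isValidUtf8 helper are made up for this example, and the byte arrays stand in for real input streams.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictUtf8Demo {

    // Reader that throws on invalid UTF-8 instead of substituting U+FFFD.
    static Reader newReader(InputStream is) {
        return new InputStreamReader(is,
                StandardCharsets.UTF_8.newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT));
    }

    // Hypothetical helper: true if the bytes decode cleanly as UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try (BufferedReader br = new BufferedReader(newReader(new ByteArrayInputStream(bytes)))) {
            while (br.readLine() != null) {
                // Lines are discarded; only the decoding outcome matters here.
            }
            return true;
        } catch (CharacterCodingException e) {
            // Raised by the strict decoder when it meets malformed input.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] good = "\u00A1H\u00E9ll\u00F3!".getBytes(StandardCharsets.UTF_8);
        // 0xE9 starts a multi-byte sequence, but 0x6C is not a continuation byte.
        byte[] bad = new byte[] {0x48, (byte) 0xE9, 0x6C};

        System.out.println("good bytes valid: " + isValidUtf8(good));
        System.out.println("bad bytes valid:  " + isValidUtf8(bad));
    }
}
```

CharacterCodingException extends IOException, so it has to be caught before the more general IOException handler.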