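Converting a byte array to a String (Java)

I'm writing a web application on Google App Engine. It basically lets people edit HTML code that is stored as .html files in the blobstore.

I'm using fetchData to return a byte[] of all the characters in the file. I'm trying to print it to an HTML page so the user can edit the HTML code. Everything works great!

Here's my only problem now:

The byte array is having some issues when converting back to a string. Smart quotes and a couple of other characters come out looking funky (as ?'s or Japanese symbols, etc.). Specifically, several bytes with negative values seem to be causing the problem.

The smart quotes come back as -108 and -109 in the byte array. Why is this, and how can I decode the negative bytes to show the correct character encoding?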


The byte array contains characters in a specific encoding (which you need to know). The way to convert it to a String is:

String decoded = new String(bytes, "UTF-8");  // example for one encoding type

By the way, the raw bytes may appear as negative decimals simply because the Java data type byte is signed; it covers the range from -128 to 127.
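To make the signed-byte point concrete, here is a minimal sketch that maps a negative byte back to its unsigned value. The class and method names (SignedByteDemo, toUnsigned) are made up for illustration and are not part of any library.

```java
class SignedByteDemo {
    // Masking with 0xFF drops the sign extension, giving the value in the range 0..255.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte b = -109;
        int unsigned = toUnsigned(b);
        System.out.println(b + " -> " + unsigned + " -> 0x" + Integer.toHexString(unsigned));
        // prints: -109 -> 147 -> 0x93
    }
}
```

The same mask turns -108 into 148 (0x94), so the two bytes from the question are 0x93 and 0x94.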


-109 = 0x93: Control Code "Set Transmit State"

As a Unicode code point, the value 0x93 (-109) is a non-printable control character, and a lone 0x93 byte is not a valid UTF-8 sequence on its own. So UTF-8 is not the correct encoding for that character stream.

0x93 in "Windows-1252" is the "smart quote" that you're looking for (the left double quotation mark, U+201C). The Java name of that encoding is "Cp1252". The next line provides test code:

System.out.println(new String(new byte[]{-109}, "Cp1252"));
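As a slightly fuller sketch of the same test, the following decodes both bytes from the question and prints the resulting code points, which avoids depending on whether the console can render the quotes. It assumes the JVM ships the windows-1252 charset (typical JDKs do, but the Java specification does not guarantee it); SmartQuoteDemo and decodeCp1252 are illustrative names.

```java
import java.nio.charset.Charset;

class SmartQuoteDemo {
    static String decodeCp1252(byte[] bytes) {
        // "windows-1252" is the java.nio name of the charset that java.io/java.lang call "Cp1252".
        return new String(bytes, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        byte[] quotes = {-109, -108}; // 0x93 and 0x94
        String decoded = decodeCp1252(quotes);
        for (int i = 0; i < decoded.length(); i++) {
            System.out.println("U+" + Integer.toHexString(decoded.charAt(i)).toUpperCase());
        }
        // prints U+201C, then U+201D (left and right double quotation marks)
    }
}
```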

The previous answer from Andreas_D is good. I'm just going to add that wherever you display the output, there will be a font and a character encoding involved, and they may not support some characters.

To work out whether it is Java or your display that is a problem, do this:

for (int i = 0; i < str.length(); i++) {
    char ch = str.charAt(i);
    System.out.println(i + " : " + ch + " " + Integer.toHexString(ch) + ((ch == '\ufffd') ? " Unknown character" : ""));
}

Java will have mapped any characters it cannot understand to 0xfffd, the official replacement character for unknown characters. If you see a '?' in the output but it is not mapped to 0xfffd, then your display font or encoding is the problem, not Java.
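The 0xfffd mapping can be reproduced directly with the byte from the question. This is a small sketch, assuming Java 7+ for StandardCharsets; ReplacementCharDemo and hasUndecodableBytes are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

class ReplacementCharDemo {
    // Returns true if decoding as UTF-8 produced U+FFFD anywhere in the result.
    // Caveat: input that legitimately contains an encoded U+FFFD is flagged too.
    static boolean hasUndecodableBytes(byte[] bytes) {
        String s = new String(bytes, StandardCharsets.UTF_8);
        return s.indexOf('\uFFFD') >= 0;
    }

    public static void main(String[] args) {
        byte[] smartQuote = {-109}; // 0x93: a smart quote in Cp1252, malformed as UTF-8
        System.out.println(hasUndecodableBytes(smartQuote)); // prints: true
    }
}
```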

public class Main {

    /**
     * Example method for converting a byte to a String.
     */
    public void convertByteToString() {

        byte b = 65;

        // Using the static toString method of the Byte class
        System.out.println(Byte.toString(b));

        // Using simple concatenation with an empty String
        System.out.println(b + "");

        // Creating a byte array and passing it to the String constructor
        System.out.println(new String(new byte[] {b}));
    }

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        new Main().convertByteToString();
    }
}

Output

65
65
A

You can try this.

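String s = new String(bytearray); // decodes with the platform default charset

import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public static String readFile(String fn) throws IOException
{
    File f = new File(fn);
    byte[] buffer = new byte[(int) f.length()];

    // readFully keeps reading until the buffer is full; a single read() may return early
    DataInputStream is = new DataInputStream(new FileInputStream(f));
    try {
        is.readFully(buffer);
    } finally {
        is.close(); // close the stream even if reading fails
    }

    return new String(buffer, "UTF-8"); // use desired encoding
}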
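On Java 7 or later, the same idea can be sketched with java.nio.file, which reads the whole file in one call and closes it for you. This is an alternative to the stream-based version, not the original poster's method; ReadFileNio is an illustrative class name, and UTF-8 is an assumption about the file's encoding.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class ReadFileNio {
    static String readFile(String fn) throws IOException {
        // readAllBytes opens the file, reads every byte, and closes it again.
        byte[] bytes = Files.readAllBytes(Paths.get(fn));
        return new String(bytes, StandardCharsets.UTF_8); // use the file's actual encoding
    }
}
```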

Java 7 and above

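You can also pass your desired encoding to the String constructor as a Charset constant from StandardCharsets. This may be safer than passing the encoding as a String, as suggested in the other answers, because a misspelled name cannot slip through to run time and there is no checked UnsupportedEncodingException to handle.

For example, for UTF-8 encoding: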

String bytesAsString = new String(bytes, StandardCharsets.UTF_8);
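The difference between the two constructor overloads can be sketched side by side. The class and method names here (CharsetOverloads, decodeByName, decodeUtf8) are invented for the example.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

class CharsetOverloads {
    // String-name overload: the name is only checked at run time,
    // and the checked exception has to be handled somewhere.
    static String decodeByName(byte[] bytes, String charsetName) {
        try {
            return new String(bytes, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    // Charset overload: no checked exception, and the constant is verified by the compiler.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

With decodeByName, a typo such as "UTF8-typo" only surfaces when that line actually runs.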

I suggest Arrays.toString(byte_array);

It depends on your purpose. For example, I wanted to save a byte array in exactly the format you see while debugging, something like [1, 2, 3]. If you want to save exactly those values without converting the bytes to characters, Arrays.toString(byte_array) does this. But if you want to save characters instead of bytes, you should use String s = new String(byte_array). In this case, s holds the characters whose byte values are 1, 2 and 3.
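A short sketch of the two behaviours, using printable byte values so the difference is visible. The names ToStringVsDecode, asNumbers and asText are hypothetical, and US-ASCII is chosen only to keep the example independent of the platform default charset.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ToStringVsDecode {
    // Formats the numeric byte values, as a debugger would show them.
    static String asNumbers(byte[] bytes) {
        return Arrays.toString(bytes);
    }

    // Decodes the bytes into the characters they encode.
    static String asText(byte[] bytes) {
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] data = {72, 105};
        System.out.println(asNumbers(data)); // prints: [72, 105]
        System.out.println(asText(data));    // prints: Hi
    }
}
```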