Java: 字符串与 ByteBuffer 之间的转换及相关问题

我使用 Java NIO 进行套接字连接,我的协议是基于文本的,所以我需要能够在将 String 写入 SocketChannel 之前将它们转换为 ByteBuffers,并将传入的 ByteBuffers 转换回 String。目前,我正在使用以下代码:

public static Charset charset = Charset.forName("UTF-8");
public static CharsetEncoder encoder = charset.newEncoder();
public static CharsetDecoder decoder = charset.newDecoder();


public static ByteBuffer str_to_bb(String msg){
try{
return encoder.encode(CharBuffer.wrap(msg));
}catch(Exception e){e.printStackTrace();}
return null;
}


public static String bb_to_str(ByteBuffer buffer){
String data = "";
try{
int old_position = buffer.position();
data = decoder.decode(buffer).toString();
// reset buffer's position to its original so it is not altered:
buffer.position(old_position);
}catch (Exception e){
e.printStackTrace();
return "";
}
return data;
}

这在大多数情况下是可行的,但我怀疑这是否是实现这种转换的各个方向的首选(或最简单)方法,或者是否还有其他尝试的方法。有时,对 encode()decode()的调用会抛出一个 java.lang.IllegalStateException: Current state = FLUSHED, new state = CODING_END异常或类似的异常,即使每次完成转换时都使用一个新的 ByteBuffer 对象。我需要同步这些方法吗?有没有更好的方法在 String 和 ByteBuffers 之间进行转换?谢谢!

161239 次浏览

Check out the CharsetEncoder and CharsetDecoder API descriptions - You should follow a specific sequence of method calls to avoid this problem. For example, for CharsetEncoder:

  1. Reset the encoder via the reset method, unless it has not been used before;
  2. Invoke the encode method zero or more times, as long as additional input may be available, passing false for the endOfInput argument and filling the input buffer and flushing the output buffer between invocations;
  3. Invoke the encode method one final time, passing true for the endOfInput argument; and then
  4. Invoke the flush method so that the encoder can flush any internal state to the output buffer.

By the way, this is the same approach I am using for NIO although some of my colleagues are converting each char directly to a byte in the knowledge they are only using ASCII, which I can imagine is probably faster.

Answer by Adamski is a good one and describes the steps in an encoding operation when using the general encode method (that takes a byte buffer as one of the inputs)

However, the method in question (in this discussion) is a variant of encode - encode(CharBuffer in). This is a convenience method that implements the entire encoding operation. (Please see java docs reference in P.S.)

As per the docs, This method should therefore not be invoked if an encoding operation is already in progress (which is what is happening in ZenBlender's code -- using static encoder/decoder in a multi threaded environment).

Personally, I like to use convenience methods (over the more general encode/decode methods) as they take away the burden by performing all the steps under the covers.

ZenBlender and Adamski have already suggested multiple ways options to safely do this in their comments. Listing them all here:

  • Create a new encoder/decoder object when needed for each operation (not efficient as it could lead to a large number of objects). OR,
  • Use a ThreadLocal to avoid creating new encoder/decoder for each operation. OR,
  • Synchronize the entire encoding/decoding operation (this might not be preferred unless sacrificing some concurrency is ok for your program)

P.S.

java docs references:

  1. Encode (convenience) method: http://docs.oracle.com/javase/6/docs/api/java/nio/charset/CharsetEncoder.html#encode%28java.nio.CharBuffer%29
  2. General encode method: http://docs.oracle.com/javase/6/docs/api/java/nio/charset/CharsetEncoder.html#encode%28java.nio.CharBuffer,%20java.nio.ByteBuffer,%20boolean%29

Unless things have changed, you're better off with

public static ByteBuffer str_to_bb(String msg, Charset charset){
return ByteBuffer.wrap(msg.getBytes(charset));
}


public static String bb_to_str(ByteBuffer buffer, Charset charset){
byte[] bytes;
if(buffer.hasArray()) {
bytes = buffer.array();
} else {
bytes = new byte[buffer.remaining()];
buffer.get(bytes);
}
return new String(bytes, charset);
}

Usually buffer.hasArray() will be either always true or always false depending on your use case. In practice, unless you really want it to work under any circumstances, it's safe to optimize away the branch you don't need.