Do you see any problem with using a byte array as Map key? I could also do new String(byte[]) and hash by String but it is more straightforward to use byte[].
I believe that arrays in Java do not necessarily implement the hashCode() and equals(Object) methods intuitively. That is, two identical byte arrays will not necessarily share the same hash code and they will not necessarily claim to be equal. Without these two traits, your HashMap will behave unexpectedly.
Therefore, I recommend against using byte[] as keys in a HashMap.
It's okay so long as you only want reference equality for your key - arrays don't implement "value equality" in the way that you'd probably want. For example:
byte[] array1 = new byte[1];
byte[] array2 = new byte[1];
System.out.println(array1.equals(array2));
System.out.println(array1.hashCode());
System.out.println(array2.hashCode());
prints something like:
false
1671711
11394033
(The actual numbers are irrelevant; the fact that they're different is important.)
Assuming you actually want equality, I suggest you create your own wrapper which contains a byte[] and implements equality and hash code generation appropriately:
public final class ByteArrayWrapper
{
private final byte[] data;
public ByteArrayWrapper(byte[] data)
{
if (data == null)
{
throw new NullPointerException();
}
this.data = data;
}
@Override
public boolean equals(Object other)
{
if (!(other instanceof ByteArrayWrapper))
{
return false;
}
return Arrays.equals(data, ((ByteArrayWrapper)other).data);
}
@Override
public int hashCode()
{
return Arrays.hashCode(data);
}
}
Note that if you change the values within the byte array after using the ByteArrayWrapper, as a key in a HashMap (etc) you'll have problems looking up the key again... you could take a copy of the data in the ByteArrayWrapper constructor if you want, but obviously that will be a waste of performance if you know you won't be changing the contents of the byte array.
EDIT: As mentioned in the comments, you could also use ByteBuffer for this (in particular, its ByteBuffer#wrap(byte[]) method). I don't know whether it's really the right thing, given all the extra abilities that ByteBuffers have which you don't need, but it's an option.
The problem is that byte[] uses object identity for equals and hashCode, so that
byte[] b1 = {1, 2, 3}
byte[] b2 = {1, 2, 3}
will not match in a HashMap. I see three options:
Wrapping in a String, but then you have to be careful about encoding issues (you need to make certain that the byte -> String -> byte gives you the same bytes).
Use List<Byte> (can be expensive in memory).
Do your own wrapping class, writing hashCode and equals to use the contents of the byte array.
You could use java.math.BigInteger. It has a BigInteger(byte[] val) constructor. It's a reference type, so could be used as a key for hashtable. And .equals() and .hashCode() are defined as for respective integer numbers, which means BigInteger has consistent equals semantics as byte[] array.
You should use create a class somthing like ByteArrKey and overload hashcode and equal methods, remember the contract between them.
This will give you greater flexibility as you can skip 0 entries that are appended at the end of byte array, specially if you copy only some part form the other byte buffer.
This way you will decide how both objects SHOULD be equal.
I am very surprised that the answers are not pointing out the most simple alternative.
Yes, it is not possible to use a HashMap, but nobody prevents you from using a SortedMap as alternative. The only thing is to write a Comparator which needs to compare the arrays. It is not as performant as a HashMap, but if you want a simple alternative, here you go (you can replace SortedMap with Map if you want to hide the implementation):
private SortedMap<int[], String> testMap = new TreeMap<>(new ArrayComparator());
private class ArrayComparator implements Comparator<int[]> {
@Override
public int compare(int[] o1, int[] o2) {
int result = 0;
int maxLength = Math.max(o1.length, o2.length);
for (int index = 0; index < maxLength; index++) {
int o1Value = index < o1.length ? o1[index] : 0;
int o2Value = index < o2.length ? o2[index] : 0;
int cmp = Integer.compare(o1Value, o2Value);
if (cmp != 0) {
result = cmp;
break;
}
}
return result;
}
}
This implementation can be adjusted for other arrays, the only thing you must be aware of is that equal arrays (= equal length with equal members) must return 0 and that you have a determistic order
Other answers have not pointed out that not all byte[] covert into unique String. I fell into this trap doing new String(byteArray) as keys to a map only to find that many negative bytes are mapped to the same string. Here is a test that demonstrates that problem:
@Test
public void testByteAsStringMap() throws Exception {
HashMap<String, byte[]> kvs = new HashMap<>();
IntStream.range(Byte.MIN_VALUE, Byte.MAX_VALUE).forEach(b->{
byte[] key = {(byte)b};
byte[] value = {(byte)b};
kvs.put(new String(key), value);
});
Assert.assertEquals(255, kvs.size());
}
It will throw:
java.lang.AssertionError: Expected :255 Actual :128
It does that because a String is a sequence of character code points and any conversion from a byte[] is based on some byte encoding. In the above case, the platform default encoding happens to map many negative bytes to the same character. Another fact about String is that it always takes and gives a copy of its internal state. If the original bytes came from a String that was a copy, then wrapping it as a String to use it as the key to a map takes a second copy. That may generate a lot of garbage that might be avoidable.
There is a good answer here that suggests using java.nio.ByteBuffer with ByteBuffer.wrap(b). The problem with that is that byte[] is mutable and it doesn't take a copy so you must be careful to take a defensive copy of any the arrays passed to you with ByteBuffer.wrap(b.clone()) else the keys of your map will be corrupted. If you look at the result of a map with ByteBuffer keys in a debugger you will see that the buffers have a lot of internal references designed to track reading to and writing from each buffer. So the objects are much more heavyweight than wrapping in a simple String. Finally, even a string holds more state than needed. Looking at it in my debugger it stores characters as a two-byte UTF16 array and also stores a four-byte hashcode.
My preferred approach is to have Lombok generate at compile time the boilerplate to make a lightweight byte array wrapper that doesn't store additional state:
import lombok.Data;
import lombok.EqualsAndHashCode;
import lombok.ToString;
@ToString
@EqualsAndHashCode
@Data(staticConstructor="of")
class ByteSequence {
final byte[] bytes;
}
This then passes the test that checks that all possible bytes map to a unique string:
You then don't have to worry about getting the equals and hashcode logic correct as it is supplied by Lombok where it does Arrays.deepEquals which is documented at https://projectlombok.org/features/EqualsAndHashCode Note that lombok isn't a runtime dependency only a compile-time dependency and you can install an opensource plugin to your IDE so that your IDE "sees" all the generated boilerplate methods.
With this implementation, you still have to worry about above the mutability of the byte. If someone passes you a byte[] that might be mutated, you should take a defensive copy using clone():