在使用 ConcurrentMap 的 putIfAbsent 之前,是否应该检查 map 是否包含 Key

我一直在使用 Java 的 ConcurrentMap 来实现一个可以从多个线程使用的映射。PutIfAbsent 是一个很好的方法,并且比使用标准的 map 操作更容易读/写。我有一些这样的代码:

ConcurrentMap<String, Set<X>> map = new ConcurrentHashMap<String, Set<X>>();


// ...


map.putIfAbsent(name, new HashSet<X>());
map.get(name).add(Y);

在可读性方面,这非常好,但是它确实需要每次都创建一个新的 HashSet,即使它已经在地图中了。我可以这样写:

if (!map.containsKey(name)) {
map.putIfAbsent(name, new HashSet<X>());
}
map.get(name).add(Y);

通过这个更改,它失去了一些可读性,但不需要每次都创建 HashSet。在这种情况下,哪个更好?我倾向于支持第一个,因为它更易读。第二种方法会表现得更好,可能也更正确。也许有比这两个更好的办法。

以这种方式使用 putIfAbsent 的最佳实践是什么?

46735 次浏览

Concurrency is hard. If you are going to bother with concurrent maps instead of straightforward locking, you might as well go for it. Indeed, don't do lookups more than necessary.

Set<X> set = map.get(name);
if (set == null) {
final Set<X> value = new HashSet<X>();
set = map.putIfAbsent(name, value);
if (set == null) {
set = value;
}
}

(Usual stackoverflow disclaimer: Off the top of my head. Not tested. Not compiled. Etc.)

Update: 1.8 has added computeIfAbsent default method to ConcurrentMap (and Map which is kind of interesting because that implementation would be wrong for ConcurrentMap). (And 1.7 added the "diamond operator" <>.)

Set<X> set = map.computeIfAbsent(name, n -> new HashSet<>());

(Note, you are responsible for the thread-safety of any operations of the HashSets contained in the ConcurrentMap.)

Tom's answer is correct as far as API usage goes for ConcurrentMap. An alternative that avoids using putIfAbsent is to use the computing map from the GoogleCollections/Guava MapMaker which auto-populates the values with a supplied function and handles all the thread-safety for you. It actually only creates one value per key and if the create function is expensive, other threads asking getting the same key will block until the value becomes available.

Edit from Guava 11, MapMaker is deprecated and being replaced with the Cache/LocalCache/CacheBuilder stuff. This is a little more complicated in its usage but basically isomorphic.

By keeping a pre-initialized value for each thread you can improve on the accepted answer:

Set<X> initial = new HashSet<X>();
...
Set<X> set = map.putIfAbsent(name, initial);
if (set == null) {
set = initial;
initial = new HashSet<X>();
}
set.add(Y);

I recently used this with AtomicInteger map values rather than Set.

My generic approximation:

public class ConcurrentHashMapWithInit<K, V> extends ConcurrentHashMap<K, V> {
private static final long serialVersionUID = 42L;


public V initIfAbsent(final K key) {
V value = get(key);
if (value == null) {
value = initialValue();
final V x = putIfAbsent(key, value);
value = (x != null) ? x : value;
}
return value;
}


protected V initialValue() {
return null;
}
}

And as example of use:

public static void main(final String[] args) throws Throwable {
ConcurrentHashMapWithInit<String, HashSet<String>> map =
new ConcurrentHashMapWithInit<String, HashSet<String>>() {
private static final long serialVersionUID = 42L;


@Override
protected HashSet<String> initialValue() {
return new HashSet<String>();
}
};
map.initIfAbsent("s1").add("chao");
map.initIfAbsent("s2").add("bye");
System.out.println(map.toString());
}

You can use MutableMap.getIfAbsentPut(K, Function0<? extends V>) from Eclipse Collections (formerly GS Collections).

The advantage over calling get(), doing a null check, and then calling putIfAbsent() is that we'll only compute the key's hashCode once, and find the right spot in the hashtable once. In ConcurrentMaps like org.eclipse.collections.impl.map.mutable.ConcurrentHashMap, the implementation of getIfAbsentPut() is also thread-safe and atomic.

import org.eclipse.collections.impl.map.mutable.ConcurrentHashMap;
...
ConcurrentHashMap<String, MyObject> map = new ConcurrentHashMap<>();
map.getIfAbsentPut("key", () -> someExpensiveComputation());

The implementation of org.eclipse.collections.impl.map.mutable.ConcurrentHashMap is truly non-blocking. While every effort is made not to call the factory function unnecessarily, there's still a chance it will be called more than once during contention.

This fact sets it apart from Java 8's ConcurrentHashMap.computeIfAbsent(K, Function<? super K,? extends V>). The Javadoc for this method states:

The entire method invocation is performed atomically, so the function is applied at most once per key. Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple...

Note: I am a committer for Eclipse Collections.

In 5+ years, I can't believe no one has mentioned or posted a solution that uses ThreadLocal to solve this problem; and several of the solutions on this page are not threadsafe and are just sloppy.

Using ThreadLocals for this specific problem isn't only considered best practices for concurrency, but for minimizing garbage/object creation during thread contention. Also, it's incredibly clean code.

For example:

private final ThreadLocal<HashSet<X>>
threadCache = new ThreadLocal<HashSet<X>>() {
@Override
protected
HashSet<X> initialValue() {
return new HashSet<X>();
}
};




private final ConcurrentMap<String, Set<X>>
map = new ConcurrentHashMap<String, Set<X>>();

And the actual logic...

// minimize object creation during thread contention
final Set<X> cached = threadCache.get();


Set<X> data = map.putIfAbsent("foo", cached);
if (data == null) {
// reset the cached value in the ThreadLocal
listCache.set(new HashSet<X>());
data = cached;
}


// make sure that the access to the set is thread safe
synchronized(data) {
data.add(object);
}