小开

The starting point would be either of Herb Sutter's DDJ articles for either a single producer and consumer or multiple ones. The code he gives (in-line starting on the second page of each article) uses the C++0x style atomic<T> template type; which you can imitate using the Boost interprocess library.

The boost code is buried in the depths of the interprocess library, but having read through the appropriate header file (atomic.hpp) the implementations for the necessary compare-and-swap operations on the systems I am familiar with look sound.

小开

To the best of my knowledge, there is no such thing publicly available yet. One issue an implementor needs to solve is that you need a lock-free memory allocator, which exists, though I cannot seem to find the link right now.

小开

And then Intel Threading Building Blocks came. And for a time, it was good.

PS : you are looking for concurrent_queue and concurrent_hash_map

小开

After having checked most of the given answers, i can only state:

The answer is NO.

There is no such thing right that could be used right out of the box.

小开

boost.lockfree is an attempt to create c++ implementations of lockfree stack and fifo classes.

public git repository

小开

If you have a multiple-producer / single-consumer Queue/FIFO, you can easily make one LockFree using SLIST or a trivial Lock Free LIFO stack. What you do is have a second "private" stack for the consumer (which can also be done as a SLIST for simplicity or any other stack model you choose). The consumer pops items off the private stack. Whenever the private LIFO is exhasted, you do a Flush rather than Pop off the shared concurrent SLIST (grabbing the entire SLIST chain) and then walk the Flushed list in-order pushing items onto the private stack.

That works for single-producer / single-consumer and for multiple-producer / single-consumer.

However, it does not work for multiple-consumer cases (with either single-producer or multiple-producers).

Also, as far as hash tables go, they are an ideal candidate for "striping" which is just dividing the hash into segments having a lock per segments of the cache. This is how the Java concurrent library does it (using 32-stripes). If you have a light-weight reader-writer lock, the hash table can be concurrently accessed for simultaneous reads and you will only stall when write are occurring on contested stripes (and possibly if you allow for growing the hash table).

If you roll your own, make sure to interleave your locks with the hash entries rather than put all your locks in an array next to each other so you're less likely to have false sharing.

小开

The closest thing I am aware of is Windows Interlocked Singly Linked Lists. Of course, it is Windows only.

小开

There is such a library, but it's in C.

Wrapping to C++ should be straightforward.

liblfds

小开

I found another solution written in c:

http://www.ddj.com/hpc-high-performance-computing/219500200

小开

The following is from Herb Sutter's article on Concurrent lock free Queue http://www.drdobbs.com/parallel/writing-a-generalized-concurrent-queue/211601363?pgno=1 . I have made some changes like compiler reordering stuff. One needs GCC v4.4+ to compile this code.

#include <atomic>
#include <iostream>
using namespace std;


//compile with g++ setting -std=c++0x


#define CACHE_LINE_SIZE 64


template <typename T>
struct LowLockQueue {
private:
struct Node {
Node( T* val ) : value(val), next(nullptr) { }
T* value;
atomic<Node*> next;
char pad[CACHE_LINE_SIZE - sizeof(T*)- sizeof(atomic<Node*>)];
};
char pad0[CACHE_LINE_SIZE];


// for one consumer at a time
Node* first;


char pad1[CACHE_LINE_SIZE
- sizeof(Node*)];


// shared among consumers
atomic<bool> consumerLock;


char pad2[CACHE_LINE_SIZE
- sizeof(atomic<bool>)];


// for one producer at a time
Node* last;


char pad3[CACHE_LINE_SIZE
- sizeof(Node*)];


// shared among producers
atomic<bool> producerLock;


char pad4[CACHE_LINE_SIZE
- sizeof(atomic<bool>)];


public:
LowLockQueue() {
first = last = new Node( nullptr );
producerLock = consumerLock = false;
}
~LowLockQueue() {
while( first != nullptr ) {      // release the list
Node* tmp = first;
first = tmp->next;
delete tmp->value;       // no-op if null
delete tmp;
}
}


void Produce( const T& t ) {
Node* tmp = new Node( new T(t) );
asm volatile("" ::: "memory");                            // prevent compiler reordering
while( producerLock.exchange(true) )
{ }   // acquire exclusivity
last->next = tmp;         // publish to consumers
last = tmp;             // swing last forward
producerLock = false;       // release exclusivity
}


bool Consume( T& result ) {
while( consumerLock.exchange(true) )
{ }    // acquire exclusivity
Node* theFirst = first;
Node* theNext = first-> next;
if( theNext != nullptr ) {   // if queue is nonempty
T* val = theNext->value;    // take it out
asm volatile("" ::: "memory");                            // prevent compiler reordering
theNext->value = nullptr;  // of the Node
first = theNext;          // swing first forward
consumerLock = false;             // release exclusivity
result = *val;    // now copy it back
delete val;       // clean up the value
delete theFirst;      // and the old dummy
return true;      // and report success
}
consumerLock = false;   // release exclusivity
return false;                  // report queue was empty
}
};


int main(int argc, char* argv[])
{
//Instead of this Mambo Jambo one can use pthreads in Linux to test comprehensively
LowLockQueue<int> Q;
Q.Produce(2);
Q.Produce(6);


int a;
Q.Consume(a);
cout<< a << endl;
Q.Consume(a);
cout<< a << endl;


return 0;
}

小开

As of 1.53, boost provides a set of lock free data structures, including queues, stacks and single-producer/single-consumer queues (i.e. ring buffers).

小开

Facebook's Folly seems to have lock free data structures based on C++11 <atomic>:

ProducerConsumerQueue with docs and example code here.
AtomicHashMap with docs and example code here

I would dare to say these are currently used in production, so I guess they could safely be used in other projects.

Cheers!

小开

Yes!

I wrote a lock-free queue. It has Features™:

Fully wait-free (no CAS loops)
Super fast (over a hundred million enqueue/dequeue operations per second)
Uses C++11 move semantics
Grows as needed (but only if you want it to)
Does lock-free memory management for the elements (using pre-allocated contiguous blocks)
Stand-alone (two headers plus a license and readme)
Compiles under MSVC2010+, Intel ICC 13, and GCC 4.7.2 (and should work under any C++11 fully-compliant compiler)

It's available on GitHub under the simplified BSD license (feel free to fork it!).

Caveats:

Only for single-producer single-consumer architecture (i.e. two threads)
Thoroughly tested on x86(-64), and should work on ARM, PowerPC, and other CPUs where aligned native-sized integers and pointer loads and stores are naturally atomic, but has not been field tested on non-x86 CPUs (if someone has one to test it on let me know)
No idea if any patents are infringed upon (use at your own risk, etc.). Note that I did design and implement it myself from scratch.

小开

I may come a bit late on this.

The absence of solutions (at the question was asked) are mainly due to an important issue in C++ (before C++0x/11): C++ have (has) no concurrent memory model.

Now, using std::atomic, you can control memory ordering issues and have proper compare-and-swap operations. I've written myself an implementation of Micheal&Scott's lock-free queue (PODC96) using C++11 and Micheal's Hazard Pointers (IEEE TPDS 2004) to avoid early free and ABA problems. It's working fine but it's a quick and dirty implementation and I'm not satisfied with the actual performances. Code is available on bitbucket: LockFreeExperiment

It's also possible to implement lock-free queue without hazard pointers using double-words CAS (but 64bit versions will only be possible on x86-64 using cmpxchg16b), I've a blog post about that (with untested code for the queue) here: Implementing generic double-word compare and swap for x86/x86-64 (LSE blog.)

My own benchmark show me that double-locked queue (also in Micheal&Scott 1996 paper) performs as well as the lock-free one (I haven't reach enough contention so that locked data structures have performances issues, but my bench are too light for now) and the concurrent queue from Intel's TBB seems even better (two time faster) for a relatively small number (depending on the operating system, under FreeBSD 9, the lowest bound I've found so far, this number is 8 threads on an i7 with 4 ht-core, and thus 8 logical CPUs) of threads and have very strange behavior (execution time of my simple benchmark move from seconds to hours !)

Another limitations about lock-free queues following the STL style: having iterators on lock-free queue has no sens.

小开

I wrote this at some point probably back in 2010, I'm sure with help from different references. It Multi-Producer Single Consumer.

template <typename T>
class MPSCLockFreeQueue
{
private:
struct Node
{
Node( T val ) : value(val), next(NULL) { }
T value;
Node* next;
};
Node * Head;
__declspec(align(4)) Node * InsertionPoint;  //__declspec(align(4)) forces 32bit alignment this must be changed for 64bit when appropriate.


public:
MPSCLockFreeQueue()
{
InsertionPoint = new Node( T() );
Head = InsertionPoint;
}
~MPSCLockFreeQueue()
{
// release the list
T result;
while( Consume(result) )
{
//The list should be cleaned up before the destructor is called as there is no way to know whether or not to delete the value.
//So we just do our best.
}
}


void Produce( const T& t )
{
Node * node = new Node(t);
Node * oldInsertionPoint = (Node *) InterLockedxChange((volatile void **)&InsertionPoint,node);
oldInsertionPoint->next = node;
}


bool Consume( T& result )
{
if (Head->next)
{
Node * oldHead = Head;
Head = Head->next;
delete oldHead;
result = Head->value;
return true;
}
return false;               // else report empty
}


};

在 C + + 中是否有生产就绪的无锁队列或散列实现

Yes!