Why does code mutating a shared variable across threads apparently NOT suffer from a race condition?

I'm using Cygwin GCC and run this code:

#include <iostream>
#include <thread>
#include <vector>
using namespace std;

unsigned u = 0;

void foo()
{
    u++;
}

int main()
{
    vector<thread> threads;
    for (int i = 0; i < 1000; i++) {
        threads.push_back(thread(foo));
    }
    for (auto& t : threads) t.join();

    cout << u << endl;
    return 0;
}

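Compiled with the line: g++ -Wall -fexceptions -g -std=c++14 -c main.cpp -o main.o

It prints 1000, which is correct. However, I expected a smaller number, because threads should sometimes overwrite a previously incremented value. Why does this code not suffer from concurrent access?

My test machine has 4 cores, and I put no restrictions on the program that I know of.

The problem persists when replacing the body of the shared foo with something more complex, e.g.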

if (u % 3 == 0) {
    u += 4;
} else {
    u -= 1;
}

foo() is so short that each thread probably finishes before the next one even gets spawned. If you add a sleep for a random time in foo() before the u++, you may start seeing what you expect.

It is important to understand that a race condition does not guarantee the code will run incorrectly, merely that it could do anything, because a data race is undefined behavior. "Anything" includes running as expected.
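
To remove the undefined behavior rather than rely on luck, access to the counter can be serialized. The following is a minimal sketch, assuming a helper named count_with_mutex (a hypothetical name, not part of the question) that wraps the same thread-per-increment pattern and protects the increment with a std::mutex:

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper (not from the question): starts n_threads threads that
// each increment a shared counter once while holding a mutex.
unsigned count_with_mutex(int n_threads)
{
    unsigned u = 0;  // shared counter, protected by m
    std::mutex m;

    std::vector<std::thread> threads;
    threads.reserve(n_threads);
    for (int i = 0; i < n_threads; i++) {
        threads.emplace_back([&u, &m] {
            std::lock_guard<std::mutex> lock(m);  // one thread at a time past here
            u++;
        });
    }
    for (auto& t : threads) t.join();

    // Every thread has been joined, so reading u without the lock is safe.
    return u;
}
```

Calling count_with_mutex(1000) from main and printing the result should give 1000 on every run, because the increments can no longer overlap.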

On x86 and AMD64 machines in particular, data races often go unnoticed in practice, because aligned loads and stores are atomic and the cache-coherency guarantees are strong. Read-modify-write instructions are a different matter on multiprocessor systems, where the lock prefix is needed to make them atomic.

If increment happens to be an atomic operation on your machine, this will likely run correctly even though the language standard says it is undefined behavior.

In this case I expect the code is being compiled to a single read-modify-write instruction (ADD or XADD in x86 assembly). Such an instruction cannot be interrupted partway through, so on a single-processor system it is effectively atomic. On a multiprocessor system that is not guaranteed, and a lock prefix would be required to make it so. If you are running on a multiprocessor system, there is a window in which threads can interfere and produce incorrect results.

I compiled your code to assembly using https://godbolt.org/ and foo() compiles to:

foo():
add     DWORD PTR u[rip], 1
ret

So foo() performs nothing but a single add instruction, which is atomic on a single processor (though, as mentioned above, not on a multiprocessor system).
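
For comparison, here is a minimal sketch of the same counter made atomic at the language level with std::atomic. The helper name count_with_atomic is hypothetical. On x86-64, compilers typically emit a lock-prefixed instruction for fetch_add, but that is an implementation detail worth checking in the generated assembly rather than assuming:

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper (not from the question): starts n_threads threads that
// each perform one atomic increment on a shared counter.
unsigned count_with_atomic(int n_threads)
{
    std::atomic<unsigned> u{0};

    std::vector<std::thread> threads;
    threads.reserve(n_threads);
    for (int i = 0; i < n_threads; i++) {
        threads.emplace_back([&u] {
            // Indivisible read-modify-write; relaxed ordering is enough for
            // a plain counter that is only read after all threads are joined.
            u.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& t : threads) t.join();

    return u.load();
}
```

With this version the standard guarantees that no increment is lost, whatever the number of cores.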

It does suffer from a race condition. Put usleep(1000); before u++; in foo() and I see a different output (below 1000) on each run.

I don't think it matters much whether you put a sleep before or after the u++. The point is rather that u++ translates to code that executes very quickly compared to the overhead of spawning the threads that call foo, so it is unlikely to be interrupted midway. If you "prolong" the u++ operation, however, the race condition becomes much more likely to show:

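void foo()
{
    unsigned i = u;                    // read the shared value
    for (int s = 0; s < 10000; s++);   // busy loop to widen the window
    u = i + 1;                         // write back, possibly losing updates
}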

result: 694
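
The lost-update effect can also be reproduced without undefined behavior. In this sketch each access to the counter is individually atomic, but the read and the write are separate steps, so other threads can still slip in between them. The helper name count_with_lost_updates and the 100 microsecond pause are assumptions chosen for illustration:

```cpp
#include <atomic>
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical helper (not from the answer): each thread reads the counter,
// pauses, then writes back the value it read plus one. There is no data race
// in the language sense, yet increments from other threads can be overwritten.
unsigned count_with_lost_updates(int n_threads)
{
    std::atomic<unsigned> u{0};

    std::vector<std::thread> threads;
    threads.reserve(n_threads);
    for (int i = 0; i < n_threads; i++) {
        threads.emplace_back([&u] {
            unsigned seen = u.load();  // step 1: read
            // Assumed pause, only there to widen the window between steps.
            std::this_thread::sleep_for(std::chrono::microseconds(100));
            u.store(seen + 1);         // step 2: write back a possibly stale value
        });
    }
    for (auto& t : threads) t.join();

    return u.load();
}
```

The result is at least 1 and never more than the number of threads, and it usually varies from run to run. This shows that making each individual access atomic is not enough: the whole read-modify-write has to be a single atomic operation.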


BTW: I also tried

if (u % 2) {
    u += 2;
} else {
    u -= 1;
}

and most of the time it gave me 1997, but sometimes 1995.

  1. The likely answer to why the race condition didn't manifest for you, though it does exist, is that foo() is so fast, compared to the time it takes to start a thread, that each thread finishes before the next can even start. But...

  2. Even with your original version, the result varies by system: I tried it your way on a (quad-core) MacBook, and in ten runs I got 1000 three times, 999 six times, and 998 once. So the race is somewhat rare, but clearly present.

  3. You compiled with '-g', which has a way of making bugs disappear. I recompiled your code, still unchanged but without the '-g', and the race became much more pronounced: I got 1000 once, 999 three times, 998 twice, 997 twice, 996 once, and 992 once.

  4. Regarding the suggestion of adding a sleep: that helps, but (a) a fixed sleep time leaves the threads still skewed by start time (subject to timer resolution), and (b) a random sleep spreads them out when what we want is to pull them closer together. Instead, I'd code them to wait for a start signal, so I can create them all before letting them get to work. With this version (with or without '-g'), I get results all over the place, as low as 974 and no higher than 998:

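    #include <atomic>
    #include <iostream>
    #include <thread>
    #include <vector>
    using namespace std;

    unsigned u = 0;
    // Atomic so that the start flag itself is not a data race and the
    // compiler cannot hoist its read out of the loop; only u stays racy.
    atomic<bool> start{false};

    void foo()
    {
        // Wait for the start signal so that all threads run u++ together.
        while (!start) {
            std::this_thread::yield();
        }
        u++;
    }

    int main()
    {
        vector<thread> threads;
        for (int i = 0; i < 1000; i++) {
            threads.push_back(thread(foo));
        }
        start = true;  // release all threads at once
        for (auto& t : threads) t.join();

        cout << u << endl;
        return 0;
    }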
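
The start-signal idea can also be written without a busy-wait loop. The following sketch blocks the threads on a std::condition_variable until they are all created, and makes the counter atomic so that the result is deterministic even though the threads are released together. The helper name count_after_start_signal is hypothetical:

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper (not from the answer): creates n_threads threads that
// all block until a start signal, then each increments an atomic counter once.
unsigned count_after_start_signal(int n_threads)
{
    std::atomic<unsigned> u{0};
    std::mutex m;
    std::condition_variable cv;
    bool start = false;  // protected by m

    std::vector<std::thread> threads;
    threads.reserve(n_threads);
    for (int i = 0; i < n_threads; i++) {
        threads.emplace_back([&] {
            {
                std::unique_lock<std::mutex> lock(m);
                // The predicate form handles spurious wakeups and the case
                // where start was already set before this thread waited.
                cv.wait(lock, [&] { return start; });
            }
            u.fetch_add(1);  // atomic increment, so no update can be lost
        });
    }

    {
        std::lock_guard<std::mutex> lock(m);
        start = true;
    }
    cv.notify_all();  // release every waiting thread

    for (auto& t : threads) t.join();
    return u.load();
}
```

Even with all threads released at nearly the same moment, which is the most contended case, this version returns exactly the number of threads.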