After reading The JSR-133 Cookbook for Compiler Writers about the implementation of volatile, especially section "Interactions with Atomic Instructions" I assume that reading a volatile variable without updating it needs a LoadLoad or a LoadStore barrier. Further down the page I see that LoadLoad and LoadStore are effectively no-ops on X86 CPUs. Does this mean that volatile read operations can be done without a explicit cache invalidation on x86, and is as fast as a normal variable read (disregarding the reordering constraints of volatile)?
I believe I don't understand this correctly. Could someone care to enlighten me?
EDIT: I wonder if there are differences in multi-processor environments. On single CPU systems the CPU might look at it's own thread caches, as John V. states, but on multi CPU systems there must be some config option to the CPUs that this is not enough and main memory has to be hit, making volatile slower on multi cpu systems, right?
PS: On my way to learn more about this I stumbled about the following great articles, and since this question may be interesting to others, I'll share my links here: