Is the definition of "volatile" this volatile, or does GCC have some standard compliancy problems?

I need a function (like SecureZeroMemory from the WinAPI) that always zeros memory and doesn't get optimized away, even if the compiler thinks the memory is never accessed again afterwards. Seems like a perfect candidate for volatile. But I'm having some problems actually getting that to work with GCC. Here is an example function:

void volatileZeroMemory(volatile void* ptr, unsigned long long size)
{
    volatile unsigned char* bytePtr = (volatile unsigned char*)ptr;

    while (size--)
    {
        *bytePtr++ = 0;
    }
}

Simple enough. But the code GCC actually generates varies wildly with the compiler version and the number of bytes you try to zero. https://godbolt.org/g/cMaQm2

  • GCC 4.4.7 and 4.5.3 never ignore the volatile.
  • GCC 4.6.4 and 4.7.3 ignore the volatile for array sizes of 1, 2 and 4.
  • GCC 4.8.1 through 4.9.2 ignore the volatile for array sizes of 1 and 2.
  • GCC 5.1 through 5.3 ignore the volatile for array sizes of 1, 2, 4 and 8.
  • GCC 6.1 just ignores it for any array size (bonus points for consistency).

Any other compiler I have tested (clang, icc, vc) generates the stores one would expect, with any compiler version and any array size. So at this point I'm wondering: is this a (pretty old and severe?) GCC compiler bug, or is the definition of volatile in the standard imprecise enough that this is actually conforming behavior, making it essentially impossible to write a portable "SecureZeroMemory" function?

Edit: Some interesting observations.

#include <cstddef>
#include <cstdint>
#include <cstring>
#include <atomic>

void callMeMaybe(char* buf);

void volatileZeroMemory(volatile void* ptr, std::size_t size)
{
    for (auto bytePtr = static_cast<volatile std::uint8_t*>(ptr); size-- > 0; )
    {
        *bytePtr++ = 0;
    }

    //std::atomic_thread_fence(std::memory_order_release);
}

std::size_t foo()
{
    char arr[8];
    callMeMaybe(arr);
    volatileZeroMemory(arr, sizeof arr);
    return sizeof arr;
}

The potential write from callMeMaybe() makes all GCC versions except 6.1 generate the expected stores. Commenting in the memory fence also makes GCC 6.1 generate the stores, although only in combination with the potential write from callMeMaybe().

Someone also suggested flushing the caches. The caches will most likely be invalidated fairly quickly anyway, so this is probably not a big deal. Also, if another program tries to probe the data, or if it gets written out to the page file, it will always be the zeroed version.

There are also some concerns about GCC 6.1 using memset() in the standalone function. The GCC 6.1 compiler on Godbolt might be a broken build, since GCC 6.1 seems to generate a normal loop (like 5.3 on Godbolt does) for the standalone function for some other people. (See the comments on zwol's answer.)


I need a function that (like SecureZeroMemory from the WinAPI) always zeros memory and doesn't get optimized away,

This is what the standard function memset_s is for.
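A minimal sketch of how that would look, assuming an implementation that actually provides the optional Annex K interface (you have to define __STDC_WANT_LIB_EXT1__ before including the header, and support is far from universal):

#define __STDC_WANT_LIB_EXT1__ 1
#include <string.h>

void wipe_secret(void* buf, size_t len)
{
    /* Unlike plain memset, memset_s is required not to be optimized away,
       even if buf is never read again afterwards (C11 Annex K). */
    memset_s(buf, len, 0, len);
}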


As to whether this behavior with volatile is conforming or not, that's a bit hard to say; volatile has long been said to be plagued with bugs.

One issue is that the specs say that "Accesses to volatile objects are evaluated strictly according to the rules of the abstract machine." But that only refers to 'volatile objects', not accessing a non-volatile object via a pointer that has had volatile added. So apparently if a compiler can tell that you're not really accessing a volatile object then it's not required to treat the object as volatile after all.

GCC's behavior may be conforming, and even if it isn't, you should not rely on volatile to do what you want in cases like these. The C committee designed volatile for memory-mapped hardware registers and for variables modified during abnormal control flow (e.g. signal handlers and setjmp). Those are the only things it is reliable for. It is not safe to use as a general "don't optimize this out" annotation.
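For reference, the signal-handler case the committee had in mind looks roughly like this (a sketch, not part of the question):

#include <csignal>

// A volatile sig_atomic_t flag is one of the uses volatile is actually
// specified to handle reliably: the handler can run at any point, so the
// compiler must re-read the flag on every loop iteration.
volatile std::sig_atomic_t got_sigint = 0;

extern "C" void on_sigint(int) { got_sigint = 1; }

int main()
{
    std::signal(SIGINT, on_sigint);
    while (!got_sigint)
    {
        // do work; the loop condition performs a fresh volatile read
    }
}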

In particular, the standard is unclear on a key point. (I've converted your code to C; there shouldn't be any divergence between C and C++ here. I've also manually done the inlining that would happen before the questionable optimization, to show what the compiler "sees" at that point.)

#include <stddef.h>

extern void use_arr(void *, size_t);

void foo(void)
{
    char arr[8];
    use_arr(arr, sizeof arr);

    for (volatile char *p = (volatile char *)arr;
         p < (volatile char *)(arr + 8);
         p++)
        *p = 0;
}

The memory-clearing loop accesses arr through a volatile-qualified lvalue, but arr itself is not declared volatile. It is, therefore, at least arguably allowed for the C compiler to infer that the stores made by the loop are "dead", and delete the loop altogether. There's text in the C Rationale that implies that the committee meant to require those stores to be preserved, but the standard itself does not actually make that requirement, as I read it.

For more discussion of what the standard does or does not require, see Why is a volatile local variable optimised differently from a volatile argument, and why does the optimiser generate a no-op loop from the latter?, Does accessing a declared non-volatile object through a volatile reference/pointer confer volatile rules upon said accesses?, and GCC bug 71793.

For more on what the committee thought volatile was for, search the C99 Rationale for the word "volatile". John Regehr's paper "Volatiles are Miscompiled" illustrates in detail how programmer expectations for volatile may not be satisfied by production compilers. The LLVM team's series of essays "What Every C Programmer Should Know About Undefined Behavior" does not touch specifically on volatile but will help you understand how and why modern C compilers are not "portable assemblers".


To the practical question of how to implement a function that does what you wanted volatileZeroMemory to do: Regardless of what the standard requires or was meant to require, it would be wisest to assume that you can't use volatile for this. There is an alternative that can be relied on to work, because it would break far too much other stuff if it didn't work:

#include <stddef.h>
#include <string.h>

extern void memory_optimization_fence(void *ptr, size_t size);

inline void
explicit_bzero(void *ptr, size_t size)
{
    memset(ptr, 0, size);
    memory_optimization_fence(ptr, size);
}

/* in a separate source file */
void memory_optimization_fence(void *unused1, size_t unused2) {}

However, you must make absolutely sure that memory_optimization_fence is not inlined under any circumstances. It must be in its own source file and it must not be subjected to link-time optimization.

There are other options, relying on compiler extensions, that may be usable under some circumstances and can generate tighter code (one of them appeared in a previous edition of this answer), but none are universal.
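One such extension-based option (a sketch only, specific to GCC and Clang, with a made-up function name) is an empty asm statement with a "memory" clobber after the memset:

#include <stddef.h>
#include <string.h>

static inline void explicit_bzero_gcc(void *ptr, size_t size)
{
    memset(ptr, 0, size);
    /* The empty asm takes ptr as an input and clobbers "memory", so the
       compiler must assume the asm could read the zeroed bytes and cannot
       delete the memset as dead stores. GCC/Clang extension only. */
    __asm__ __volatile__("" : : "r"(ptr) : "memory");
}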

(I recommend calling the function explicit_bzero, because it is available under that name in more than one C library. There are at least four other contenders for the name, but each has been adopted only by a single C library.)

You should also know that, even if you can get this to work, it may not be enough. In particular, consider

struct aes_expanded_key { __uint128_t rndk[16]; };

void expand_key(const char *key, struct aes_expanded_key *ek);
void encrypt_with_ek(const struct aes_expanded_key *ek, const char *iv,
                     const char *in, char *out, size_t size);

void encrypt(const char *key, const char *iv,
             const char *in, char *out, size_t size)
{
    struct aes_expanded_key ek;
    expand_key(key, &ek);
    encrypt_with_ek(&ek, iv, in, out, size);
    explicit_bzero(&ek, sizeof ek);
}

Assuming hardware with AES acceleration instructions, if expand_key and encrypt_with_ek are inline, the compiler may be able to keep ek entirely in the vector register file -- until the call to explicit_bzero, which forces it to copy the sensitive data onto the stack just to erase it, and, worse, doesn't do a darn thing about the keys that are still sitting in the vector registers!

It should be possible to write a portable version of the function by using a volatile object on the right-hand side and forcing the compiler to preserve the stores to the array.

void volatileZeroMemory(void* ptr, unsigned long long size)
{
    volatile unsigned char zero = 0;
    unsigned char* bytePtr = static_cast<unsigned char*>(ptr);

    while (size--)
    {
        *bytePtr++ = zero;
    }

    zero = static_cast<unsigned char*>(ptr)[zero];
}

The zero object is declared volatile, which ensures the compiler can make no assumptions about its value, even though it will always evaluate to zero.

The final assignment expression reads from a volatile index in the array and stores the value into a volatile object. Since this read cannot be optimized away, it ensures that the compiler must generate the stores specified in the loop.

I offer this version as portable C++ (although the semantics are subtly different):

#include <new>

void volatileZeroMemory(volatile void* const ptr, unsigned long long size)
{
    // Placement new takes a plain void*, so the qualifier is cast away for
    // the allocation address; the array elements created are still volatile.
    volatile unsigned char* bytePtr =
        new (const_cast<void*>(ptr)) volatile unsigned char[size];

    while (size--)
    {
        *bytePtr++ = 0;
    }
}

Now you have write accesses to a volatile object, not merely accesses to a non-volatile object made through a volatile view of the object.

The semantic difference is that it now formally ends the lifetime of whatever object(s) occupied the memory region, because the memory has been reused. So access to the object after zeroing its contents is now surely undefined behavior (formerly it would have been undefined behavior in most cases, but some exceptions surely existed).

To use this zeroing during an object's lifetime instead of at the end, the caller should use placement new to put a new instance of the original type back again.
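A hypothetical sketch of that pattern (the Secret type and the wipeAndReuse helper are illustrative, not from the answer):

#include <new>

struct Secret { unsigned char key[32]; };

Secret* wipeAndReuse(Secret* s)
{
    volatileZeroMemory(s, sizeof *s);  // reusing the storage ends the lifetime of *s
    return new (s) Secret{};           // placement new begins a fresh, zeroed object
}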

The zeroing function itself can be made shorter (albeit less clear) by using value initialization:

void volatileZeroMemory(volatile void* const ptr, unsigned long long size)
{
    // Value-initialization zero-fills the volatile array; again the cast is
    // needed because placement new takes a plain void*.
    new (const_cast<void*>(ptr)) volatile unsigned char[size]();
}

and at this point it is a one-liner and barely warrants a helper function at all.

As explained in other answers, volatile was meant to ensure that accesses that have side effects are always generated, even if the access is considered redundant by optimization. Accessing memory-mapped peripherals is a typical use case. The usual solution to define memory-mapped registers looks like this:

#define PORT_A *(volatile short*)0x1234

Of course it can be defined as a structure pointer describing all registers of a peripheral, and the address might come from some configuration structure, possibly populated at runtime. The point is that the compiler must ALWAYS generate the volatile access, regardless of what memory area is accessed; the compiler cannot possibly speculate about what's at that address.

Another typical situation is that reading a status register also clears all set status bits. If you wish to just clear the status, you make a dummy (volatile) read and discard the result. That access must not, under any circumstances, be optimized out by any compiler. (Such dummy reads are also crucial for synchronizing delayed transactions on slower peripheral buses.)

Another solution, used by the IAR (and some other) compilers, is to create volatile data structures in the standard way and place the structure at the fixed address of the peripheral with the non-standard @ directive. That works too: the compiler never optimizes out peripheral accesses. If it did, that would be a disaster; bare-metal MCU programming would not work as it does.
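Returning to the dummy-read idiom mentioned above, a minimal sketch (the register name and address are made up):

#define UART_STATUS (*(volatile unsigned short*)0x40013800)

static void clear_uart_status(void)
{
    /* The volatile read forces a real bus access; the peripheral clears its
       status bits as a side effect, and the value itself is discarded. */
    (void)UART_STATUS;
}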

My guess is that the array to be zeroed was not defined as volatile, and it was entirely eliminated. Casting its address to a volatile pointer does not preserve the array itself. It's an interesting glitch, because as a consequence the "sacred" volatile access is eliminated, too.