Does "undefined behavior" really permit anything to happen?

The classic apocryphal example of "undefined behavior" is, of course, "nasal demons" -- a physical impossibility, regardless of what the C and C++ standards permit.

Because the C and C++ communities tend to emphasize the unpredictability of undefined behavior, and the idea that the compiler is allowed to make the program do literally anything when it encounters UB, I had assumed that the standard places no restrictions whatsoever on what undefined behavior may do.

But the relevant quote from the C++ standard seems to be:

Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). [..]

This actually specifies a small set of possible options:

  • Ignoring the situation -- Yes, the standard goes on to say that this will have "unpredictable results", but that's not the same as the compiler inserting code (which I assume would be a prerequisite for, you know, nasal demons).
  • Behaving in a documented manner characteristic of the environment -- this actually sounds relatively benign. (And I certainly haven't heard of any documented cases of nasal demons.)
  • Terminating the translation or execution -- with a diagnostic, no less.

I assume that in most cases, compilers choose to ignore the undefined behavior; for example, when reading uninitialized memory, it would presumably be an anti-optimization to insert any code to ensure consistent behavior. I suppose that the stranger types of undefined behavior (such as "time travel") would fall under the second category--but this requires that such behaviors be documented and "characteristic of the environment" (so I guess nasal demons are only produced by infernal computers?).

Am I misunderstanding this? Are these intended merely as examples of what might constitute undefined behavior, rather than a comprehensive list of options? Is the claim that "anything can happen" meant only in the sense that ignoring the situation can have unexpected side effects?

Two points of clarification:

  • I think it was clear from the original question, and I expect it is to most people, but I'll spell it out anyway: I do realize that "nasal demons" is a joke.
  • Please do not write an(other) answer explaining that UB allows for platform-specific compiler optimizations, unless you also explain how it permits optimizations that implementation-defined behavior would not.

This question was not intended as a forum for discussion of the (de)merits of undefined behavior, but that is somewhat what it became. In any case, this thread about a hypothetical C compiler with no undefined behavior may be of additional interest to those who think this is an important topic.


Yes, it permits anything to happen. The note is just giving examples. The definition is pretty clear:

Undefined behavior: behavior for which this International Standard imposes no requirements.


Frequent point of confusion:

You should understand that "no requirement" also means the implementation is NOT required to leave the behavior undefined or to do something bizarre/nondeterministic!

The implementation is perfectly allowed by the C++ standard to document some sane behavior and behave accordingly.1 So, if your compiler claims to wrap around on signed overflow, logic (sanity?) would dictate that you're welcome to rely on that behavior on that compiler. Just don't expect another compiler to behave the same way if it doesn't claim to.

1Heck, it's even allowed to document one thing and do another. That'd be stupid, and it'd probably make you toss it into the trash—why would you trust a compiler whose documentation lies to you?—but it's not against the C++ standard.
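To make the "rely on documented behavior" point concrete, here is a minimal sketch. It assumes a compiler that documents wrapping semantics for signed overflow, e.g. GCC or Clang invoked with -fwrapv; without that documented guarantee, the check below is itself built on UB and cannot be relied upon.

#include <limits.h>
#include <stdio.h>

/* Post-hoc overflow test: meaningful ONLY on a compiler that documents
   wrap-around for signed overflow (e.g. gcc/clang with -fwrapv). */
int add_reporting_wrap(int a, int b)
{
    int sum = a + b;                    /* wraps instead of being UB under -fwrapv */
    if (b > 0 && sum < a)
        printf("overflow wrapped around\n");
    return sum;
}

int main(void)
{
    printf("%d\n", add_reporting_wrap(INT_MAX, 1));   /* INT_MIN on such a compiler */
    return 0;
}

On a compiler that makes no such promise, the same source still contains UB, and a different (or newer) compiler is free to optimize the check away entirely.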

The definition of undefined behaviour, in every C and C++ standard, is essentially that the standard imposes no requirements on what happens.

Yes, that means any outcome is permitted. But there are no particular outcomes that are required to happen, nor any outcomes that are required to NOT happen. It does not matter if you have a compiler and library that consistently yields a particular behaviour in response to a particular instance of undefined behaviour - such a behaviour is not required, and may change even in a future bugfix release of your compiler - and the compiler will still be perfectly correct according to each version of the C and C++ standards.

If your host system has hardware support in the form of connection to probes that are inserted in your nostrils, it is within the realms of possibility that an occurrence of undefined behaviour will cause undesired nasal effects.

Nitpicking: You have not quoted a standard.

These are the sources used to generate drafts of the C++ standard. These sources should not be considered an ISO publication, nor should documents generated from them unless officially adopted by the C++ working group (ISO/IEC JTC1/SC22/WG21).

Interpretation: Notes are not normative according to the ISO/IEC Directives Part 2.

Notes and examples integrated in the text of a document shall only be used for giving additional information intended to assist the understanding or use of the document. They shall not contain requirements ("shall"; see 3.3.1 and Table H.1) or any information considered indispensable for the use of the document e.g. instructions (imperative; see Table H.1), recommendations ("should"; see 3.3.2 and Table H.2) or permission ("may"; see Table H.3). Notes may be written as a statement of fact.

Emphasis mine. This alone rules out "comprehensive list of options". Giving examples however does count as "additional information intended to assist the understanding .. of the document".

Do keep in mind that the "nasal demon" meme is not meant to be taken literally, just as using a balloon to explain how universe expansion works holds no truth in physical reality. It's to illustrate that it's foolhardy to discuss what "undefined behavior" should do when it's permissible to do anything. Yes, this means that there isn't an actual rubber band in outer space.

One of the historical purposes of Undefined Behavior was to allow for the possibility that certain actions may have different potentially-useful effects on different platforms. For example, in the early days of C, given

int i = INT_MAX;    /* INT_MAX from <limits.h> */
i++;                /* signed integer overflow: undefined behavior */
printf("%d", i);

some compilers could guarantee that the code would print some particular value (for a two's-complement machine it would typically be INT_MIN), while others would guarantee that the program would terminate without reaching the printf. Depending upon the application requirements, either behavior could be useful. Leaving the behavior undefined meant that an application for which abnormal termination was an acceptable consequence of overflow, but seemingly-valid-but-wrong output was not, could forgo overflow checking when run on a platform that reliably trapped overflow, while an application for which abnormal termination was unacceptable, but arithmetically incorrect output was tolerable, could forgo overflow checking when run on a platform where overflows weren't trapped.

Recently, however, some compiler authors seem to have gotten into a contest to see who can most efficiently eliminate any code whose existence would not be mandated by the standard. Given, for example...

#include <stdio.h>

int main(void)
{
    int ch = getchar();
    if (ch < 74)
        printf("Hey there!");
    else
        printf("%d", ch*ch*ch*ch*ch);
}

a hyper-modern compiler may conclude that if ch is 74 or greater, the computation of ch*ch*ch*ch*ch would yield Undefined Behavior, and as a consequence the program should print "Hey there!" unconditionally regardless of what character was typed.
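To see what that entitles the compiler to do, here is a sketch of the code it may effectively emit (an illustration of the reasoning above, not the actual output of any particular compiler):

#include <stdio.h>

/* Effective result once the compiler assumes ch < 74 always holds,
   because ch >= 74 would make ch*ch*ch*ch*ch overflow a 32-bit int. */
int main(void)
{
    (void)getchar();            /* the read still happens for its observable side effect */
    printf("Hey there!");
}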

I thought I'd answer just one of your points, since the other answers answer the general question quite well, but have left this unaddressed.

"Ignoring the situation -- Yes, the standard goes on to say that this will have "unpredictable results", but that's not the same as the compiler inserting code (which I assume would be a prerequisite for, you know, nasal demons)."

A situation in which nasal demons could very reasonably be expected to occur with a sensible compiler, without the compiler inserting ANY code, would be the following:

if (!spawn_of_satan) {
    printf("Random debug value: %i\n", *x); // oops, null pointer dereference
    nasal_angels();
} else {
    nasal_demons();
}

A compiler, if it can prove that *x is a null pointer dereference, is perfectly entitled, as part of some optimisation, to say "OK, so I see that they've dereferenced a null pointer in this branch of the if. Therefore, as part of that branch I'm allowed to do anything. So I can optimise to this:"

if (!spawn_of_satan) {
    nasal_demons();
} else {
    nasal_demons();
}

"And from there, I can optimise to this:"

nasal_demons();

You can see how, in the right circumstances, this sort of thing can prove very useful for an optimising compiler, and yet cause disaster. I did see some examples a while back of cases where it actually IS important for optimisation to be able to optimise this sort of case. I might try to dig them out later when I have more time.

EDIT: One example that just came from the depths of my memory of such a case where it's useful for optimisation is where you very frequently check a pointer for being NULL (perhaps in inlined helper functions), even after having already dereferenced it and without having changed it. The optimising compiler can see that you've dereferenced it and so optimise out all the "is NULL" checks, since if you've dereferenced it and it IS null, anything is allowed to happen, including just not running the "is NULL" checks. I believe that similar arguments apply to other undefined behaviour.
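A minimal sketch of that pattern (the helper names here are hypothetical, not taken from any particular codebase): after the unconditional dereference in use(), the compiler may assume p is non-null and, once log_value() is inlined, drop its p != NULL test.

#include <stdio.h>

/* Hypothetical inlined helper that defensively checks for NULL. */
static void log_value(const int *p)
{
    if (p != NULL)                  /* the check the optimizer may remove */
        printf("value: %d\n", *p);
}

static void use(int *p)
{
    int first = *p;                 /* unconditional dereference: p assumed non-null from here on */
    log_value(p);                   /* after inlining, the NULL check above can be elided */
    printf("first: %d\n", first);
}

int main(void)
{
    int x = 42;
    use(&x);
    return 0;
}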

Undefined behavior is simply the result of a situation coming up that the writers of the specification did not foresee.

Take the idea of a traffic light. Red means stop, yellow means prepare for red, and green means go. In this example people driving cars are the implementation of the spec.

What happens if both green and red are on? Do you stop, then go? Do you wait until red turns off and it's just green? This is a case that the spec did not describe, and as a result, anything the drivers do is undefined behavior. Some people will do one thing, some another. Since there is no guarantee about what will happen you want to avoid this situation. The same applies to code.

First, it is important to note that it is not only the behaviour of the user program that is undefined; the behaviour of the compiler is undefined as well. Similarly, UB is not something encountered at runtime; it is a property of the source code.

To a compiler writer, "the behaviour is undefined" means, "you do not have to take this situation into account", or even "you can assume no source code will ever produce this situation". A compiler can do anything, intentionally or unintentionally, when presented with UB, and still be standard compliant, so yes, if you granted access to your nose...

Then, it is not always possible to know if a program has UB or not. Example:

int *ptr = calculateAddress();
int i = *ptr;    // undefined behaviour if calculateAddress() returned an invalid (e.g. null) pointer

Knowing if this can ever be UB or not would require knowing all possible values returned by calculateAddress(), which is impossible in the general case (See "Halting Problem"). A compiler has two choices:

  • assume ptr will always have a valid address
  • insert runtime checks to guarantee a certain behaviour

The first option produces fast programs, and puts the burden of avoiding undesired effects on the programmer, while the second option produces safer but slower code.

The C and C++ standards leave this choice open, and most compilers choose the first, while Java for example mandates the second.
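As a sketch of what the second option amounts to (roughly the check a Java-style rule mandates before every dereference; abort() stands in for whatever defined reaction an implementation might document):

#include <stdlib.h>

int *calculateAddress(void);        /* the same hypothetical function as above */

int readValue(void)
{
    int *ptr = calculateAddress();
    if (ptr == NULL)                /* note: this only catches NULL, not other invalid addresses */
        abort();                    /* some defined, documented reaction */
    return *ptr;
}

Every such check costs instructions on every dereference, which is exactly the overhead the first option avoids.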


Why is the behaviour not implementation-defined, but undefined?

Implementation-defined means (N4296, 1.9§2):

Certain aspects and operations of the abstract machine are described in this International Standard as implementation-defined (for example, sizeof(int) ). These constitute the parameters of the abstract machine. Each implementation shall include documentation describing its characteristics and behavior in these respects. Such documentation shall define the instance of the abstract machine that corresponds to that implementation (referred to as the “corresponding instance” below).

Emphasis mine. In other words: A compiler-writer has to document exactly how the machine-code behaves, when the source code uses implementation-defined features.

Writing to a random non-null invalid pointer is one of the most unpredictable things you can do in a program, so this would require performance-reducing runtime-checks too.
Before we had MMUs, you could destroy hardware by writing to the wrong address, which comes very close to nasal demons ;-)

Undefined behaviors allow compilers to generate faster code in some cases. Consider two different processor architectures that ADD differently: Processor A inherently discards the carry bit upon overflow, while processor B generates an error. (Of course, Processor C inherently generates Nasal Demons - it's just the easiest way to discharge that extra bit of energy in a snot-powered nanobot...)

If the standard required that an error be generated, then all code compiled for processor A would basically be forced to include additional instructions to check for overflow and, if one occurred, generate an error. This would result in slower code, even if the developer knew that they were only ever going to add small numbers.
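For illustration, the extra work that a "must report overflow" rule would force onto processor A might look roughly like this (a sketch; abort() stands in for whatever error mechanism the standard would have mandated):

#include <limits.h>
#include <stdlib.h>

/* Overflow-checked addition: the pre-check that processor B does in hardware
   and processor A would have to do in software on every addition. */
int checked_add(int a, int b)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        abort();                    /* stand-in for the mandated error */
    return a + b;
}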

Undefined behavior sacrifices portability for speed. By allowing 'anything' to happen, the compiler can avoid writing safety-checks for situations that will never occur. (Or, you know... they might.)

Additionally, when a programmer knows exactly what an undefined behavior will actually cause in their given environment, they are free to exploit that knowledge to gain additional performance.

If you want to ensure that your code behaves exactly the same on all platforms, you need to ensure that no 'undefined behavior' ever occurs - however, this may not be your goal.

Edit: (In response to OP's edit) Implementation-defined behavior would require the consistent generation of nasal demons. Undefined behavior allows the sporadic generation of nasal demons.

That's where the advantage of undefined behavior over implementation-specific behavior appears. Consider that extra code may be needed to avoid inconsistent behavior on a particular system; in these cases, undefined behavior allows greater speed.

One of the reasons for leaving behavior undefined is to allow the compiler to make whatever assumptions it wants when optimizing.

If there exists some condition that must hold if an optimization is to be applied, and that condition is dependent on undefined behavior in the code, then the compiler may assume that it's met, since a conforming program can't depend on undefined behavior in any way. Importantly, the compiler does not need to be consistent in these assumptions (which is not the case for implementation-defined behavior).

So suppose your code contains an admittedly contrived example like the one below:

int bar = 0;
int foo = (undefined behavior of some kind);
if (foo) {
    f();
    bar = 1;
}
if (!foo) {
    g();
    bar = 1;
}
assert(1 == bar);

The compiler is free to assume that !foo is true in the first block and foo is true in the second, and thus optimize the entire chunk of code away. Now, logically either foo or !foo must be true, and so looking at the code, you would reasonably be able to assume that bar must equal 1 once you've run the code. But because the compiler optimized in that manner, bar never gets set to 1. And now that assertion becomes false and the program terminates, which is behavior that would not have happened if foo hadn't relied on undefined behavior.

Now, is it possible for the compiler to actually insert completely new code if it sees undefined behavior? If doing so will allow it to optimize more, absolutely. Is it likely to happen often? Probably not, but you can never guarantee it, so operating on the assumption that nasal demons are possible is the only safe approach.