Have there ever been silent behavior changes in C++ with new standard versions?

(I'm looking for one or two examples to prove the point, not a list.)

Has a change of the C++ standard (e.g. from 98 to 11, 11 to 14, etc.) ever silently changed the behavior of existing, well-formed user code with defined behavior, i.e. with no warning or error when compiling under the newer standard version?

Notes:

  • I'm asking about standard-mandated behavior, not about implementer/compiler-writer choices.
  • The less contrived the code, the better (as an answer to this question).
  • I don't mean code with version detection such as #if __cplusplus >= 201103L.
  • Answers involving the memory model are fine.

C++17 adds a non-const overload of string::data that returns char* (before C++17, data() always returned const char*). That can certainly make a difference:

#include <iostream>
#include <string>
using namespace std;

void func(char* data)
{
    cout << data << " is not const\n";
}

void func(const char* data)
{
    cout << data << " is const\n";
}

int main()
{
    string s = "xyz";
    func(s.data());
}

A bit contrived, but this legal program changes its output when going from C++14 ("xyz is const") to C++17 ("xyz is not const").

The answer to this question shows how initializing a vector using a single size_type value can result in different behavior between C++03 and C++11.

std::vector<Something> s(10);

C++03 default-constructs a temporary object of the element type Something and copy-constructs each element in the vector from that temporary.

C++11 default-constructs each element in the vector.

In many (most?) cases these result in equivalent final state, but there is no reason they have to. It depends on the implementation of Something's default/copy constructors.

See this contrived example:

#include <iostream>

class Something {
private:
    static int counter;

public:
    Something() : v(counter++) {
        std::cout << "default " << v << '\n';
    }

    Something(Something const & other) : v(counter++) {
        std::cout << "copy " << other.v << " to " << v << '\n';
    }

    ~Something() {
        std::cout << "dtor " << v << '\n';
    }

private:
    int v;
};

int Something::counter = 0;

C++03 will default-construct one Something with v == 0 then copy-construct ten more from that one. At the end, the vector contains ten objects whose v values are 1 through 10, inclusive.

C++11 will default-construct each element. No copies are made. At the end, the vector contains ten objects whose v values are 0 through 9, inclusive.
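A minimal driver (assuming the Something class above) makes this observable:

#include <vector>

int main() {
    std::vector<Something> s(10);
    // C++03: "default 0", then "copy 0 to 1" ... "copy 0 to 10", then "dtor 0" for the temporary
    // C++11: "default 0" through "default 9", no copies at all
}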

The standard has a list of breaking changes in Annex C [diff]. Many of these changes can lead to silent behavior change.

An example (in C++20, u8 string literals have type const char8_t[], which no longer converts to const char*, so overload resolution falls back to the bool overload):

int f(const char*); // #1
int f(bool);        // #2


int x = f(u8"foo"); // until C++20: calls #1; since C++20: calls #2

Oh boy... The link cpplearner provided is scary.

Among other things, C++20 disallowed the C-style declaration of C++ structs: an unnamed struct that gets its name only from a typedef may no longer contain C++-only members such as member functions.

typedef struct
{
    void member_foo(); // Ill-formed since C++20
} m_struct;

If you were taught to write structs like that (and people who teach "C with classes" teach exactly that), you're screwed.
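One way to fix it is simply to give the struct a proper name instead of relying on the typedef name:

struct m_struct
{
    void member_foo(); // fine again in C++20
};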

Every time they add new methods (and often functions) to the standard library this happens.

Suppose you have a standard library type:

struct example {
    void do_stuff() const;
};

Pretty simple. In some standard revision, a new method, a new overload, or almost anything else is added:

struct example {
    void do_stuff() const;
    void method(); // a new method
};

this can silently change the behavior of existing C++ programs.

This is because C++'s currently limited reflection capabilities are already sufficient to detect whether such a method exists and to run different code based on that.

#include <type_traits>

template<class T, class=void>
struct detect_new_method : std::false_type {};

template<class T>
struct detect_new_method< T, std::void_t< decltype( &T::method ) > > : std::true_type {};

This is just one relatively simple way to detect the new method; there are myriad ways.

#include <iostream>

void task( std::false_type ) {
    std::cout << "old code";
}
void task( std::true_type ) {
    std::cout << "new code";
}


int main() {
    task( detect_new_method<example>{} );
}

The same can happen when you remove methods from classes.

While this example directly detects the existence of a method, this kind of thing can also happen indirectly, and then it is less contrived. As a concrete example, you might have a serialization engine that decides whether something can be serialized as a container based on whether it is iterable, or on whether it has a data() member pointing to raw bytes plus a size() member, with one path preferred over the other.

The standard goes and adds a .data() method to a container, and suddenly the type changes which path it uses for serialization.
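A rough sketch of that kind of dispatch (the serialize() function and the trait name here are illustrative, not taken from any real library):

#include <iostream>
#include <type_traits>
#include <utility>

// Hypothetical serializer: prefer the raw-bytes path when a type has
// .data() and .size(), otherwise fall back to element-wise serialization.
template<class T, class = void>
struct has_data_and_size : std::false_type {};

template<class T>
struct has_data_and_size<T, std::void_t<decltype(std::declval<T&>().data()),
                                        decltype(std::declval<T&>().size())>>
    : std::true_type {};

template<class Container>
void serialize(Container const& /*c*/) {
    if constexpr (has_data_and_size<Container>::value)
        std::cout << "raw-bytes path\n";     // taken once the standard adds .data()
    else
        std::cout << "element-wise path\n";  // taken before that
}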

All the C++ standard can do, if it doesn't want to freeze, is to make the kind of code that silently breaks rare or somehow unreasonable.

I'm not sure if you'd consider this a breaking change to correct code, but ...

Before C++17, compilers were allowed, but not required, to elide copies in certain circumstances, even when the copy constructor has observable side effects. C++17 gave us guaranteed copy elision in some of those situations: the behavior essentially went from optional (up to the implementation) to required.

This means that your copy constructor's side effects may have occurred with older versions, but will never occur with newer ones. You could argue that correct code shouldn't rely on implementation-defined results, but I don't think that's quite the same as saying such code is incorrect.
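A minimal sketch of what that looks like in practice:

#include <iostream>

struct Noisy {
    Noisy() = default;
    Noisy(const Noisy&) { std::cout << "copied\n"; }  // observable side effect
};

Noisy make() { return Noisy{}; }

int main() {
    Noisy n = make();
    // Before C++17 an implementation could perform (and print) the copies
    // or elide them; since C++17 the elision here is mandatory, so
    // "copied" is never printed.
    (void)n;
}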

Trigraphs dropped

Source files are encoded in a physical character set that is mapped in an implementation-defined way to the source character set, which is defined in the standard. To accommodate mappings from some physical character sets that didn't natively have all of the punctuation needed by the source character set, the language defined trigraphs—sequences of three common characters that could be used in place of a less common punctuation character. The preprocessor and compiler were required to handle these.

In C++17, trigraphs were removed. So some source files will not be accepted by newer compilers unless they are first translated from the physical character set to some other physical character set that maps one-to-one to the source character set. (In practice, most compilers just made interpretation of trigraphs optional.) This isn't a subtle behavior change, but a breaking change that prevents previously-acceptable source files from being compiled without an external translation process.
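For illustration, a sketch of what such a source file might have looked like (in practice most compilers only honored trigraphs with a specific flag or in strict standard modes):

int main()
{
    // ??( and ??) are trigraphs for [ and ]; ??< and ??> are trigraphs for { and }.
    // Up to C++14 this is, after phase-1 replacement, int arr[] = {1, 2, 3};
    // since C++17 trigraphs are gone and the line is ill-formed.
    int arr??(??) = ??<1, 2, 3??>;
    return arr??(0??);
}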

More constraints on char

The standard also refers to the execution character set, which is implementation defined, but must contain at least the entire source character set plus a small number of control codes.

The C++ standard defined char as a possibly-unsigned integral type that can efficiently represent every value in the execution character set. With a bit of language-lawyering, you can argue that a char has to be at least 8 bits.

If your implementation uses an unsigned value for char, then you know it can range from 0 to 255, and is thus suitable for storing every possible byte value.

But if your implementation uses a signed value, it has options.

Most would use two's complement, giving char a minimum range of -128 to 127. That's 256 unique values.

But another option was sign+magnitude, where one bit is reserved to indicate whether the number is negative and the other seven bits indicate the magnitude. That would give char a range of -127 to 127, which is only 255 unique values. (Because you lose one useful bit combination to represent -0.)

I'm not sure the committee ever explicitly designated this as a defect, but it was a problem because you couldn't rely on the standard to guarantee that a round trip from unsigned char to char and back would preserve the original value. (In practice, all implementations did, because they all used two's complement for signed integral types.)

Only recently (C++17?) was the wording fixed to ensure round-tripping. That fix, along with all the other requirements on char, effectively mandates two's complement for signed char without saying so explicitly (even as the standard continued to allow sign+magnitude representations for other signed integral types). The later proposal to require two's complement for all signed integral types was adopted for C++20.
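A sketch of the round trip in question:

#include <iostream>

int main()
{
    unsigned char original = 0x80;                        // a byte value above 127
    char c = static_cast<char>(original);                 // implementation-defined result if char is signed (pre-C++20 wording)
    unsigned char back = static_cast<unsigned char>(c);
    // On every real-world (two's-complement) implementation back == original,
    // but the older wording did not guarantee the round trip for a signed char
    // using a sign+magnitude representation.
    std::cout << (back == original) << '\n';
}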

So this one is sort of the opposite of what you're looking for, because it retroactively fixes previously incorrect, overly presumptuous code.

Here's an example that prints 3 in C++03 but 0 in C++11:

#include <iostream>

template<int I> struct X   { static int const c = 2; };
template<> struct X<0>     { typedef int c; };
template<class T> struct Y { static int const c = 3; };
static int const c = 4;
int main() { std::cout << (Y<X< 1>>::c >::c>::c) << '\n'; }

This change in behavior was caused by special handling for >>. Prior to C++11, >> was always the right-shift operator, so X's template argument is 1 >> ::c, i.e. 1 >> 4 == 0; X<0>::c is a typedef for int, so the whole expression is Y<int>::c, which is 3. Since C++11, >> can also close two template argument lists, so the expression becomes Y<X<1>>::c > ::c > ::c, i.e. (3 > 4) > 4, which is 0.

The behavior when reading (numeric) data from a stream and the read fails was changed in C++11.

For example, reading an integer from a stream, while it does not contain an integer:

#include <iostream>
#include <sstream>
#include <string>

int main(int, char **)
{
    int a = 12345;
    std::string s = "abcd";         // not an integer, so will fail
    std::stringstream ss(s);
    ss >> a;
    std::cout << "fail = " << ss.fail() << " a = " << a << std::endl;        // since c++11: a == 0, before a still 12345
}

Since C++11, a failed read sets the integer to 0; before C++11, the integer was left unchanged. That said, gcc always shows the new behavior, even when forcing the standard back to C++98 (with -std=c++98), at least since version 4.4.7.

(Imho the old behavior was actually better: why change the value to 0, which is by itself valid, when nothing could be read?)

Reference: see https://en.cppreference.com/w/cpp/locale/num_get/get