C/C + + 中无符号左移之前的掩码是否过于偏执?

我在 C/C + + 中实现了加密算法(例如 SHA-1) ,编写了与平台无关的可移植代码,并彻底避免了 未定义行为

假设一个标准化的加密算法要求您实现以下内容:

b = (a << 31) & 0xFFFFFFFF

其中 ab是无符号32位整数。注意,在结果中,我们丢弃了最低有效32位以上的所有位。


作为第一个天真的近似值,我们可以假设在大多数平台上 int是32位宽的,因此我们会写:

unsigned int a = (...);
unsigned int b = a << 31;

我们知道这段代码不会在任何地方工作,因为 int在某些系统上是16位宽,在另一些系统上是64位宽,甚至可能是36位宽。但是使用 stdint.h,我们可以用 uint32_t类型改进这段代码:

uint32_t a = (...);
uint32_t b = a << 31;

所以我们结束了,对吧?这是我多年来的想法。不完全是。假设在某个平台上,我们有:

// stdint.h
typedef unsigned short uint32_t;

在 C/C + + 中执行算术运算的规则是,如果类型(比如 short)比 int窄,那么如果所有值都适合,那么它就会扩大到 int,否则就扩大到 unsigned int

假设编译器将 short定义为32位(有符号) ,将 int定义为48位(有符号)。然后这些代码行:

uint32_t a = (...);
uint32_t b = a << 31;

实际上意味着:

unsigned short a = (...);
unsigned short b = (unsigned short)((int)a << 31);

请注意,a被提升到 int是因为所有的 ushort(即 uint32)都适合 int(即 int48)。

但是现在我们有一个问题: 将非零位左移到有符号整数类型的符号位是未定义行为的。发生这个问题是因为我们的 uint32被提升到 int48-而不是被提升到 uint48(在那里左移是可以的)。


以下是我的问题:

  1. 我的推理是正确的吗? 这在理论上是一个合理的问题吗?

  2. 这个问题是否可以忽略,因为在每个平台上下一个整数类型是宽度的两倍?

  3. 通过像这样预先屏蔽输入来正确防御这种病态情况是一个好主意吗?校对: b = (a & 1) << 31;。(这在每个平台上都必然是正确的。但这可能会使速度至关重要的加密算法变得不必要的慢。)

澄清/编辑:

  • 我接受 C 或 C + + 或两者兼而有之的答案。我想知道至少一种语言的答案。

  • 预掩蔽逻辑可能会损害位旋转。例如,GCC 将用汇编语言将 b = (a << 31) | (a >> 1);编译成32位位旋转指令。但是如果我们预先屏蔽左移位,新逻辑可能不会转换成位旋转,这意味着现在要执行4个操作,而不是1个。

6682 次浏览

To avoid unwanted promotion, you may use the greater type with some typedef, as

using my_uint_at_least32 = std::conditional_t<(sizeof(std::uint32_t) < sizeof(unsigned)),
unsigned,
std::uint32_t>;

For this segment of code:

uint32_t a = (...);
uint32_t b = a << 31;

To promote a to a unsigned type instead of signed type, use:

uint32_t b = a << 31u;

When both sides of << operator is an unsigned type, then this line in 6.3.1.8 (C standard draft n1570) applies:

Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank is converted to the type of the operand with greater rank.


The problem you are describing is caused you use 31 which is signed int type so another line in 6.3.1.8

Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.

forces a to promoted to a signed type


Update:

This answer is not correct because 6.3.1.1(2) (emphasis mine):

...

If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions.58) All other types are unchanged by the integer promotions.

and footnote 58 (emphasis mine):

58) The integer promotions are applied only: as part of the usual arithmetic conversions, to certain argument expressions, to the operands of the unary +, -, and ~ operators, and to both operands of the shift operators, as specified by their respective subclauses.

Since only integer promotion is happening and not common arithmetic conversion, using 31u does not guarantee a to be converted to unsigned int as stated above.

Q1: Masking before the shift does prevent undefined behavior that OP has concern.

Q2: "... because on every platform the next integer type is double the width?" --> no. The "next" integer type could be less than 2x or even the same size.

The following is well defined for all compliant C compilers that have uint32_t.

uint32_t a;
uint32_t b = (a & 1) << 31;

Q3: uint32_t a; uint32_t b = (a & 1) << 31; is not expected to incur code that performs a mask - it is not needed in the executable - just in the source. If a mask does occur, get a better compiler should speed be an issue.

As suggested, better to emphasize the unsigned-ness with these shifts.

uint32_t b = (a & 1U) << 31;

@John Bollinger good answer well details how to handle OP's specific problem.

The general problem is how to form a number that is of at least n bits, a certain sign-ness and not subject to surprising integer promotions - the core of OP's dilemma. The below fulfills this by invoking an unsigned operation that does not change the value - effective a no-op other than type concerns. The product will be at least the width of unsigned or uint32_t. Casting, in general, may narrow the type. Casting needs to be avoided unless narrowing is certain to not occur. An optimization compiler will not create unnecessary code.

uint32_t a;
uint32_t b = (a + 0u) << 31;
uint32_t b = (a*1u) << 31;

Speaking to the C side of the problem,

  1. Is my reasoning correct, and is this a legitimate problem in theory?

It is a problem that I had not considered before, but I agree with your analysis. C defines the behavior of the << operator in terms of the type of the promoted left operand, and it it conceivable that the integer promotions result in that being (signed) int when the original type of that operand is uint32_t. I don't expect to see that in practice on any modern machine, but I'm all for programming to the actual standard as opposed to my personal expectations.

  1. Is this problem safe to ignore because on every platform the next integer type is double the width?

C does not require such a relationship between integer types, though it is ubiquitous in practice. If you are determined to rely only on the standard, however -- that is, if you are taking pains to write strictly conforming code -- then you cannot rely on such a relationship.

  1. Is a good idea to correctly defend against this pathological situation by pre-masking the input like this?: b = (a & 1) << 31;. (This will necessarily be correct on every platform. But this could make a speed-critical crypto algorithm slower than necessary.)

The type unsigned long is guaranteed to have at least 32 value bits, and it is not subject to promotion to any other type under the integer promotions. On many common platforms it has exactly the same representation as uint32_t, and may even be the same type. Thus, I would be inclined to write the expression like this:

uint32_t a = (...);
uint32_t b = (unsigned long) a << 31;

Or if you need a only as an intermediate value in the computation of b, then declare it as an unsigned long to begin with.

Taking a clue from this question about possible UB in uint32 * uint32 arithmetic, the following simple approach should work in C and C++:

uint32_t a = (...);
uint32_t b = (uint32_t)((a + 0u) << 31);

The integer constant 0u has type unsigned int. This promotes the addition a + 0u to uint32_t or unsigned int, whichever is wider. Because the type has rank int or higher, no more promotion occurs, and the shift can be applied with the left operand being uint32_t or unsigned int.

The final cast back to uint32_t will just suppress potential warnings about a narrowing conversion (say if int is 64 bits).

A decent C compiler should be able to see that adding zero is a no-op, which is less onerous than seeing that a pre-mask has no effect after an unsigned shift.