Best practices for circular shift (rotate) operations in C++

小开

#define ROTATE_RIGHT(x) ( (x>>1) | (x&1?0x8000:0) )

小开

最佳答案

另请参阅 this answer on another rotate question的早期版本，其中包含更多关于 asmgcc/clang 为 x86产生的内容的细节。

用 C 和 C + + 表示旋转以避免任何未定义行为的最编译器友好的方法似乎是 John Regehr 的实现。我已经调整了它，使其根据类型的宽度旋转(使用像 uint32_t这样的固定宽度类型)。

#include <stdint.h>   // for uint32_t
#include <limits.h>   // for CHAR_BIT
// #define NDEBUG
#include <assert.h>


static inline uint32_t rotl32 (uint32_t n, unsigned int c)
{
const unsigned int mask = (CHAR_BIT*sizeof(n) - 1);  // assumes width is a power of 2.


// assert ( (c<=mask) &&"rotate by type width or more");
c &= mask;
return (n<<c) | (n>>( (-c)&mask ));
}


static inline uint32_t rotr32 (uint32_t n, unsigned int c)
{
const unsigned int mask = (CHAR_BIT*sizeof(n) - 1);


// assert ( (c<=mask) &&"rotate by type width or more");
c &= mask;
return (n>>c) | (n<<( (-c)&mask ));
}

Works for any unsigned integer type, not just uint32_t, so you could make versions for other sizes.

参见带有大量安全检查的也是一个 C + + 11模板版本(包括类型宽度为2的 static_assert) ，例如，在某些24位 DSP 或36位大型机上并非如此。

我建议只使用模板作为名称显式包含旋转宽度的包装器的后端。Integer-promotion rules mean that rotl_template(u16 & 0x11UL, 7) would do a 32 or 64-bit rotate, not 16(取决于 unsigned long的宽度)。甚至 uint16_t & uint16_t也可以通过 C + + 的整数升级规则升级到 signed int，除非在 int不比 uint16_t宽的平台上。

Rel = “ nofollow noReferrer”> 内联到单个 < code > rol r32，cl < /code > (或 rol r32, imm8) ，因为编译器知道 X86旋转和移位指令掩盖 shift-count 的方式与 C 源代码相同。译注:

在 x86上编译器支持这种避免 UB 的习惯用法，在 uint32_t x和 unsigned int n上支持可变计数移位:

Clang: 自 clang3.5以来被识别为可变计数的旋转，在此之前是多个位移 + 或 insns。

Gcc: 自 gcc4.9以来识别为可变计数旋转，在此之前的多次移位 + 或 insns。Gcc5，然后在 Wikipedia 版本中优化掉分支和掩码，只使用 ror或 rol指令进行变量计数。

ICC: 支持可变计数自 ICC13或更早的旋转.在某些 CPU (特别是 AMD，但也有一些 Intel)上，当 BMI2不能用于 rorx eax,edi,25保存 MOV 时，常数计数旋转使用 shld edi,edi,7，shld edi,edi,7比 rol edi,7慢，比 rol edi,7占用更多字节。

MSVC: x86-64 CL19: 只能识别常数计数旋转。(维基百科的习惯用法是可以识别的，但是分支和 AND 没有被优化掉)。在 x86(包括 x86-64)上使用来自 <intrin.h>的 _rotl/_rotr内部特性。

gcc for ARM uses an and r1, r1, #31 for variable-count rotates, but still does the actual rotate with a single instruction: ror r0, r0, r1. So gcc doesn't realize that rotate-counts are inherently modular. As the ARM docs say, 移位长度 ABC2大于32的 ROR 与移位长度 n-32的 ROR 相同. I think gcc gets confused here because left/right shifts on ARM saturate the count, so a shift by 32 or more will clear the register. (Unlike x86, where shifts mask the count the same as rotates). It probably decides it needs an AND instruction before recognizing the rotate idiom, because of how non-circular shifts work on that target.

当前的 x86编译器仍然使用一个额外的指令来掩盖8位和16位旋转的变量计数，可能是出于同样的原因，他们没有避免 ARM 上的 AND。这是一个遗漏的优化，因为性能不依赖于任何 x86-64 CPU 上的旋转计数。(出于性能原因，286引入了计数屏蔽，因为它迭代地处理移位，而不像现代 CPU 那样使用常数延迟。)

BTW, prefer rotate-right for variable-count rotates, to avoid making the compiler do 32-n to implement a left rotate on architectures like ARM and MIPS that only provide a rotate-right. (This optimizes away with compile-time-constant counts.)

有趣的事实: ARM 并没有专门的移动/旋转指令，它只是与在 ROR 模式下通过桶移位器的源操作数: mov r0, r0, ror r1 MOV。因此，一个旋转可以折叠成一个寄存器源操作数的 EOR 指令或东西。

确保对 n和返回值使用了无符号类型，否则不会出现旋转 。(gcc for x86目标进行算术右移，移入符号位的副本而不是零，导致当 OR将两个值一起移动时出现问题。负符号整数的右移是 C 语言中实现定义的行为)

还有，确保移位计数是无符号类型，因为带有符号类型的 (-n)&31可以是某人的补数或符号/大小，而不是模2 ^ n，你可以得到无符号或二的补数。(请看 Regehr 博客上的评论)。unsigned int在我查看的每个编译器上都表现良好，对于 x的每个宽度都是如此。其他一些类型实际上无法识别某些编译器的习惯用法，所以不要只使用与 x相同的类型。

一些编译器提供了旋转 的内部特性，如果可移植版本不能在您所针对的编译器上生成好的代码，那么这种内部特性要比行内特性好得多。据我所知，没有任何编译器具有跨平台的内部特性。以下是 x86的一些选项:

情报显示是 ABC0提供 ABC1和 _rotl64内部特性右转也一样。MSVC 需要 <intrin.h>，而 gcc 需要 <x86intrin.h>。#ifdef负责 gcc 和 icc。Clang 9.0也有它，但在此之前，它似乎没有提供他们的任何地方，除了在 MSVC 兼容模式与 -fms-extensions -fms-compatibility -fms-compatibility-version=17.00。而且它给他们发出的高潮糟透了(额外的掩蔽和 CMOV)。

MSVC: ABC0和 _rotr16.

Gcc 和 icc (非叮当) : <x86intrin.h>还为8位左/右旋转、 __rolw/__rorw(16位)、 __rold/__rord(32位)、 __rolq/__rorq(64位，仅为64位目标定义)提供 __rolb/__rorb。对于狭窄的旋转，实现使用 __builtin_ia32_rolhi或 __rolb0，但是32位和64位的旋转使用 shift/or 定义(没有针对 UB 的保护，因为 __rolb4中的代码只需要在 x86的 gcc 上工作)。GNU C 似乎没有像 __rolb3那样具有任何跨平台的 __rolb2功能(它可以扩展到目标平台上的任何最佳功能，即使它不是单个指令)。大多数情况下，你可以从习语识别中获得好的代码。

// For real use, probably use a rotate intrinsic for MSVC, or this idiom for other compilers. This pattern of #ifdefs may be helpful #if defined(__x86_64__) || defined(__i386__) #ifdef _MSC_VER #include <intrin.h> #else #include <x86intrin.h> // Not just <immintrin.h> for compilers other than icc #endif uint32_t rotl32_x86_intrinsic(rotwidth_t x, unsigned n) { //return __builtin_ia32_rorhi(x, 7); // 16-bit rotate, GNU C return _rotl(x, n); // gcc, icc, msvc. Intel-defined. //return __rold(x, n); // gcc, icc. // can't find anything for clang } #endif

可能一些非 x86编译器也有内部特性，但是我们不要扩展这个社区-wiki 答案来包含它们。(也许在关于内部性质的现有答案中这样做)。

(这个答案的旧版本建议使用 MSVC 专用的内联代码(只适用于32位 x86代码) ，或者使用 http://www.devx.com/tips/Tip/14043代码的 C 版本。评论正在对此做出回应。)

Inline asm defeats many optimizations, 特别是 MSVC 风格，因为它强制存储/重新加载输入. A carefully-written GNU C inline-asm rotate would allow the count to be an immediate operand for compile-time-constant shift counts, but it still couldn't optimize away entirely if the value to be shifted is also a compile-time constant after inlining. 翻译: 奇芳校对: 奇芳校对: 奇芳校对: 奇芳校对: 奇芳校对: 奇芳校对: 奇芳校对: 奇芳校对: 奇芳校对: 奇芳校对: 奇芳校对: 奇芳 https://gcc.gnu.org/wiki/dontuseinlineasm 校对: 奇芳校对: 奇芳校对: 奇芳校对: 奇芳校对: 奇芳校对: 奇芳校对: 奇芳校对: 奇芳校对: 奇芳.

小开

因为它是 C + + ，所以使用一个内联函数:

template <typename INT> INT rol(INT val) { return (val << 1) | (val >> (sizeof(INT)*CHAR_BIT-1)); }

C + + 11变体:

template <typename INT> constexpr INT rol(INT val) { static_assert(std::is_unsigned<INT>::value, "Rotate Left only makes sense for unsigned types"); return (val << 1) | (val >> (sizeof(INT)*CHAR_BIT-1)); }

小开

假设你想移动 L位，输入 x是一个 N位的数字:

unsigned ror(unsigned x, int L, int N) { unsigned lsbs = x & ((1 << L) - 1); return (x >> L) | (lsbs << (N-L)); }

小开

过载一个函数:

unsigned int rotate_right(unsigned int x) { return (x>>1 | (x&1?0x80000000:0)) } unsigned short rotate_right(unsigned short x) { /* etc. */ }

小开

大多数编译器都有相应的内部特性。例如 Visual Studio < a href = “ https://Learn.microsoft.com/en-us/cpp/intrinsics/rotr8-rotr16? view = vs-2019”rel = “ nofollow noReferrer”> _ rotr8，_ rotr16

小开

How abt something like this, using the standard bitset ...

#include <bitset> #include <iostream> template <std::size_t N> inline void rotate(std::bitset<N>& b, unsigned m) { b = b << m | b >> (N-m); } int main() { std::bitset<8> b(15); std::cout << b << '\n'; rotate(b, 2); std::cout << b << '\n'; return 0; }

HTH,

小开

详细地说，您可以应用以下逻辑。

If Bit Pattern is 33602 in Integer

1000 0011 0100 0010

然后你需要向右转两次: 首先复制一个位模式，然后向左移位: Llength-RightShift 也就是说，长度是16，右移值是2 16-2 = 14

在左移14次之后。

1000 0000 0000 0000

现在向右移动数值33602，按要求移动两倍。你得到了

0010 0000 1101 0000

现在取14次左移值和2次右移值之间的 OR。

1000 0000 0000 0000 0010 0000 1101 0000 =================== 1010 0000 1101 0000 ===================

记住，位智能操作更快，这甚至不需要任何循环。

小开

If x is an 8 bit value, you can use this:

x=(x>>1 | x<<7);

小开

当然:

template<class T> T ror(T x, unsigned int moves) { return (x >> moves) | (x << sizeof(T)*8 - moves); }

小开

Source Code X 位数

int x =8; data =15; //input unsigned char tmp; for(int i =0;i<x;i++) { printf("Data & 1 %d\n",data&1); printf("Data Shifted value %d\n",data>>1^(data&1)<<(x-1)); tmp = data>>1|(data&1)<<(x-1); data = tmp; }

小开

另一个建议

template<class T> inline T rotl(T x, unsigned char moves){ unsigned char temp; __asm{ mov temp, CL mov CL, moves rol x, CL mov CL, temp }; return x; }

小开

The correct answer is following:

#define BitsCount( val ) ( sizeof( val ) * CHAR_BIT ) #define Shift( val, steps ) ( steps % BitsCount( val ) ) #define ROL( val, steps ) ( ( val << Shift( val, steps ) ) | ( val >> ( BitsCount( val ) - Shift( val, steps ) ) ) ) #define ROR( val, steps ) ( ( val >> Shift( val, steps ) ) | ( val << ( BitsCount( val ) - Shift( val, steps ) ) ) )

小开

下面是 Dídac Pérez 的回答的一个稍微改进的版本，实现了两个方向，并演示了使用无符号 char 和无符号长值的这些函数的用法。一些注释:

这些函数是为编译器优化而内联的

我使用了一个 cout << +value技巧，以简洁的方式输出一个无符号字符，我在这里找到了它: https://stackoverflow.com/a/28414758/1599699

为了清晰和安全起见，我建议使用显式的 <put the type here>语法。

我在 shift tNum 参数中使用了 unsignedchar，这是因为我在附加细节部分给你中发现:

如果 additive-expression为。移位操作的结果是未定义的负值，或者如果 加性表达式大于或等于 (提升的) 移位表达式中的位数。

下面是我使用的代码:

#include <iostream> using namespace std; template <typename T> inline T rotateAndCarryLeft(T rotateMe, unsigned char shiftNum) { static const unsigned char TBitCount = sizeof(T) * 8U; return (rotateMe << shiftNum) | (rotateMe >> (TBitCount - shiftNum)); } template <typename T> inline T rotateAndCarryRight(T rotateMe, unsigned char shiftNum) { static const unsigned char TBitCount = sizeof(T) * 8U; return (rotateMe >> shiftNum) | (rotateMe << (TBitCount - shiftNum)); } void main() { //00010100 == (unsigned char)20U //00000101 == (unsigned char)5U == rotateAndCarryLeft(20U, 6U) //01010000 == (unsigned char)80U == rotateAndCarryRight(20U, 6U) cout << "unsigned char " << 20U << " rotated left by 6 bits == " << +rotateAndCarryLeft<unsigned char>(20U, 6U) << "\n"; cout << "unsigned char " << 20U << " rotated right by 6 bits == " << +rotateAndCarryRight<unsigned char>(20U, 6U) << "\n"; cout << "\n"; for (unsigned char shiftNum = 0U; shiftNum <= sizeof(unsigned char) * 8U; ++shiftNum) { cout << "unsigned char " << 21U << " rotated left by " << +shiftNum << " bit(s) == " << +rotateAndCarryLeft<unsigned char>(21U, shiftNum) << "\n"; } cout << "\n"; for (unsigned char shiftNum = 0U; shiftNum <= sizeof(unsigned char) * 8U; ++shiftNum) { cout << "unsigned char " << 21U << " rotated right by " << +shiftNum << " bit(s) == " << +rotateAndCarryRight<unsigned char>(21U, shiftNum) << "\n"; } cout << "\n"; for (unsigned char shiftNum = 0U; shiftNum <= sizeof(unsigned long long) * 8U; ++shiftNum) { cout << "unsigned long long " << 3457347ULL << " rotated left by " << +shiftNum << " bit(s) == " << rotateAndCarryLeft<unsigned long long>(3457347ULL, shiftNum) << "\n"; } cout << "\n"; for (unsigned char shiftNum = 0U; shiftNum <= sizeof(unsigned long long) * 8U; ++shiftNum) { cout << "unsigned long long " << 3457347ULL << " rotated right by " << +shiftNum << " bit(s) == " << rotateAndCarryRight<unsigned long long>(3457347ULL, shiftNum) << "\n"; } cout << "\n\n"; system("pause"); }

小开

--- Substituting RLC in 8051 C for speed --- Rotate left carry Here is an example using RLC to update a serial 8 bit DAC msb first: (r=DACVAL, P1.4= SDO, P1.5= SCLK) MOV A, r ?1: MOV B, #8 RLC A MOV P1.4, C CLR P1.5 SETB P1.5 DJNZ B, ?1 Here is the code in 8051 C at its fastest: sbit ACC_7 = ACC ^ 7 ; //define this at the top to access bit 7 of ACC ACC = r; B = 8; do { P1_4 = ACC_7; // this assembles into mov c, acc.7 mov P1.4, c ACC <<= 1; P1_5 = 0; P1_5 = 1; B -- ; } while ( B!=0 ); The keil compiler will use DJNZ when a loop is written this way. I am cheating here by using registers ACC and B in c code. If you cannot cheat then substitute with: P1_4 = ( r & 128 ) ? 1 : 0 ; r <<= 1; This only takes a few extra instructions. Also, changing B for a local var char n is the same. Keil does rotate ACC left by ADD A, ACC which is the same as multiply 2. It only takes one extra opcode i think. Keeping code entirely in C keeps things simpler sometimes.

小开

C++20 std::rotl and std::rotr

它已经到达! http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0553r4.html，应该添加到 <bit>头。

Cpferences 说的用法将类似于:

#include <bit> #include <bitset> #include <cstdint> #include <iostream> int main() { std::uint8_t i = 0b00011101; std::cout << "i = " << std::bitset<8>(i) << '\n'; std::cout << "rotl(i,0) = " << std::bitset<8>(std::rotl(i,0)) << '\n'; std::cout << "rotl(i,1) = " << std::bitset<8>(std::rotl(i,1)) << '\n'; std::cout << "rotl(i,4) = " << std::bitset<8>(std::rotl(i,4)) << '\n'; std::cout << "rotl(i,9) = " << std::bitset<8>(std::rotl(i,9)) << '\n'; std::cout << "rotl(i,-1) = " << std::bitset<8>(std::rotl(i,-1)) << '\n'; }

输出:

i = 00011101 rotl(i,0) = 00011101 rotl(i,1) = 00111010 rotl(i,4) = 11010001 rotl(i,9) = 00111010 rotl(i,-1) = 10001110

当对 GCC 的支持到来时，我将尝试使用它，使用 g++-9 -std=c++2a的 GCC 9.1.0仍然不支持它。

The proposal says:

标题:

namespace std { // 25.5.5, rotating template<class T> [[nodiscard]] constexpr T rotl(T x, int s) noexcept; template<class T> [[nodiscard]] constexpr T rotr(T x, int s) noexcept;

and:

25.5.5旋转[ bitops.rot ]

在下面的描述中，让 N 表示 std::numeric_limits<T>::digits。

template<class T> [[nodiscard]] constexpr T rotl(T x, int s) noexcept;

约束: T 是一个无符号整数类型(3.9.1[ basic.basic ])。

让我们的 s% N。

Returns: If r is 0, x; if r is positive, (x << r) | (x >> (N - r)); if r is negative, rotr(x, -r).

template<class T> [[nodiscard]] constexpr T rotr(T x, int s) noexcept;

约束: T 是一个无符号整数类型(3.9.1[ basic.basic ])。让我们的 s% N。

返回: 如果 r 为0，x; 如果 r 为正，则返回 (x >> r) | (x << (N - r)); 如果 r 为负，则返回 rotl(x, -r)。

还添加了一个 std::popcount来计算1位的数目: 如何计算一个32位整数的集合位数？