C/C++: Force Bit Field Order and Alignment

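I read that the order of bit fields within a struct is platform specific. If I use compiler-specific packing options, will that guarantee the data is stored in the order in which the fields are written? For example: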

struct Message
{
    unsigned int version : 3;
    unsigned int type : 1;
    unsigned int id : 5;
    unsigned int data : 6;
} __attribute__ ((__packed__));

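On an Intel processor with the GCC compiler, the fields were laid out in memory as shown: Message.version was the first 3 bits in the buffer, and Message.type followed. If I find equivalent struct packing options for other compilers, will this be cross-platform?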


Bit fields vary widely from compiler to compiler, sorry.

With GCC, big endian machines lay out the bits big end first and little endian machines lay out the bits little end first.

K&R says "Adjacent [bit-]field members of structures are packed into implementation-dependent storage units in an implementation-dependent direction. When a field following another field will not fit ... it may be split between units or the unit may be padded. An unnamed field of width 0 forces this padding..."

Therefore, if you need machine independent binary layout you must do it yourself.

This last statement also applies to non-bitfields due to padding -- however all compilers seem to have some way of forcing byte packing of a structure, as I see you already discovered for GCC.

No, it will not be fully-portable. Packing options for structs are extensions, and are themselves not fully portable. In addition to that, C99 §6.7.2.1, paragraph 10 says: "The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined."

Even a single compiler might lay the bit field out differently depending on the endianness of the target platform, for example.

Endianness refers to byte order, not bit order. Nowadays it is 99% certain that the bit order within a byte is fixed. However, when using bit-fields, endianness still has to be taken into account. See the example below.

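#include <stdio.h>
#include <string.h>

typedef struct tagT {
    unsigned int a:4;   /* unsigned, so the values print without sign extension */
    unsigned int b:4;
    unsigned int c:8;
    unsigned int d:16;
} T;

int main(void)
{
    unsigned char data[] = {0x12, 0x34, 0x56, 0x78};
    T t;

    /* Copy the raw bytes rather than casting the pointer,
       which avoids alignment and aliasing problems. */
    memcpy(&t, data, sizeof data);

    printf("a =0x%x\n", (unsigned)t.a);
    printf("b =0x%x\n", (unsigned)t.b);
    printf("c =0x%x\n", (unsigned)t.c);
    printf("d =0x%x\n", (unsigned)t.d);

    return 0;
}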


// big endian: mips24k-linux-gcc (GCC) 4.2.3
a =0x1
b =0x2
c =0x34
d =0x5678
 1   2   3   4   5   6   7   8
\_/ \_/ \_____/ \_____________/
 a   b     c           d


// little endian: gcc (Ubuntu 4.3.2-1ubuntu11) 4.3.2
a =0x2
b =0x1
c =0x34
d =0x7856
 7   8   5   6   3   4   1   2
\_____________/ \_____/ \_/ \_/
       d           c     b   a

Most of the time, probably, but don't bet the farm on it, because if you're wrong, you'll lose big.

If you really, really need to have identical binary information, you'll need to create bitfields with bitmasks - e.g. you use an unsigned short (16 bit) for Message, and then make things like versionMask = 0xE000 to represent the three topmost bits.
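As a rough sketch of the bitmask approach for the Message struct from the question: only versionMask = 0xE000 comes from the text above. The remaining masks, the unused low bit, the big-endian byte order on the wire, and all the names here are assumptions made for illustration.

```c
#include <stdint.h>

/* Assumed layout of the 16-bit word (only VERSION_MASK is from the post):
   bits 15..13 version, bit 12 type, bits 11..7 id, bits 6..1 data, bit 0 unused. */
#define VERSION_MASK  0xE000u
#define VERSION_SHIFT 13
#define TYPE_MASK     0x1000u
#define TYPE_SHIFT    12
#define ID_MASK       0x0F80u
#define ID_SHIFT      7
#define DATA_MASK     0x007Eu
#define DATA_SHIFT    1

/* Hypothetical unpacked form; plain unsigned members, no bit-fields. */
struct MessageFields {
    unsigned version, type, id, data;
};

/* Combine the fields into one 16-bit word; out-of-range bits are masked off. */
uint16_t message_pack(struct MessageFields m)
{
    return (uint16_t)(((m.version << VERSION_SHIFT) & VERSION_MASK) |
                      ((m.type    << TYPE_SHIFT)    & TYPE_MASK)    |
                      ((m.id      << ID_SHIFT)      & ID_MASK)      |
                      ((m.data    << DATA_SHIFT)    & DATA_MASK));
}

/* Extract the fields from a 16-bit word. */
struct MessageFields message_unpack(uint16_t w)
{
    struct MessageFields m;
    m.version = (w & VERSION_MASK) >> VERSION_SHIFT;
    m.type    = (w & TYPE_MASK)    >> TYPE_SHIFT;
    m.id      = (w & ID_MASK)      >> ID_SHIFT;
    m.data    = (w & DATA_MASK)    >> DATA_SHIFT;
    return m;
}

/* Serialize most significant byte first, independent of host endianness. */
void message_to_bytes(uint16_t w, unsigned char out[2])
{
    out[0] = (unsigned char)(w >> 8);
    out[1] = (unsigned char)(w & 0xFFu);
}

/* Rebuild the word from two bytes written by message_to_bytes. */
uint16_t message_from_bytes(const unsigned char in[2])
{
    return (uint16_t)(((unsigned)in[0] << 8) | in[1]);
}
```

Because every shift and mask is spelled out, the bytes produced do not depend on how a particular compiler allocates bit-fields or on the byte order of the host.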

There's a similar problem with alignment within structs. For instance, Sparc, PowerPC, and 680x0 CPUs are all big-endian, and the common default for Sparc and PowerPC compilers is to align struct members on 4-byte boundaries. However, one compiler I used for 680x0 only aligned on 2-byte boundaries - and there was no option to change the alignment!

So for some structs, the sizes on Sparc and PowerPC are identical, but smaller on 680x0, and some of the members are in different memory offsets within the struct.

This was a problem with one project I worked on, because a server process running on Sparc would query a client and find out it was big-endian, and assume it could just squirt binary structs out on the network and the client could cope. And that worked fine on PowerPC clients, and crashed big-time on 680x0 clients. I didn't write the code, and it took quite a while to find the problem. But it was easy to fix once I did.

Bit-fields should be avoided: they aren't very portable between compilers, even on the same platform. From the C99 standard, 6.7.2.1/10, "Structure and union specifiers" (there's similar wording in the C90 standard):

An implementation may allocate any addressable storage unit large enough to hold a bitfield. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.

You cannot guarantee whether a bit-field will 'span' an int boundary or not, and you can't specify whether a bit-field starts at the low end of the int or the high end of the int (this is independent of whether the processor is big-endian or little-endian).

Prefer bitmasks. Use inlines (or even macros) to set, clear and test the bits.
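A minimal sketch of what such inline helpers might look like. The flag names and function names are hypothetical, chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag masks, one bit each. */
#define FLAG_READY 0x01u
#define FLAG_ERROR 0x02u
#define FLAG_BUSY  0x04u

/* Return `word` with every bit in `mask` set. */
static inline uint32_t bits_set(uint32_t word, uint32_t mask)
{
    return word | mask;
}

/* Return `word` with every bit in `mask` cleared. */
static inline uint32_t bits_clear(uint32_t word, uint32_t mask)
{
    return word & ~mask;
}

/* True if at least one bit in `mask` is set in `word`. */
static inline bool bits_test(uint32_t word, uint32_t mask)
{
    return (word & mask) != 0;
}
```

The position of each flag is fixed by the mask value itself, so it is the same on every compiler.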

Of course the best answer is to use a class which reads/writes bit fields as a stream. Using the C bit field structure is just not guaranteed. Not to mention it is considered unprofessional/lazy/stupid to use this in real world coding.
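For illustration, here is one way the stream idea could look in plain C rather than as a class. The type BitStream, the functions bs_write and bs_read, and the choice to write the most significant bit first are all assumptions of this sketch, not something the answer specifies.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: a cursor over a caller-owned byte buffer.
   Bit index 0 is the most significant bit of buf[0]. */
struct BitStream {
    unsigned char *buf;   /* storage; must be zeroed before writing */
    size_t size;          /* capacity in bytes */
    size_t bitpos;        /* index of the next bit to read or write */
};

/* Append the low `nbits` bits of `value`, most significant bit first.
   Returns 0 on success, -1 if nbits > 32 or the buffer is too small. */
int bs_write(struct BitStream *bs, uint32_t value, unsigned nbits)
{
    if (nbits > 32 || bs->bitpos + nbits > bs->size * 8)
        return -1;
    while (nbits-- > 0) {
        if ((value >> nbits) & 1u)
            bs->buf[bs->bitpos / 8] |= (unsigned char)(0x80u >> (bs->bitpos % 8));
        bs->bitpos++;
    }
    return 0;
}

/* Read `nbits` bits into *out, most significant bit first.
   Returns 0 on success, -1 if nbits > 32 or the buffer is exhausted. */
int bs_read(struct BitStream *bs, unsigned nbits, uint32_t *out)
{
    uint32_t v = 0;

    if (nbits > 32 || bs->bitpos + nbits > bs->size * 8)
        return -1;
    while (nbits-- > 0) {
        unsigned bit = (bs->buf[bs->bitpos / 8] >> (7 - bs->bitpos % 8)) & 1u;
        v = (v << 1) | bit;
        bs->bitpos++;
    }
    *out = v;
    return 0;
}
```

Writing the question's fields in declaration order (3, 1, 5 and 6 bits) with this sketch always places version in the top three bits of the first byte, whatever the compiler or target.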

Thanks @BenVoigt for your very useful comment starting

No, they were created to save memory.

Linux source does use a bit field to match to an external structure: /usr/include/linux/ip.h has this code for the first byte of an IP datagram

struct iphdr {
#if defined(__LITTLE_ENDIAN_BITFIELD)
    __u8    ihl:4,
            version:4;
#elif defined (__BIG_ENDIAN_BITFIELD)
    __u8    version:4,
            ihl:4;
#else
#error  "Please fix <asm/byteorder.h>"
#endif
    /* ... remaining fields omitted in this excerpt ... */
};

However in light of your comment I'm giving up trying to get this to work for the multi-byte bit field frag_off.
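One alternative for the multi-byte case, sketched here under assumptions and not taken from the kernel headers, is to skip bit-fields entirely and decode the raw header bytes with shifts and masks. The function names are hypothetical. The byte offsets and masks follow the IPv4 header layout: bytes 6 and 7 form a big-endian 16-bit word holding three flag bits followed by a 13-bit fragment offset.

```c
#include <stdint.h>

/* First header byte: version in the high nibble, IHL in the low nibble. */
unsigned ip_version(const unsigned char *hdr) { return hdr[0] >> 4; }
unsigned ip_ihl(const unsigned char *hdr)     { return hdr[0] & 0x0Fu; }

/* Assemble bytes 6..7 into a host-order word, most significant byte first. */
uint16_t ip_frag_word(const unsigned char *hdr)
{
    return (uint16_t)(((unsigned)hdr[6] << 8) | hdr[7]);
}

/* "Don't fragment" flag: bit 14 of the word. */
int ip_dont_fragment(const unsigned char *hdr)
{
    return (ip_frag_word(hdr) & 0x4000u) != 0;
}

/* "More fragments" flag: bit 13 of the word. */
int ip_more_fragments(const unsigned char *hdr)
{
    return (ip_frag_word(hdr) & 0x2000u) != 0;
}

/* Fragment offset: low 13 bits, measured in units of 8 bytes. */
unsigned ip_frag_offset(const unsigned char *hdr)
{
    return ip_frag_word(hdr) & 0x1FFFu;
}
```

Since the word is built byte by byte, the same code gives the same answers on big-endian and little-endian hosts, with no #if on the bit-field order.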