When is an integer<->pointer cast actually correct?

The common folklore says that:

  • The type system exists for a reason. Integers and pointers are distinct types; casting between them is malpractice in the majority of cases, may indicate a design error, and should be avoided.

  • Even when such a cast is performed, no assumptions shall be made about the size of integers and pointers (casting void* to int is the simplest way to make the code fail on x64), and instead of int one should use intptr_t or uintptr_t from stdint.h.

Knowing that, when is it actually useful to perform such casts?

(Note: slightly shorter code at the price of portability doesn't count as "actually useful".)


One case I know:

  • Some lock-free multiprocessor algorithms exploit the fact that a pointer aligned to 2 or more bytes has some redundancy. They then use the lowest bits of the pointer as boolean flags, for instance. With a processor having an appropriate instruction set, this may eliminate the need for a locking mechanism (which would be necessary if the pointer and the boolean flag were separate).
    (Note: This practice can even be done safely in Java via java.util.concurrent.atomic.AtomicMarkableReference.)
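
A minimal C sketch of the low-bit trick described above. The node type and the function names are hypothetical, and the code assumes what mainstream platforms provide but the C standard does not promise: that an aligned pointer converted to uintptr_t has its low bit clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical node type; any object with alignment >= 2 would do. */
struct node { int value; struct node *next; };

/* Pack a pointer and a one-bit mark into a single word. */
uintptr_t mark_pack(struct node *p, bool mark)
{
    uintptr_t bits = (uintptr_t)p;
    assert((bits & 1u) == 0);   /* assumption: low bit is free on this platform */
    return bits | (uintptr_t)mark;
}

/* Recover the pointer by clearing the mark bit. */
struct node *mark_ptr(uintptr_t word)
{
    return (struct node *)(word & ~(uintptr_t)1);
}

/* Recover the mark bit. */
bool mark_flag(uintptr_t word)
{
    return (word & 1u) != 0;
}
```

In a lock-free structure the packed word would typically live in an atomic uintptr_t and be updated with a single compare-and-swap, so that the pointer and the flag always change together.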

Anything more?


The only time I cast a pointer to an integer is when I want to store a pointer, but the only storage I have available is an integer.

It is never useful to perform such casts, unless you have full knowledge of the behaviour of your compiler+platform combination, and wish to exploit it (your question scenario is one such example).

The reason I say it is never useful is because in general, you don't have control of the compiler, nor full knowledge of what optimisations it may choose to do. Or to put it another way, you aren't able to precisely control the machine code it will generate. So in general, you can't implement this sort of trick safely.

It could be useful when checking the alignment of types in general so that misaligned memory gets caught with an assert rather than just SIGBUS/SIGSEGV.

E.g.:

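#include <xmmintrin.h>
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>   /* malloc, free */

int main(void) {
    void *ptr = malloc(sizeof(__m128));
    /* Parenthesize the whole modulo: unary ! binds tighter than %. */
    assert(((uintptr_t)ptr % __alignof__(__m128)) == 0);
    free(ptr);
    return 0;
}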

(In real code I wouldn't just gamble on malloc, but it illustrates the point)

One example is in Windows, e.g. the SendMessage() and PostMessage() functions. They take an HWND (a handle to a window), a message (an integral type), and two parameters for the message, a WPARAM and an LPARAM. Both parameter types are integral, but sometimes you must pass pointers, depending on the message you send. Then you will have to cast a pointer to an LPARAM or WPARAM.

I would generally avoid it like the plague. If you need to store a pointer, use a pointer type, if that is possible.

You may need to access memory at a fixed known address, then your address is an integer and you need to assign it to a pointer. This is somewhat common in embedded systems. Conversely, you may need to print a memory address and thus need to cast it to integer.
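
The printing half can be sketched like this, assuming a hosted C99 environment that provides uintptr_t. The function names format_address and parse_address are made up for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Write the address held in p as hex text, e.g. "0x7ffd1234". */
int format_address(char *buf, size_t size, const void *p)
{
    assert(buf != NULL && size > 0);
    /* PRIxPTR is the printf length/conversion for uintptr_t. */
    return snprintf(buf, size, "0x%" PRIxPTR, (uintptr_t)p);
}

/* Parse text produced by format_address back into a pointer. */
void *parse_address(const char *text)
{
    uintptr_t bits = 0;
    if (sscanf(text, "0x%" SCNxPTR, &bits) != 1)
        return NULL;
    return (void *)bits;
}
```

Plain printf("%p", ptr) avoids the cast altogether, but its output format is implementation-defined; going through uintptr_t gives control over the formatting.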

Oh, and don't forget that you need to assign and compare pointers to NULL, which in C is usually defined as a pointer cast of the integer constant 0, i.e. ((void *)0).

I sometimes cast pointers to integers when they somehow need to be part of a hash. I also cast them to integers to do some bit fiddling on certain implementations where pointers are guaranteed to always have one or two spare bits, so that I can encode AVL or RB tree information in the left/right pointers instead of having an additional member. But this is all so implementation-specific that I recommend never thinking of it as any kind of common solution. I have also heard that hazard pointers can sometimes be implemented with such a thing.
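
As a rough sketch of the hashing case: the function name pointer_bucket is hypothetical and the mixing constant is just an illustrative choice, not something taken from the answer above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Map a pointer to a bucket index in a table with `buckets` slots. */
size_t pointer_bucket(const void *p, size_t buckets)
{
    assert(buckets > 0);
    uint64_t h = (uint64_t)(uintptr_t)p;
    /* Mix the bits first: aligned pointers tend to have zero low bits,
       so using the raw value modulo a power of two would cluster badly. */
    h ^= h >> 33;
    h *= UINT64_C(0xff51afd7ed558ccd);
    h ^= h >> 33;
    return (size_t)(h % buckets);
}
```

The result is only stable for the lifetime of the object within one process run, since the address may differ on the next run.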

In some situations I need a unique ID per object that I pass along to e.g. servers as my request id. Depending on the context when I need to save some memory, and it is worth it, I use the address of my object as such an id, and usually have to cast it to an integer.

When working with embedded systems (such as in Canon cameras, see CHDK) there are often magic addresses, so a (void*)0xFFBC5235 or similar is often found there too.

edit:

Just stumbled (in my mind) over pthread_self(), which returns a pthread_t that is usually a typedef for an unsigned integer. Internally, though, it is a pointer to some thread struct representing the thread in question. In general, the same trick might be used elsewhere for an opaque handle.

When is it correct to store pointers in ints? It's correct when you treat it as what it is: The use of a platform or compiler specific behavior.

The problem is only when you have platform/compiler specific code littered throughout your application and you have to port your code to another platform, because you've made assumptions that don't hold true any longer. By isolating that code and hiding it behind an interface that makes no assumptions about the underlying platform, you eliminate the problem.

So as long as you document the implementation, separate it behind a platform independent interface using handles or something that doesn't depend on how it works behind the scenes, and then make the code compile conditionally only on platforms/compilers where it's been tested and works, then there's no reason for you not to use any sort of voodoo magic you come across. You can even include large chunks of assembly language, proprietary API calls, and kernel system calls if you want.

That said, if your "portable" interface uses integer handles, integers are the same size as pointers on the implementation for a certain platform, and that implementation uses pointers internally, why not simply use the pointers as integer handles? A simple cast to an integer makes sense in that case, because you cut out the necessity of a handle/pointer lookup table of some sort.
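
A small sketch of that idea, with a made-up widget API: the public interface only exposes an integer handle, while this particular implementation stores the pointer in it directly. It assumes no valid object converts to the integer 0, which holds on mainstream platforms.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Public side: callers only ever see an integer handle. */
typedef uintptr_t widget_handle;

/* Private side: hypothetical object behind the handle. */
struct widget { int value; };

widget_handle widget_create(int value)
{
    struct widget *w = malloc(sizeof *w);
    if (w == NULL)
        return 0;              /* 0 doubles as the invalid handle */
    w->value = value;
    return (uintptr_t)w;       /* the handle is simply the pointer */
}

int widget_value(widget_handle h)
{
    assert(h != 0);
    return ((struct widget *)h)->value;
}

void widget_destroy(widget_handle h)
{
    free((struct widget *)h);
}
```

A port to a platform where this does not work could swap in a lookup table behind the same three functions without touching any caller.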

I have one use for such a thing in network-wide IDs of objects. Such an ID would combine identifications of the machine (e.g. IP address), the process ID and the address of the object. To be sent over a socket, the pointer part of such an ID must be put into a wide enough integer such that it survives transport back and forth. The pointer part is only interpreted as a pointer (= cast back to a pointer) in the context where this makes sense (same machine, same process); on other machines or in other processes it just serves to distinguish different objects.

What one needs for this to work is the existence of uintptr_t, plus uint64_t as a fixed-width integer type. (Well, it only works on machines whose addresses are at most 64 bits wide :)
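
Roughly, such an ID could look like the following. The struct layout, field widths and function names are assumptions for illustration, and byte-order conversion for the actual wire transfer is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The scheme only works if a pointer fits into the 64-bit field. */
_Static_assert(sizeof(uintptr_t) <= sizeof(uint64_t),
               "pointer must fit in 64 bits");

/* Hypothetical ID layout. */
struct object_id {
    uint32_t machine;   /* e.g. an IPv4 address */
    uint32_t process;   /* process id */
    uint64_t address;   /* object address, widened to 64 bits */
};

struct object_id object_id_make(uint32_t machine, uint32_t process,
                                const void *obj)
{
    struct object_id id = { machine, process, (uint64_t)(uintptr_t)obj };
    return id;
}

/* Turn the ID back into a pointer, but only in its home process. */
void *object_id_local(struct object_id id, uint32_t machine, uint32_t process)
{
    if (id.machine != machine || id.process != process)
        return NULL;    /* foreign ID: usable as a key, not as a pointer */
    return (void *)(uintptr_t)id.address;
}
```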

Under x64, one can use the upper bits of pointers for tagging (as typically only 47 bits are used for the actual pointer). This is great for things like run-time code generation (LuaJIT uses this technique, which is an ancient technique, according to the comments). To do this tagging and tag checking you need either a cast or a union, which basically amount to the same thing.
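
A hedged sketch of upper-bit tagging via casts. It is not LuaJIT's actual encoding: the 48-bit address field, the 16-bit tag and all names are assumptions, and tag_pack refuses pointers that already use the upper bits instead of silently corrupting them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ADDR_BITS 48   /* assumption: addresses fit in the low 48 bits */
#define ADDR_MASK ((UINT64_C(1) << ADDR_BITS) - 1)

/* Store a 16-bit tag above the address bits; fails if those bits are in use. */
bool tag_pack(const void *p, uint16_t tag, uint64_t *out)
{
    assert(out != NULL);
    uint64_t bits = (uint64_t)(uintptr_t)p;
    if ((bits & ~ADDR_MASK) != 0)
        return false;
    *out = bits | ((uint64_t)tag << ADDR_BITS);
    return true;
}

/* Extract the tag from a packed word. */
uint16_t tag_of(uint64_t word)
{
    return (uint16_t)(word >> ADDR_BITS);
}

/* Extract the pointer; the tag must be stripped before dereferencing. */
void *ptr_of(uint64_t word)
{
    return (void *)(uintptr_t)(word & ADDR_MASK);
}
```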

Casting pointers to integers can also be very helpful in memory management systems that make use of binning, i.e. one can easily find the bin/page for an address via some math. An example from a lockless allocator I wrote a while back:

inline Page* GetPage(void* pMemory)
{
    return &pPages[((UINT_PTR)pMemory - (UINT_PTR)pReserve) >> nPageShift];
}

The most useful case in my mind is the one that actually has the potential to make programs much more efficient: a number of standard and common library interfaces take a single void * argument which they will pass back to a callback function of some sort. Suppose your callback doesn't need any large amount of data, just a single integer argument.

If the callback will happen before the function returns, you can simply pass the address of a local (automatic) int variable, and all is well. But the best real-world example for this situation is pthread_create, where the "callback" runs in parallel and you have no guarantee that it will be able to read the argument through the pointer before pthread_create returns. In this situation, you have 3 options:

  1. malloc a single int and have the new thread read and free it.
  2. Pass a pointer to a caller-local struct containing the int and a synchronization object (e.g. a semaphore or a barrier) and have the caller wait on it after calling pthread_create.
  3. Cast the int to void * and pass it by value.

Option 3 is immensely more efficient than either of the other choices, both of which involve an extra synchronization step (for option 1, the synchronization is in malloc/free, and will almost certainly involve some cost since the allocating and freeing thread are not the same).
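
Option 3 might look like this in practice. The names worker and spawn_worker are hypothetical; the cast goes through intptr_t so that the int is widened to pointer size before being converted, and the new thread never dereferences the argument.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Thread entry point: the "pointer" argument is really the integer itself. */
void *worker(void *arg)
{
    int id = (int)(intptr_t)arg;         /* recover the int, never dereference */
    return (void *)(intptr_t)(id + 1);   /* hand a small result back the same way */
}

/* Start a worker for `id`; returns pthread_create's error code. */
int spawn_worker(int id, pthread_t *thread)
{
    assert(thread != NULL);
    return pthread_create(thread, NULL, worker, (void *)(intptr_t)id);
}
```

The caller can later pick up the result with pthread_join and cast it back with (int)(intptr_t). No allocation or extra synchronization object is involved.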

Storing a doubly linked list using half the space

An XOR linked list combines the next and prev pointers into a single value of the same size. It does this by XOR-ing the two pointers together, which requires treating them like integers.
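
A minimal sketch of the traversal, with hypothetical names. Each node stores prev XOR next, so knowing the node you came from is enough to recover the node you are going to, and the same function walks the list in either direction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct xnode {
    int value;
    uintptr_t link;   /* (uintptr_t)prev XOR (uintptr_t)next */
};

/* Compute the combined link for a node between prev and next. */
uintptr_t xor_link(const struct xnode *prev, const struct xnode *next)
{
    return (uintptr_t)prev ^ (uintptr_t)next;
}

/* Walk from an end node and copy up to `max` values into `out`.
   Returns the number of nodes visited. */
size_t xlist_collect(const struct xnode *end, int *out, size_t max)
{
    assert(out != NULL);
    const struct xnode *prev = NULL;
    const struct xnode *cur = end;
    size_t n = 0;
    while (cur != NULL && n < max) {
        out[n++] = cur->value;
        /* XOR-ing out the node we came from leaves the node we go to. */
        const struct xnode *next =
            (const struct xnode *)(cur->link ^ (uintptr_t)prev);
        prev = cur;
        cur = next;
    }
    return n;
}
```

The price is that a pointer to a single node is no longer enough to reach its neighbours, and debuggers and garbage collectors cannot follow the links.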

It's very common in embedded systems to access memory-mapped hardware devices where the registers are at fixed addresses in the memory map. I often model hardware differently in C vs. C++ (with C++ you can take advantage of classes and templates), but the general idea can be used for both.

A quick example: suppose you have a timer peripheral in hardware, and it has 2 32-bit registers:

  • a free-running "tick count" register, which decrements at a fixed rate (e.g. every microsecond)

  • a control register, which allows you to start the timer, stop the timer, enable a timer interrupt when we decrement the count to zero, etc.

(Note that a real timer peripheral is usually significantly more complicated).

Each of these registers is a 32-bit value, and the "base address" of the timer peripheral is 0xFFFF.0000. You could model the hardware as follows:

#include <stdint.h>

// Treat these HW regs as volatile
typedef uint32_t volatile hw_reg;


// C friendly, hence the typedef
typedef struct
{
    hw_reg TimerCount;
    hw_reg TimerControl;
} TIMER;


// Cast the integer 0xFFFF0000 as being the base address of a timer peripheral.
#define Timer1 ((TIMER *)0xFFFF0000)


// Read the current timer tick value.
// e.g. read the 32-bit value @ 0xFFFF.0000
uint32_t CurrentTicks = Timer1->TimerCount;


// Stop / reset the timer.
// e.g. write the value 0 to the 32-bit location @ 0xFFFF.0004
Timer1->TimerControl = 0;

There are 100 variations on this approach, the pros and cons of which can be debated forever, but the point here is only to illustrate a common use of casting an integer to a pointer. Note that this code isn't portable, is tied to a specific device, assumes the memory region is not off-limits, etc.

I've used such systems when I'm trying to walk byte-by-byte through an array. Often times, the pointer will walk multiple bytes at a time, which causes problems that are very difficult to diagnose.

For example, int pointers:

int* my_pointer;

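Incrementing with my_pointer++ advances it by sizeof(int) bytes (4 on a typical 32-bit system). Converting to an integer first, as in my_pointer = (int*)((intptr_t)my_pointer + 1), advances it by one byte. (The result of a cast is not an lvalue, so ((int)my_pointer)++ itself does not compile in standard C, and int may be too narrow to hold a pointer.)

It's really the only way to accomplish it, other than casting your pointer to a (char*): my_pointer = (int*)((char*)my_pointer + 1)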

Admittedly, the (char*) is my usual method since it makes more sense.

Pointer values can also be a useful source of entropy for seeding a random number generator:

int* p = new int();
seed(intptr_t(p) ^ *p);
delete p;

The Boost UUID library uses this trick, and some others.

There is an old and good tradition of using a pointer to an object as a typeless handle. For instance, some people use it to implement interaction between two C++ units through a flat C-style API. In that case, the handle type is defined as one of the integer types, and any method has to convert a pointer into an integer before it can be passed to another method that expects an abstract typeless handle as one of its parameters. In addition, sometimes there is no other way to break up a circular dependency.