Benefits of pure function

Today i was reading about pure function, got confused with its use:

A function is said to be pure if it returns same set of values for same set of inputs and does not have any observable side effects.

e.g. strlen() is a pure function while rand() is an impure one.

__attribute__ ((pure)) int fun(int i)
{
return i*i;
}


int main()
{
int i=10;
printf("%d",fun(i));//outputs 100
return 0;
}

http://ideone.com/33XJU

The above program behaves in the same way as in the absence of pure declaration.

What are the benefits of declaring a function as pure[if there is no change in output]?

8791 次浏览

pure lets the compiler know that it can make certain optimisations about the function: imagine a bit of code like

for (int i = 0; i < 1000; i++)
{
printf("%d", fun(10));
}

With a pure function, the compiler can know that it needs to evaluate fun(10) once and once only, rather than 1000 times. For a complex function, that's a big win.

When you say a function is 'pure' you are guaranteeing that it has no externally visible side-effects (and as a comment says, if you lie, bad things can happen). Knowing that a function is 'pure' has benefits for the compiler, which can use this knowledge to do certain optimizations.

Here is what the GCC documentation says about the pure attribute:

pure

Many functions have no effects except the return value and their return value depends only on the parameters and/or global variables. Such a function can be subject to common subexpression elimination and loop optimization just as an arithmetic operator would be. These functions should be declared with the attribute pure. For example,

          int square (int) __attribute__ ((pure));

Philip's answer already shows how knowing a function is 'pure' can help with loop optimizations.

Here is one for common sub-expression elimination (given foo is pure):

a = foo (99) * x + y;
b = foo (99) * x + z;

Can become:

_tmp = foo (99) * x;
a = _tmp + y;
b = _tmp + z;

In addition to possible run-time benefits, a pure function is much easier to reason about when reading code. Furthermore, it's much easier to test a pure function since you know that the return value only depends on the values of the parameters.

A non-pure function

int foo(int x, int y) // possible side-effects

is like an extension of a pure function

int bar(int x, int y) // guaranteed no side-effects

in which you have, besides the explicit function arguments x, y, the rest of the universe (or anything your computer can communicate with) as an implicit potential input. Likewise, besides the explicit integer return value, anything your computer can write to is implicitly part of the return value.

It should be clear why it is much easier to reason about a pure function than a non-pure one.

Just as an add-on, I would like to mention that C++11 codifies things somewhat using the constexpr keyword. Example:

#include <iostream>
#include <cstring>


constexpr unsigned static_strlen(const char * str, unsigned offset = 0) {
return (*str == '\0') ? offset : static_strlen(str + 1, offset + 1);
}


constexpr const char * str = "asdfjkl;";


constexpr unsigned len = static_strlen(str); //MUST be evaluated at compile time
//so, for example, this: int arr[len]; is legal, as len is a constant.


int main() {
std::cout << len << std::endl << std::strlen(str) << std::endl;
return 0;
}

The restrictions on the usage of constexpr make it so that the function is provably pure. This way, the compiler can more aggressively optimize (just make sure you use tail recursion, please!) and evaluate the function at compile time instead of run time.

So, to answer your question, is that if you're using C++ (I know you said C, but they are related), writing a pure function in the correct style allows the compiler to do all sorts of cool things with the function :-)

In general, Pure functions has 3 advantages over impure functions that the compiler can take advantage of:

Caching

Lets say that you have pure function f that is being called 100000 times, since it is deterministic and depends only on its parameters, the compiler can calculate its value once and use it when necessary

Parallelism

Pure functions don't read or write to any shared memory, and therefore can run in separate threads without any unexpected consequence

Passing By Reference

A function f(struct t) gets its argument t by value, and on the other hand, the compiler can pass t by reference to f if it is declared as pure while guaranteeing that the value of t will not change and have performance gains


In addition to the compile time considerations, pure functions can be tested fairly easy: just call them.

No need to construct objects or mock connections to DBs / file system.