max method:
10 loops, best of 3: 239 ms per loop
multiplication method:
10 loops, best of 3: 145 ms per loop
abs method:
10 loops, best of 3: 288 ms per loop
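The timings above do not show the expressions being measured. A minimal sketch of what the three labels most likely refer to is given here; the function names `relu_max`, `relu_mul` and `relu_abs` are my own, and the exact expressions are an assumption based on the method names, not something the timings confirm.

```python
import numpy as np

def relu_max(x):
    # Element-wise maximum against the scalar 0.
    return np.maximum(x, 0)

def relu_mul(x):
    # The boolean mask (x > 0) acts as 0/1 in the product.
    return x * (x > 0)

def relu_abs(x):
    # (|x| + x) / 2 equals x for positive x and 0 otherwise.
    return (np.abs(x) + x) / 2

x = np.array([-1.5, -0.25, 0.0, 0.5, 2.0])
results = [relu_max(x), relu_mul(x), relu_abs(x)]
```

All three return a new array and leave `x` unchanged, which matters when comparing them with in-place variants.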
EDIT: As jirassimok pointed out below, my function changes the data in place, so after the first call it runs a lot faster in timeit. That is what produces the good result, so it is a kind of cheating. Sorry for the inconvenience.
I found a faster method for ReLU with numpy: you can also use numpy's fancy indexing feature.
fancy index:
20.3 ms ± 272 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
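The fancy-index code itself is not shown with the timing, so here is a rough sketch of the idea, assuming it is a boolean-mask assignment. The names `relu_fancy_inplace` and `relu_fancy_copy` are hypothetical. The in-place version overwrites its input, so every timeit repetition after the first sees an array with no negative values left, which is why the number above looks so good.

```python
import numpy as np

def relu_fancy_inplace(x):
    # Boolean-mask assignment: overwrites the negative entries of x itself.
    x[x < 0] = 0
    return x

def relu_fancy_copy(x):
    # Same idea on a copy, so the caller's array is left untouched.
    out = x.copy()
    out[out < 0] = 0
    return out

x = np.array([-1.0, 0.5, -2.0, 3.0])
y = relu_fancy_copy(x)      # x still contains its negative values here
z = relu_fancy_inplace(x)   # x is modified; z is the same array object as x
```

For a fair benchmark, the copying variant (or a fresh input array per run) is probably the one to time.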
import numpy as np

def baseline():
    # Cost of generating the data alone, without any ReLU.
    x = np.random.random((5000, 5000)) - 0.5
    return x

def relu_mul():
    x = np.random.random((5000, 5000)) - 0.5
    # Multiply by the boolean mask of positive entries.
    out = x * (x > 0)
    return out

def relu_max():
    x = np.random.random((5000, 5000)) - 0.5
    out = np.maximum(x, 0)
    return out

def relu_max_inplace():
    x = np.random.random((5000, 5000)) - 0.5
    # The third argument is the output array, so x is overwritten.
    np.maximum(x, 0, x)
    return x
baseline:
10 loops, best of 3: 425 ms per loop
multiplication method:
10 loops, best of 3: 596 ms per loop
max method:
10 loops, best of 3: 682 ms per loop
max inplace method:
10 loops, best of 3: 602 ms per loop
The in-place maximum method is only a bit faster than the plain maximum method, possibly because it does not create a separate 'out' result. It is still slower than the multiplication method.
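Since every function above also generates the random array, subtracting the baseline gives a rough estimate of the ReLU cost alone. A small sketch of that arithmetic with the reported numbers, assuming the data-generation cost is the same inside each function (the dictionary names are my own):

```python
# Timings reported above, in milliseconds per loop.
timings_ms = {
    "baseline": 425,
    "multiplication": 596,
    "max": 682,
    "max inplace": 602,
}

# Estimated cost of the ReLU step itself: total minus data generation.
relu_only_ms = {
    name: t - timings_ms["baseline"]
    for name, t in timings_ms.items()
    if name != "baseline"
}
# relu_only_ms == {"multiplication": 171, "max": 257, "max inplace": 177}
```

Seen this way, the in-place maximum and the multiplication method are within a few milliseconds of each other in this run, while the plain maximum is noticeably slower.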
Also, since you're implementing the ReLU function, you may have to save 'x' for backprop through the ReLU. E.g.: