Note that, as perimosocordiae shows, as of NumPy version 1.9, np.linalg.norm(x, axis=1) is the fastest way to compute the L2-norm.
For numpy < 1.9
If you are computing an L2-norm, you could compute it directly (using the axis=-1 argument to sum along rows):
np.sum(np.abs(x)**2,axis=-1)**(1./2)
Lp-norms can be computed similarly of course.
It is considerably faster than np.apply_along_axis, though perhaps not as convenient:
In [48]: %timeit np.apply_along_axis(np.linalg.norm, 1, x)
1000 loops, best of 3: 208 us per loop
In [49]: %timeit np.sum(np.abs(x)**2,axis=-1)**(1./2)
100000 loops, best of 3: 18.3 us per loop
Other ord forms of norm can be computed directly too (with similar speedups):
In [55]: %timeit np.apply_along_axis(lambda row:np.linalg.norm(row,ord=1), 1, x)
1000 loops, best of 3: 203 us per loop
In [54]: %timeit np.sum(abs(x), axis=-1)
100000 loops, best of 3: 10.9 us per loop
Resurrecting an old question due to a numpy update. As of the 1.9 release, numpy.linalg.norm now accepts an axis argument. [code, documentation]
This is the new fastest method in town:
In [10]: x = np.random.random((500,500))
In [11]: %timeit np.apply_along_axis(np.linalg.norm, 1, x)
10 loops, best of 3: 21 ms per loop
In [12]: %timeit np.sum(np.abs(x)**2,axis=-1)**(1./2)
100 loops, best of 3: 2.6 ms per loop
In [13]: %timeit np.linalg.norm(x, axis=1)
1000 loops, best of 3: 1.4 ms per loop
And to prove it's calculating the same thing:
In [14]: np.allclose(np.linalg.norm(x, axis=1), np.sum(np.abs(x)**2,axis=-1)**(1./2))
Out[14]: True