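How can I apply numpy.linalg.norm to each row of a matrix?

I have a 2D matrix and I want to take the norm of each row. But when I use numpy.linalg.norm(X) directly, it takes the norm of the whole matrix.

I can get the norm of each row with a for loop that takes the norm of each X[i], but that takes a long time because I have 30k rows.

Any suggestions for a quicker way? Or is it possible to apply np.linalg.norm to each row of a matrix?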

Try the following:

In [16]: numpy.apply_along_axis(numpy.linalg.norm, 1, a)
Out[16]: array([ 5.38516481,  1.41421356,  5.38516481])

where a is your 2D array.

The above computes the L2 norm. For a different norm, you could use something like:

In [22]: numpy.apply_along_axis(lambda row:numpy.linalg.norm(row,ord=1), 1, a)
Out[22]: array([9, 2, 9])
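
For anyone who wants to run this outside an interactive session, here is a minimal sketch. The array a below is a hypothetical input, chosen only because its row norms match the outputs shown above; the original array was not given. It also relies on apply_along_axis forwarding extra keyword arguments to the function, which makes the lambda optional.

```python
import numpy as np

# Hypothetical input (not from the original post), chosen so that the
# row norms match the outputs shown above.
a = np.array([[2, 3, 4],
              [1, 1, 0],
              [4, 3, 2]])

# Call np.linalg.norm once per row (axis 1 means "along each row").
l2 = np.apply_along_axis(np.linalg.norm, 1, a)

# Extra keyword arguments are passed through to the function,
# so ord=1 can be given without wrapping the call in a lambda.
l1 = np.apply_along_axis(np.linalg.norm, 1, a, ord=1)

print(l2)
print(l1)
```

Keep in mind that this still calls the function once per row in Python, so it is convenient rather than fast.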

For numpy 1.9+

Note that, as perimosocordiae shows, as of NumPy version 1.9, np.linalg.norm(x, axis=1) is the fastest way to compute the L2-norm.
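
A small sketch of what that looks like in practice, assuming a NumPy version whose norm accepts the axis argument. The array x is made up for illustration; ord selects the norm in the same way as in the one-dimensional case.

```python
import numpy as np

# Illustrative data: three row vectors.
x = np.array([[3.0, 4.0],
              [6.0, 8.0],
              [0.0, -1.0]])

row_l2 = np.linalg.norm(x, axis=1)                # Euclidean norm of each row
row_l1 = np.linalg.norm(x, ord=1, axis=1)         # sum of absolute values per row
row_max = np.linalg.norm(x, ord=np.inf, axis=1)   # largest absolute value per row

print(row_l2)   # [ 5. 10.  1.]
print(row_l1)   # [ 7. 14.  1.]
print(row_max)  # [4. 8. 1.]
```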

For numpy < 1.9

If you are computing an L2-norm, you could compute it directly (using the axis=-1 argument to sum along rows):

np.sum(np.abs(x)**2,axis=-1)**(1./2)

Lp-norms can be computed similarly of course.

It is considerably faster than np.apply_along_axis, though perhaps not as convenient:

In [48]: %timeit np.apply_along_axis(np.linalg.norm, 1, x)
1000 loops, best of 3: 208 us per loop


In [49]: %timeit np.sum(np.abs(x)**2,axis=-1)**(1./2)
100000 loops, best of 3: 18.3 us per loop

Other ord forms of norm can be computed directly too (with similar speedups):

In [55]: %timeit np.apply_along_axis(lambda row:np.linalg.norm(row,ord=1), 1, x)
1000 loops, best of 3: 203 us per loop


In [54]: %timeit np.sum(abs(x), axis=-1)
100000 loops, best of 3: 10.9 us per loop
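
The remark that Lp-norms can be computed similarly could be sketched as follows. The helper name row_lp_norm is mine, not part of NumPy, and it only covers finite p >= 1; the infinity norm is handled separately with a row-wise maximum.

```python
import numpy as np


def row_lp_norm(x, p):
    """Lp norm of each row of x for a finite p >= 1 (hypothetical helper)."""
    # Raise absolute values to the p-th power, sum along the last axis
    # (one sum per row), then take the p-th root.
    return np.sum(np.abs(x) ** p, axis=-1) ** (1.0 / p)


rng = np.random.default_rng(0)
x = rng.random((5, 3)) - 0.5   # small example with mixed signs

l1 = row_lp_norm(x, 1)
l3 = row_lp_norm(x, 3)

# The infinity norm is the largest absolute value in each row.
linf = np.max(np.abs(x), axis=-1)

# Cross-check against np.linalg.norm, where the axis argument is available.
print(np.allclose(l3, np.linalg.norm(x, ord=3, axis=1)))
```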

Resurrecting an old question due to a numpy update. As of the 1.9 release, numpy.linalg.norm now accepts an axis argument.

This is the new fastest method in town:

In [10]: x = np.random.random((500,500))


In [11]: %timeit np.apply_along_axis(np.linalg.norm, 1, x)
10 loops, best of 3: 21 ms per loop


In [12]: %timeit np.sum(np.abs(x)**2,axis=-1)**(1./2)
100 loops, best of 3: 2.6 ms per loop


In [13]: %timeit np.linalg.norm(x, axis=1)
1000 loops, best of 3: 1.4 ms per loop

And to prove it's calculating the same thing:

In [14]: np.allclose(np.linalg.norm(x, axis=1), np.sum(np.abs(x)**2,axis=-1)**(1./2))
Out[14]: True
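
The %timeit lines above only work in IPython. As a rough sketch, the same comparison can be run as a plain script with the standard timeit module. The labels and loop counts here are my own choices, and the numbers will differ by machine and NumPy version, so no timings are quoted.

```python
import timeit

import numpy as np

x = np.random.default_rng(0).random((500, 500))

# The three approaches compared above, wrapped as zero-argument callables.
candidates = {
    "apply_along_axis": lambda: np.apply_along_axis(np.linalg.norm, 1, x),
    "sum of squares": lambda: np.sum(np.abs(x) ** 2, axis=-1) ** (1.0 / 2),
    "norm(axis=1)": lambda: np.linalg.norm(x, axis=1),
}

reference = np.linalg.norm(x, axis=1)
timings = {}
for name, func in candidates.items():
    # Make sure every candidate computes the same thing before timing it.
    assert np.allclose(func(), reference)
    # Best of 3 repeats, 10 calls each, reported as seconds per call.
    timings[name] = min(timeit.repeat(func, number=10, repeat=3)) / 10
    print(f"{name:>18}: {timings[name] * 1e3:.3f} ms per call")
```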

Much faster than the accepted answer is using NumPy's einsum,

numpy.sqrt(numpy.einsum('ij,ij->i', a, a))
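
To make the subscripts concrete, here is a small sketch with made-up arrays. 'ij,ij->i' multiplies the two operands elementwise and sums over j, leaving one value per row i. One caveat that is my own addition: for complex input, a times a is not the squared magnitude, so the first operand has to be conjugated.

```python
import numpy as np

a = np.array([[3.0, 4.0],
              [1.0, 2.0]])

# Sum of a[i, j] * a[i, j] over j, i.e. the squared L2 norm of row i.
squared = np.einsum("ij,ij->i", a, a)
norms = np.sqrt(squared)
print(squared)  # [25.  5.]

# For complex data, pair each element with its conjugate so that the
# products are |c[i, j]|**2; the result has a zero imaginary part.
c = np.array([[3.0 + 4.0j, 0.0],
              [1.0j, 1.0]])
c_norms = np.sqrt(np.einsum("ij,ij->i", c.conj(), c).real)
```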

And even faster than that is arranging the data such that the norms are computed across all columns,

numpy.sqrt(numpy.einsum('ij,ij->j', aT, aT))
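
A sketch of what "arranging the data" means here, assuming aT is a contiguous copy of the transposed array. Taking a.T alone only returns a view with swapped strides, so the copy is what actually changes the memory layout. The array size is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((1000, 3))        # one vector per row

# Copy the transpose so that each coordinate is stored contiguously.
aT = np.ascontiguousarray(a.T)   # shape (3, 1000), one vector per column

norms_rows = np.sqrt(np.einsum("ij,ij->i", a, a))     # sum over columns j
norms_cols = np.sqrt(np.einsum("ij,ij->j", aT, aT))   # sum over rows i

print(np.allclose(norms_rows, norms_cols))
```

The copy itself takes time, so this layout is most likely to pay off when the data can be kept in that form from the start.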

Note the log scale:

[Figure: benchmark plot comparing the methods; image not included]


Code to reproduce the plot:

import numpy as np
import perfplot

rng = np.random.default_rng(0)


def setup(n):
    # n random 3-vectors, plus a contiguous transposed copy for the
    # column-wise variant
    x = rng.random((n, 3))
    xt = np.ascontiguousarray(x.T)
    return x, xt


def sum_sqrt(a, _):
    return np.sqrt(np.sum(np.abs(a) ** 2, axis=-1))


def apply_norm_along_axis(a, _):
    return np.apply_along_axis(np.linalg.norm, 1, a)


def norm_axis(a, _):
    return np.linalg.norm(a, axis=1)


def einsum_sqrt(a, _):
    return np.sqrt(np.einsum("ij,ij->i", a, a))


def einsum_sqrt_columns(_, aT):
    # same computation on the transposed layout: sum over rows instead
    return np.sqrt(np.einsum("ij,ij->j", aT, aT))


b = perfplot.bench(
    setup=setup,
    kernels=[
        sum_sqrt,
        apply_norm_along_axis,
        norm_axis,
        einsum_sqrt,
        einsum_sqrt_columns,
    ],
    n_range=[2**k for k in range(20)],
    xlabel="len(a)",
)
b.show()
b.save("out.png")