How do I add an extra column to a NumPy array?

Given the following 2D array:

a = np.array([
[1, 2, 3],
[2, 3, 4],
])

I want to add a column of zeros along the second axis to get:

b = np.array([
[1, 2, 3, 0],
[2, 3, 4, 0],
])

One way, using hstack, is:

b = np.hstack((a, np.zeros((a.shape[0], 1), dtype=a.dtype)))
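
As a quick check, the hstack one-liner can be run against the array from the question. This is only a minimal sketch; passing dtype=a.dtype keeps the result an integer array instead of promoting it to float.

```python
import numpy as np

a = np.array([
    [1, 2, 3],
    [2, 3, 4],
])

# Build an (n_rows, 1) column of zeros with the same dtype as a,
# then stack it to the right of a.
zeros_col = np.zeros((a.shape[0], 1), dtype=a.dtype)
b = np.hstack((a, zeros_col))

print(b.shape)             # (2, 4)
print(b.dtype == a.dtype)  # True
```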

I think a more straightforward solution, and faster to boot, is to do the following:

import numpy as np
N = 10
a = np.random.rand(N,N)
b = np.zeros((N,N+1))
b[:,:-1] = a
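
The preallocate-and-assign idea above could be wrapped in a small helper. The function name add_zero_column is hypothetical, chosen only for this sketch, and it assumes a 2D input.

```python
import numpy as np

def add_zero_column(a):
    """Return a copy of the 2D array a with one extra column of zeros on the right."""
    n_rows, n_cols = a.shape
    b = np.zeros((n_rows, n_cols + 1), dtype=a.dtype)
    b[:, :-1] = a  # copy the original data into all but the last column
    return b

a = np.array([[1, 2, 3], [2, 3, 4]])
b = add_zero_column(a)
print(b.tolist())  # [[1, 2, 3, 0], [2, 3, 4, 0]]
```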

And timings:

In [23]: N = 10


In [24]: a = np.random.rand(N,N)


In [25]: %timeit b = np.hstack((a,np.zeros((a.shape[0],1))))
10000 loops, best of 3: 19.6 us per loop


In [27]: %timeit b = np.zeros((a.shape[0],a.shape[1]+1)); b[:,:-1] = a
100000 loops, best of 3: 5.62 us per loop

np.r_[ ... ] and np.c_[ ... ] are useful alternatives to vstack and hstack,
with square brackets [] instead of round (). A couple of examples:

: import numpy as np
: N = 3
: A = np.eye(N)


: np.c_[ A, np.ones(N) ]              # add a column
array([[ 1.,  0.,  0.,  1.],
[ 0.,  1.,  0.,  1.],
[ 0.,  0.,  1.,  1.]])


: np.c_[ np.ones(N), A, np.ones(N) ]  # or two
array([[ 1.,  1.,  0.,  0.,  1.],
[ 1.,  0.,  1.,  0.,  1.],
[ 1.,  0.,  0.,  1.,  1.]])


: np.r_[ A, [A[1]] ]              # add a row
array([[ 1.,  0.,  0.],
[ 0.,  1.,  0.],
[ 0.,  0.,  1.],
[ 0.,  1.,  0.]])
: # not np.r_[ A, A[1] ]


: np.r_[ A[0], 1, 2, 3, A[1] ]    # mix vecs and scalars
array([ 1.,  0.,  0.,  1.,  2.,  3.,  0.,  1.,  0.])


: np.r_[ A[0], [1, 2, 3], A[1] ]  # lists
array([ 1.,  0.,  0.,  1.,  2.,  3.,  0.,  1.,  0.])


: np.r_[ A[0], (1, 2, 3), A[1] ]  # tuples
array([ 1.,  0.,  0.,  1.,  2.,  3.,  0.,  1.,  0.])


: np.r_[ A[0], 1:4, A[1] ]        # same, 1:4 == arange(1,4) == 1,2,3
array([ 1.,  0.,  0.,  1.,  2.,  3.,  0.,  1.,  0.])

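(The reason for square brackets [] instead of round () is that Python expands e.g. 1:4 into a slice inside square brackets, which is the wonder of overloading.)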
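
Applied to the array from the question, np.c_ might be used as in the sketch below. The dtype=a.dtype argument is an addition here to keep the result an integer array.

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 4]])

# np.c_ treats the 1-D zeros vector as a column and joins it to a.
b = np.c_[a, np.zeros(a.shape[0], dtype=a.dtype)]
print(b.tolist())  # [[1, 2, 3, 0], [2, 3, 4, 0]]
```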

I find the following most elegant:

b = np.insert(a, 3, values=0, axis=1) # Insert values before column 3

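An advantage of insert is that it also allows you to insert columns (or rows) at other places inside the array. Also, instead of inserting a single value, you can easily insert a whole vector, for instance to duplicate the last column: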

b = np.insert(a, insert_index, values=a[:,2], axis=1)

Which leads to:

array([[1, 2, 3, 3],
[2, 3, 4, 4]])
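
A self-contained version of the vector insert might look like the following. The value insert_index = 3 is an assumption made for this sketch (the snippet above leaves it undefined); it equals the number of columns, so the new column lands at the end.

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 4]])

insert_index = 3  # assumed value: a.shape[1], i.e. insert after the last column
dup_last = np.insert(a, insert_index, values=a[:, 2], axis=1)
print(dup_last.tolist())  # [[1, 2, 3, 3], [2, 3, 4, 4]]

# The same vector can be placed at the front instead.
front = np.insert(a, 0, values=a[:, 2], axis=1)
print(front.tolist())  # [[3, 1, 2, 3], [4, 2, 3, 4]]
```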

As for timing, insert may be slower than JoshAdel's solution:

In [1]: N = 10


In [2]: a = np.random.rand(N,N)


In [3]: %timeit b = np.hstack((a, np.zeros((a.shape[0], 1))))
100000 loops, best of 3: 7.5 µs per loop


In [4]: %timeit b = np.zeros((a.shape[0], a.shape[1]+1)); b[:,:-1] = a
100000 loops, best of 3: 2.17 µs per loop


In [5]: %timeit b = np.insert(a, 3, values=0, axis=1)
100000 loops, best of 3: 10.2 µs per loop

I think:

np.column_stack((a, np.zeros(a.shape[0])))

is more elegant.
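
For reference, a runnable sketch of the column_stack approach on the question's array. column_stack accepts 1-D arrays and turns them into columns, so no reshape is needed; dtype=a.dtype is added here to avoid promotion to float.

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 4]])

# The 1-D zeros vector has one entry per row of a.
b = np.column_stack((a, np.zeros(a.shape[0], dtype=a.dtype)))
print(b.tolist())  # [[1, 2, 3, 0], [2, 3, 4, 0]]
```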

Use numpy.append:

>>> a = np.array([[1,2,3],[2,3,4]])
>>> a
array([[1, 2, 3],
[2, 3, 4]])


>>> z = np.zeros((2,1), dtype=np.int64)
>>> z
array([[0],
[0]])


>>> np.append(a, z, axis=1)
array([[1, 2, 3, 0],
[2, 3, 4, 0]])
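
One detail worth illustrating: the axis argument matters. In the sketch below, omitting axis makes np.append flatten both inputs into a 1-D array, while axis=1 gives the extra column.

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 4]])
z = np.zeros((2, 1), dtype=a.dtype)

flat = np.append(a, z)               # no axis: both inputs are flattened first
with_axis = np.append(a, z, axis=1)  # axis=1: z becomes a new last column

print(flat.tolist())       # [1, 2, 3, 2, 3, 4, 0, 0]
print(with_axis.tolist())  # [[1, 2, 3, 0], [2, 3, 4, 0]]
```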

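I like JoshAdel's answer because of its focus on performance. A minor performance improvement is to avoid the overhead of initializing with zeros that are only going to be overwritten. This makes a measurable difference when N is large: empty is used instead of zeros, and the column of zeros is written as a separate step: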

In [1]: import numpy as np


In [2]: N = 10000


In [3]: a = np.ones((N,N))


In [4]: %timeit b = np.zeros((a.shape[0],a.shape[1]+1)); b[:,:-1] = a
1 loops, best of 3: 492 ms per loop


In [5]: %timeit b = np.empty((a.shape[0],a.shape[1]+1)); b[:,:-1] = a; b[:,-1] = np.zeros((a.shape[0],))
1 loops, best of 3: 407 ms per loop
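
The np.empty variant could be packaged as below. The function name add_zero_column_empty is hypothetical, and assigning the scalar 0 to the last column (instead of a freshly built zeros vector) is a small simplification made in this sketch.

```python
import numpy as np

def add_zero_column_empty(a):
    """Like the zeros-based version, but skip zero-filling the part that gets overwritten."""
    b = np.empty((a.shape[0], a.shape[1] + 1), dtype=a.dtype)
    b[:, :-1] = a  # fill everything except the last column from a
    b[:, -1] = 0   # only the new column needs explicit zeros
    return b

a = np.array([[1, 2, 3], [2, 3, 4]])
b = add_zero_column_empty(a)
print(b.tolist())  # [[1, 2, 3, 0], [2, 3, 4, 0]]
```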

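A bit late to the party, but nobody has posted this answer yet, so for the sake of completeness: you can do this with a list comprehension on a plain Python list: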

source = a.tolist()
result = [row + [0] for row in source]
b = np.array(result)

np.concatenate also works

>>> a = np.array([[1,2,3],[2,3,4]])
>>> a
array([[1, 2, 3],
[2, 3, 4]])
>>> z = np.zeros((2,1))
>>> z
array([[ 0.],
[ 0.]])
>>> np.concatenate((a, z), axis=1)
array([[ 1.,  2.,  3.,  0.],
[ 2.,  3.,  4.,  0.]])
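
Note that the result above is a float array, because z was created with the default float dtype. A small sketch of how the dtype of z affects the outcome:

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 4]])

z_float = np.zeros((2, 1))                 # default dtype is float64
z_same = np.zeros((2, 1), dtype=a.dtype)   # matches the integer dtype of a

promoted = np.concatenate((a, z_float), axis=1)
preserved = np.concatenate((a, z_same), axis=1)

print(promoted.dtype)             # float64
print(preserved.dtype == a.dtype) # True
```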

I was also interested in this question and compared the speed of

numpy.c_[a, a]
numpy.stack([a, a]).T
numpy.vstack([a, a]).T
numpy.ascontiguousarray(numpy.stack([a, a]).T)
numpy.ascontiguousarray(numpy.vstack([a, a]).T)
numpy.column_stack([a, a])
numpy.concatenate([a[:,None], a[:,None]], axis=1)
numpy.concatenate([a[None], a[None]], axis=0).T

which all do the same thing for any input vector a. Timings for growing a:

[Plot: runtime of each variant against len(a)]

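Note that all non-contiguous variants (in particular stack/vstack) are eventually faster than all contiguous variants. column_stack appears to be a good option (for its clarity and speed) if you require contiguity.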
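
To see what "contiguous" means here, the memory layout flags can be inspected directly. A minimal sketch comparing the transposed stack result with column_stack:

```python
import numpy as np

a = np.random.rand(5)

s = np.stack([a, a]).T        # transpose of a (2, 5) array: a view, not C-contiguous
c = np.column_stack([a, a])   # freshly built (5, 2) array in C order

print(s.flags["C_CONTIGUOUS"])  # False
print(c.flags["C_CONTIGUOUS"])  # True
print(np.array_equal(s, c))     # True: same values, different memory layout
```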


Code to reproduce the plot:

import numpy as np
import perfplot


b = perfplot.bench(
setup=np.random.rand,
kernels=[
lambda a: np.c_[a, a],
lambda a: np.ascontiguousarray(np.stack([a, a]).T),
lambda a: np.ascontiguousarray(np.vstack([a, a]).T),
lambda a: np.column_stack([a, a]),
lambda a: np.concatenate([a[:, None], a[:, None]], axis=1),
lambda a: np.ascontiguousarray(np.concatenate([a[None], a[None]], axis=0).T),
lambda a: np.stack([a, a]).T,
lambda a: np.vstack([a, a]).T,
lambda a: np.concatenate([a[None], a[None]], axis=0).T,
],
labels=[
"c_",
"ascont(stack)",
"ascont(vstack)",
"column_stack",
"concat",
"ascont(concat)",
"stack (non-cont)",
"vstack (non-cont)",
"concat (non-cont)",
],
n_range=[2 ** k for k in range(23)],
xlabel="len(a)",
)
b.save("out.png")

Assuming M is a (100,3) ndarray and y is a (100,) ndarray, append can be used as follows:

M=numpy.append(M,y[:,None],1)

The trick is to use

y[:, None]

This converts y to a (100, 1) 2D array.

M.shape

now gives

(100, 4)
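
The shapes involved can be checked with placeholder data. The zeros and ones used below are stand-ins chosen for this sketch, not the original M and y.

```python
import numpy as np

M = np.zeros((100, 3))  # placeholder data with the shape described above
y = np.ones(100)        # placeholder 1-D vector

print(y.shape)           # (100,)
print(y[:, None].shape)  # (100, 1)

M = np.append(M, y[:, None], 1)
print(M.shape)           # (100, 4)
```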

In my case, I had to add a column of ones to a NumPy array

X = array([ 6.1101, 5.5277, ... ])
X.shape => (97,)
m = X.shape[0]  # number of samples, 97 here
X = np.concatenate((np.ones((m,1), dtype=int), X.reshape(m,1)), axis=1)  # np.int was removed in NumPy 1.24; use int

After which X.shape => (97,2)

array([[ 1. , 6.1101],
[ 1. , 5.5277],
...

np.insert also serves the purpose.

matA = np.array([[1,2,3],
[2,3,4]])
idx = 3
new_col = np.array([0, 0])
np.insert(matA, idx, new_col, axis=1)


array([[1, 2, 3, 0],
[2, 3, 4, 0]])

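It inserts values, here new_col, before a given index, here idx, along one axis. In other words, the newly inserted values will occupy the idx column and shift whatever was originally at idx and after it backward.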
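
The shifting behaviour is easiest to see by trying every valid index. A small sketch that reuses matA and new_col from above; the results dictionary is just a convenience for this example.

```python
import numpy as np

matA = np.array([[1, 2, 3],
                 [2, 3, 4]])
new_col = np.array([0, 0])

# Insert the same column before each possible index and show the result.
results = {}
for idx in range(matA.shape[1] + 1):
    results[idx] = np.insert(matA, idx, new_col, axis=1)
    print(idx, results[idx].tolist())
# 0 [[0, 1, 2, 3], [0, 2, 3, 4]]
# 1 [[1, 0, 2, 3], [2, 0, 3, 4]]
# 2 [[1, 2, 0, 3], [2, 3, 0, 4]]
# 3 [[1, 2, 3, 0], [2, 3, 4, 0]]
```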

There is a function specifically for this. It is called numpy.pad

a = np.array([[1,2,3], [2,3,4]])
b = np.pad(a, ((0, 0), (0, 1)), mode='constant', constant_values=0)
print(repr(b))
# array([[1, 2, 3, 0],
#        [2, 3, 4, 0]])
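
The pad_width argument takes one (before, after) pair per axis, so the same call can also pad on the left or add a row. A brief sketch; the variable names are chosen only for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 4]])

# pad_width is ((rows_before, rows_after), (cols_before, cols_after)).
right = np.pad(a, ((0, 0), (0, 1)), mode="constant", constant_values=0)
left = np.pad(a, ((0, 0), (1, 0)), mode="constant", constant_values=0)
bottom = np.pad(a, ((0, 1), (0, 0)), mode="constant", constant_values=0)

print(right.tolist())   # [[1, 2, 3, 0], [2, 3, 4, 0]]
print(left.tolist())    # [[0, 1, 2, 3], [0, 2, 3, 4]]
print(bottom.tolist())  # [[1, 2, 3], [2, 3, 4], [0, 0, 0]]
```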

Here is what it says in the docstring:

Pads an array.


Parameters
----------
array : array_like of rank N
Input array
pad_width : {sequence, array_like, int}
Number of values padded to the edges of each axis.
((before_1, after_1), ... (before_N, after_N)) unique pad widths
for each axis.
((before, after),) yields same before and after pad for each axis.
(pad,) or int is a shortcut for before = after = pad width for all
axes.
mode : str or function
One of the following string values or a user supplied function.


'constant'
Pads with a constant value.
'edge'
Pads with the edge values of array.
'linear_ramp'
Pads with the linear ramp between end_value and the
array edge value.
'maximum'
Pads with the maximum value of all or part of the
vector along each axis.
'mean'
Pads with the mean value of all or part of the
vector along each axis.
'median'
Pads with the median value of all or part of the
vector along each axis.
'minimum'
Pads with the minimum value of all or part of the
vector along each axis.
'reflect'
Pads with the reflection of the vector mirrored on
the first and last values of the vector along each
axis.
'symmetric'
Pads with the reflection of the vector mirrored
along the edge of the array.
'wrap'
Pads with the wrap of the vector along the axis.
The first values are used to pad the end and the
end values are used to pad the beginning.
<function>
Padding function, see Notes.
stat_length : sequence or int, optional
Used in 'maximum', 'mean', 'median', and 'minimum'.  Number of
values at edge of each axis used to calculate the statistic value.


((before_1, after_1), ... (before_N, after_N)) unique statistic
lengths for each axis.


((before, after),) yields same before and after statistic lengths
for each axis.


(stat_length,) or int is a shortcut for before = after = statistic
length for all axes.


Default is ``None``, to use the entire axis.
constant_values : sequence or int, optional
Used in 'constant'.  The values to set the padded values for each
axis.


((before_1, after_1), ... (before_N, after_N)) unique pad constants
for each axis.


((before, after),) yields same before and after constants for each
axis.


(constant,) or int is a shortcut for before = after = constant for
all axes.


Default is 0.
end_values : sequence or int, optional
Used in 'linear_ramp'.  The values used for the ending value of the
linear_ramp and that will form the edge of the padded array.


((before_1, after_1), ... (before_N, after_N)) unique end values
for each axis.


((before, after),) yields same before and after end values for each
axis.


(constant,) or int is a shortcut for before = after = end value for
all axes.


Default is 0.
reflect_type : {'even', 'odd'}, optional
Used in 'reflect', and 'symmetric'.  The 'even' style is the
default with an unaltered reflection around the edge value.  For
the 'odd' style, the extended part of the array is created by
subtracting the reflected values from two times the edge value.


Returns
-------
pad : ndarray
Padded array of rank equal to `array` with shape increased
according to `pad_width`.


Notes
-----
.. versionadded:: 1.7.0


For an array with rank greater than 1, some of the padding of later
axes is calculated from padding of previous axes.  This is easiest to
think about with a rank 2 array where the corners of the padded array
are calculated by using padded values from the first axis.


The padding function, if used, should return a rank 1 array equal in
length to the vector argument with padded values replaced. It has the
following signature::


padding_func(vector, iaxis_pad_width, iaxis, kwargs)


where


vector : ndarray
A rank 1 array already padded with zeros.  Padded values are
vector[:pad_tuple[0]] and vector[-pad_tuple[1]:].
iaxis_pad_width : tuple
A 2-tuple of ints, iaxis_pad_width[0] represents the number of
values padded at the beginning of vector where
iaxis_pad_width[1] represents the number of values padded at
the end of vector.
iaxis : int
The axis currently being calculated.
kwargs : dict
Any keyword arguments the function requires.


Examples
--------
>>> a = [1, 2, 3, 4, 5]
>>> np.pad(a, (2,3), 'constant', constant_values=(4, 6))
array([4, 4, 1, 2, 3, 4, 5, 6, 6, 6])


>>> np.pad(a, (2, 3), 'edge')
array([1, 1, 1, 2, 3, 4, 5, 5, 5, 5])


>>> np.pad(a, (2, 3), 'linear_ramp', end_values=(5, -4))
array([ 5,  3,  1,  2,  3,  4,  5,  2, -1, -4])


>>> np.pad(a, (2,), 'maximum')
array([5, 5, 1, 2, 3, 4, 5, 5, 5])


>>> np.pad(a, (2,), 'mean')
array([3, 3, 1, 2, 3, 4, 5, 3, 3])


>>> np.pad(a, (2,), 'median')
array([3, 3, 1, 2, 3, 4, 5, 3, 3])


>>> a = [[1, 2], [3, 4]]
>>> np.pad(a, ((3, 2), (2, 3)), 'minimum')
array([[1, 1, 1, 2, 1, 1, 1],
[1, 1, 1, 2, 1, 1, 1],
[1, 1, 1, 2, 1, 1, 1],
[1, 1, 1, 2, 1, 1, 1],
[3, 3, 3, 4, 3, 3, 3],
[1, 1, 1, 2, 1, 1, 1],
[1, 1, 1, 2, 1, 1, 1]])


>>> a = [1, 2, 3, 4, 5]
>>> np.pad(a, (2, 3), 'reflect')
array([3, 2, 1, 2, 3, 4, 5, 4, 3, 2])


>>> np.pad(a, (2, 3), 'reflect', reflect_type='odd')
array([-1,  0,  1,  2,  3,  4,  5,  6,  7,  8])


>>> np.pad(a, (2, 3), 'symmetric')
array([2, 1, 1, 2, 3, 4, 5, 5, 4, 3])


>>> np.pad(a, (2, 3), 'symmetric', reflect_type='odd')
array([0, 1, 1, 2, 3, 4, 5, 5, 6, 7])


>>> np.pad(a, (2, 3), 'wrap')
array([4, 5, 1, 2, 3, 4, 5, 1, 2, 3])


>>> def pad_with(vector, pad_width, iaxis, kwargs):
...     pad_value = kwargs.get('padder', 10)
...     vector[:pad_width[0]] = pad_value
...     vector[-pad_width[1]:] = pad_value
...     return vector
>>> a = np.arange(6)
>>> a = a.reshape((2, 3))
>>> np.pad(a, 2, pad_with)
array([[10, 10, 10, 10, 10, 10, 10],
[10, 10, 10, 10, 10, 10, 10],
[10, 10,  0,  1,  2, 10, 10],
[10, 10,  3,  4,  5, 10, 10],
[10, 10, 10, 10, 10, 10, 10],
[10, 10, 10, 10, 10, 10, 10]])
>>> np.pad(a, 2, pad_with, padder=100)
array([[100, 100, 100, 100, 100, 100, 100],
[100, 100, 100, 100, 100, 100, 100],
[100, 100,   0,   1,   2, 100, 100],
[100, 100,   3,   4,   5, 100, 100],
[100, 100, 100, 100, 100, 100, 100],
[100, 100, 100, 100, 100, 100, 100]])

To me, the following approach looks quite intuitive and simple.

zeros = np.zeros((2,1)) #2 is a number of rows in your array.
b = np.hstack((a, zeros))

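Add an extra column to a numpy array:

NumPy's np.append method takes three parameters: the first two are 2D NumPy arrays, and the third is an axis parameter specifying which axis to append along: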

import numpy as np
x = np.array([[1,2,3], [4,5,6]])
print("Original x:")
print(x)


y = np.array([[1], [1]])
print("Original y:")
print(y)


print("y appended to x on axis of 1:")
print(np.append(x, y, axis=1))

Prints:

Original x:
[[1 2 3]
[4 5 6]]
Original y:
[[1]
[1]]
y appended to x on axis of 1:
[[1 2 3 1]
[4 5 6 1]]
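
Both arrays need the same number of dimensions when axis is given. The sketch below shows what may happen with a 1-D y and one way to fix it with reshape; the names y_flat and error_message are illustrative.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])
y_flat = np.array([1, 1])  # 1-D, shape (2,)

# Appending a 1-D array to a 2-D array along axis=1 raises ValueError.
error_message = None
try:
    np.append(x, y_flat, axis=1)
except ValueError as exc:
    error_message = str(exc)
print(error_message is not None)  # True

# Reshaping y to a (2, 1) column makes the shapes compatible.
result = np.append(x, y_flat.reshape(-1, 1), axis=1)
print(result.tolist())  # [[1, 2, 3, 1], [4, 5, 6, 1]]
```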

I like this one:

new_column = np.zeros((len(a), 1))
b = np.block([a, new_column])
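
A runnable sketch of the np.block approach on the question's array. The nested-list form for adding a row is included for comparison; dtype=a.dtype and the name new_row are additions made here.

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 4]])

# A flat list of blocks is joined along the last axis (columns for 2D input).
new_column = np.zeros((len(a), 1), dtype=a.dtype)
b = np.block([a, new_column])
print(b.tolist())  # [[1, 2, 3, 0], [2, 3, 4, 0]]

# A nested list stacks the inner lists on top of each other (rows).
new_row = np.zeros((1, a.shape[1]), dtype=a.dtype)
c = np.block([[a], [new_row]])
print(c.tolist())  # [[1, 2, 3], [2, 3, 4], [0, 0, 0]]
```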