Pandas: create a new column in df with random integers from a range

I have a pandas DataFrame with 50k rows. I'm trying to add a new column that holds a randomly generated integer from 1 to 5.

If I wanted 50k random numbers, I would use:

df1['randNumCol'] = random.sample(xrange(50000), len(df1))

but I'm not sure how to do it for this case.

As a side note, in R I would do:

sample(1:5, 50000, replace = TRUE)

Any suggestions?


One solution is to use numpy.random.randint:

import numpy as np
df1['randNumCol'] = np.random.randint(1, 6, df1.shape[0])

Or, if the values are not consecutive, you can use numpy.random.choice (albeit slower):

df1['randNumCol'] = np.random.choice([1, 9, 20], df1.shape[0])

To make the results reproducible, you can set the seed with numpy.random.seed (e.g. np.random.seed(42)).
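
A minimal sketch that puts the pieces of this answer together. The DataFrame built here is a made-up stand-in for the 50k-row frame from the question, and the column name randChoiceCol is only illustrative:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 50k-row DataFrame from the question.
df1 = pd.DataFrame({"x": range(50_000)})

# Fix the global seed so that rerunning the script gives the same columns.
np.random.seed(42)

# randint's upper bound is exclusive, so (1, 6) draws from 1, 2, 3, 4, 5.
df1["randNumCol"] = np.random.randint(1, 6, df1.shape[0])

# Non-consecutive values are drawn from an explicit list with choice.
df1["randChoiceCol"] = np.random.choice([1, 9, 20], df1.shape[0])

# Quick sanity check of the range that was actually produced.
print(df1["randNumCol"].min(), df1["randNumCol"].max())
```

Checking the minimum and maximum after the assignment is a cheap way to catch an off-by-one in the bounds.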

To add a column of random integers, use randint(low, high, size). There's no need to waste memory allocating range(low, high); that could be a lot of memory if high is large.

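df1['randNumCol'] = np.random.randint(1, 6, size=len(df1))

Note: the upper bound of randint is exclusive, so randint(1, 6) returns integers from 1 to 5, whereas randint(0, 5) would return 0 to 4.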
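
As an aside, newer NumPy versions also offer a Generator object via numpy.random.default_rng, which is not what this answer uses but covers the same need. The sketch below assumes a NumPy version that ships default_rng, and the small DataFrame is a hypothetical placeholder:

```python
import numpy as np
import pandas as pd

# Hypothetical small frame standing in for the real data.
df1 = pd.DataFrame({"x": range(1000)})

# A local, seeded generator; it does not touch NumPy's global random state.
rng = np.random.default_rng(42)

# endpoint=True makes the upper bound inclusive, so this draws from 1..5.
df1["randNumCol"] = rng.integers(1, 5, size=len(df1), endpoint=True)
```

With endpoint=True the bounds read the same way as "from 1 to 5" in the question, which may be easier to get right than an exclusive upper bound.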

An option that doesn't require an additional import for numpy:

df1['randNumCol'] = pd.Series(range(1,6)).sample(len(df1), replace=True).array
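
A slightly expanded sketch of the same pandas-only idea, with the steps split out. The DataFrame is a hypothetical placeholder, and random_state is added here only to make the draw repeatable:

```python
import pandas as pd

# Hypothetical frame standing in for the real data.
df1 = pd.DataFrame({"x": range(1000)})

# The candidate values 1..5 as a Series.
values = pd.Series(range(1, 6))

# One draw per row, with replacement; random_state makes the result repeatable.
drawn = values.sample(len(df1), replace=True, random_state=0)

# Convert to a plain array so the assignment is positional; the sampled
# Series carries a repeated 0..4 index that would not line up with df1's index.
df1["randNumCol"] = drawn.to_numpy()
```

Dropping the index before assigning is the important step here, since pandas otherwise tries to align on index labels.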