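How do I create a list of random numbers without duplicates?

I tried using random.randint(0, 100), but some of the numbers were the same. Is there a method or module for creating a list of unique random numbers?

Note: The code below is based on an answer and was added after that answer was posted. It is not part of the question; it is the solution.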

def getScores():
    # open files to read and write
    f1 = open("page.txt", "r")
    p1 = open("pgRes.txt", "a")

    gScores = []
    bScores = []
    yScores = []

    # run 50 tests of 40 random queries to implement "bootstrapping" method
    for i in range(50):
        # get 40 random queries from the 50
        lines = random.sample(f1.readlines(), 40)

This returns a list of 10 numbers selected from the range 0 to 99, without duplicates.

import random
random.sample(range(100), 10)

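For your specific code example, you probably want to read all the lines from the file once and then select random lines from the list held in memory. For example:

all_lines = f1.readlines()
for i in range(50):
    lines = random.sample(all_lines, 40)

This way, you only need to read from the file once, before the loop. That is much more efficient than seeking back to the start of the file and calling f1.readlines() again on every loop iteration.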
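To see why the lines should be read only once, here is a minimal sketch that uses io.StringIO as a stand-in for the opened file (the sample data and variable names are made up for illustration). A second call to readlines() on the same handle returns an empty list, so sampling from it inside the loop would fail after the first iteration.

```python
import io
import random

# Stand-in for open("page.txt"): 50 hypothetical query lines.
f1 = io.StringIO("".join("query %d\n" % i for i in range(50)))

all_lines = f1.readlines()   # reads every line and moves to end of file
leftover = f1.readlines()    # nothing left to read, so this is empty

# Draw 50 bootstrap samples of 40 distinct lines from the in-memory list.
samples = [random.sample(all_lines, 40) for _ in range(50)]
```

With a real file, a `with open(...) as f1:` block around the readlines() call would also make sure the handle is closed.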

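If the list of N numbers from 1 to N is generated randomly, then yes, some numbers may be repeated.

If you want a list of the numbers from 1 to N in random order, fill an array with the integers from 1 to N and then use a Fisher-Yates shuffle or Python's random.shuffle().

You can first create a list of the numbers from a to b, where a and b are respectively the smallest and largest numbers in your list, and then shuffle it with the Fisher-Yates algorithm or with Python's random.shuffle method.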
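The shuffle approach described above can be sketched as follows. This is a minimal hand-written version of the modern Fisher-Yates shuffle for illustration only; the helper name and the bounds a and b are assumptions, and in practice random.shuffle does the same job.

```python
import random

def fisher_yates_shuffle(items):
    """Shuffle a list in place (hypothetical helper, modern Fisher-Yates)."""
    for i in range(len(items) - 1, 0, -1):
        j = random.randint(0, i)          # pick a position in items[0..i]
        items[i], items[j] = items[j], items[i]
    return items

a, b = 1, 10                              # smallest and largest number
numbers = fisher_yates_shuffle(list(range(a, b + 1)))
print(numbers)
```

Every value from a to b appears exactly once, so taking the first k entries of the shuffled list gives k unique random numbers.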

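If you want to ensure that the numbers being added are unique, you could use a set object

if using 2.7 or greater, or import the sets module if not.

As others have mentioned, this means the numbers are not truly random.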

The solution presented in this answer works, but it can become a memory problem if the sample size is small and the population is huge (e.g. random.sample(insanelyLargeNumber, 10)).

To fix that, I would go with this:

import random

answer = set()
sampleSize = 10
answerSize = 0

while answerSize < sampleSize:
    r = random.randint(0, 100)
    if r not in answer:
        answerSize += 1
        answer.add(r)

# answer now contains 10 unique, random integers from 0..100
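On Python 3 the memory concern is smaller than it may look, because range objects are lazy sequences and, in CPython, random.sample can draw a small sample from one without building the full list. A small sketch, assuming Python 3 and a population size that still fits in a machine-sized integer:

```python
import random

population = range(10**12)             # lazy: no list of 10**12 items is built
picks = random.sample(population, 10)  # 10 distinct integers from the range
print(picks)
```

The retry loop above remains useful when the population cannot be expressed as a sequence at all.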

You can use the shuffle function from the random module like this:

import random

my_list = list(range(1, 100))  # list of integers from 1 to 99
                               # adjust these boundaries to fit your needs
random.shuffle(my_list)
print(my_list)  # <- List of unique random numbers

Note that the shuffle method does not return a list, as one might expect; it only shuffles the list passed to it, in place.
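A short sketch of that pitfall: assigning the result of random.shuffle gives None, while random.sample with the full length returns a new shuffled list and leaves the original untouched. The variable names are only for illustration.

```python
import random

my_list = list(range(1, 100))       # integers 1..99

result = random.shuffle(my_list)    # shuffles my_list in place
# result is None here; my_list itself is now in random order

# To get a new shuffled list and leave the original alone:
original = list(range(1, 100))
shuffled_copy = random.sample(original, len(original))
```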

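From the CLI in Windows XP:

python -c "import random; print(sorted(set([random.randint(6,49) for i in range(7)]))[:6])"

In Canada we have the 6/49 Lotto. I just wrap the above code in lotto.bat and run C:\home\lotto.bat or just C:\home\lotto.

Because random.randint often repeats a number, I use set with range(7) and then shorten the result to a length of 6.

Occasionally, if a number repeats more than twice, the resulting list will have fewer than 6 entries.

EDIT: However, random.sample(range(6,49),6) is the correct way to go.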
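A sketch of the same idea built on random.sample, assuming the lottery draws six distinct numbers from 1 to 49 (hence range(1, 50); adjust the bounds if your game differs). Unlike the set-based one-liner, the result always has exactly six entries.

```python
import random

# Six distinct numbers from 1..49, sorted for display.
ticket = sorted(random.sample(range(1, 50), 6))
print(ticket)
```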

If you need to sample extremely large numbers, you cannot use range

random.sample(range(10000000000000000000000000000000), 10)

because it throws:

OverflowError: Python int too large to convert to C ssize_t

Also, if random.sample cannot produce the number of items you want because the range is too small

random.sample(range(2), 1000)

it throws:

ValueError: Sample larger than population

This function solves both problems:

import random


def random_sample(count, start, stop, step=1):
    def gen_random():
        while True:
            yield random.randrange(start, stop, step)

    def gen_n_unique(source, n):
        seen = set()
        seenadd = seen.add
        for i in (i for i in source() if i not in seen and not seenadd(i)):
            yield i
            if len(seen) == n:
                break

    return [i for i in gen_n_unique(gen_random,
                                    min(count, int(abs(stop - start) / abs(step))))]

Usage with extremely large numbers:

print('\n'.join(map(str, random_sample(10, 2, 10000000000000000000000000000000))))

Sample result:

7822019936001013053229712669368
6289033704329783896566642145909
2473484300603494430244265004275
5842266362922067540967510912174
6775107889200427514968714189847
9674137095837778645652621150351
9969632214348349234653730196586
1397846105816635294077965449171
3911263633583030536971422042360
9864578596169364050929858013943

Usage where the range is smaller than the number of requested items:

print(', '.join(map(str, random_sample(100000, 0, 3))))

Sample result:

2, 0, 1

It also works with negative ranges and steps:

print(', '.join(map(str, random_sample(10, 10, -10, -2))))
print(', '.join(map(str, random_sample(10, 5, -5, -2))))

Sample results:

2, -8, 6, -2, -4, 0, 4, 10, -6, 8
-3, 1, 5, -1, 3

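You can use the NumPy library for a quick answer, as shown below.

The given code snippet lists 6 unique numbers between 0 and 5.

import numpy as np
import random
a = np.linspace( 0, 5, 6 )
random.shuffle(a)
print(a)

Output

[ 2.  1.  5.  3.  4.  0.]

It does not impose any constraints like those we see with random.sample.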

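Linear Congruential Pseudo-random Number Generator

O(1) Memory

O(k) Operations

This problem can be solved with a simple Linear Congruential Generator. It requires constant memory overhead (8 integers) and at most 2 * (sequence length) computations.

All other solutions use more memory and more compute! If you only need a few random sequences, this method will be significantly cheaper. For ranges of size N, if you want to generate on the order of N unique k-sequences or more, I recommend the accepted solution using the builtin method random.sample(range(N),k), as this has been optimized in Python for speed.

Code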

# Return a randomized "range" using a Linear Congruential Generator
# to produce the number sequence. Parameters are the same as for
# python builtin "range".
#   Memory  -- storage for 8 integers, regardless of parameters.
#   Compute -- at most 2*"maximum" steps required to generate sequence.
#
def random_range(start, stop=None, step=None):
    import random, math
    # Set default values the same way "range" does.
    if stop is None: start, stop = 0, start
    if step is None: step = 1
    # Use a mapping to convert a standard range into the desired range.
    mapping = lambda i: (i*step) + start
    # Compute the number of numbers in this range.
    maximum = (stop - start) // step
    # An empty range yields nothing (math.log2 below needs maximum > 0).
    if maximum <= 0: return
    # Seed range with a random integer.
    value = random.randint(0,maximum)
    #
    # Construct an offset, multiplier, and modulus for a linear
    # congruential generator. These generators are cyclic and
    # non-repeating when they maintain the properties:
    #
    #   1) "modulus" and "offset" are relatively prime.
    #   2) ["multiplier" - 1] is divisible by all prime factors of "modulus".
    #   3) ["multiplier" - 1] is divisible by 4 if "modulus" is divisible by 4.
    #
    offset = random.randint(0,maximum) * 2 + 1      # Pick a random odd-valued offset.
    multiplier = 4*(maximum//4) + 1                 # Pick a multiplier 1 greater than a multiple of 4.
    modulus = int(2**math.ceil(math.log2(maximum))) # Pick a modulus just big enough to generate all numbers (power of 2).
    # Track how many random numbers have been returned.
    found = 0
    while found < maximum:
        # If this is a valid value, yield it in generator fashion.
        if value < maximum:
            found += 1
            yield mapping(value)
        # Calculate the next value in the sequence.
        value = (value*multiplier + offset) % modulus
Usage

Usage of this function "random_range" is the same as for any generator (like "range"). An example:

# Show off random range.
print()
for v in range(3,6):
    v = 2**v
    l = list(random_range(v))
    print("Need",v,"found",len(set(l)),"(min,max)",(min(l),max(l)))
    print("",l)
    print()

Sample Results

Required 8 cycles to generate a sequence of 8 values.
Need 8 found 8 (min,max) (0, 7)
[1, 0, 7, 6, 5, 4, 3, 2]


Required 16 cycles to generate a sequence of 9 values.
Need 9 found 9 (min,max) (0, 8)
[3, 5, 8, 7, 2, 6, 0, 1, 4]


Required 16 cycles to generate a sequence of 16 values.
Need 16 found 16 (min,max) (0, 15)
[5, 14, 11, 8, 3, 2, 13, 1, 0, 6, 9, 4, 7, 12, 10, 15]


Required 32 cycles to generate a sequence of 17 values.
Need 17 found 17 (min,max) (0, 16)
[12, 6, 16, 15, 10, 3, 14, 5, 11, 13, 0, 1, 4, 8, 7, 2, ...]


Required 32 cycles to generate a sequence of 32 values.
Need 32 found 32 (min,max) (0, 31)
[19, 15, 1, 6, 10, 7, 0, 28, 23, 24, 31, 17, 22, 20, 9, ...]


Required 64 cycles to generate a sequence of 33 values.
Need 33 found 33 (min,max) (0, 32)
[11, 13, 0, 8, 2, 9, 27, 6, 29, 16, 15, 10, 3, 14, 5, 24, ...]

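An earlier answer works very well with respect to both time and memory, but it is somewhat complicated because it uses advanced Python constructs. The simpler answer works well in practice, but its problem is that it may generate many spurious integers before it actually builds the required set. Try it with populationSize = 1000, sampleSize = 999. In theory, there is a chance that it never terminates.

The answer below addresses both issues, as it is deterministic and reasonably efficient, though currently not as efficient as the other two.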

def randomSample(populationSize, sampleSize):
    populationStr = str(populationSize)
    dTree, samples = {}, []
    for i in range(sampleSize):
        val, dTree = getElem(populationStr, dTree, '')
        samples.append(int(val))
    return samples, dTree

where the functions getElem and percolateUp are defined as follows:

import random


def getElem(populationStr, dTree, key):
    msd = int(populationStr[0])
    if key not in dTree:
        # list(...) so that pop/remove also work on Python 3
        dTree[key] = list(range(msd + 1))
    idx = random.randint(0, len(dTree[key]) - 1)
    key = key + str(dTree[key][idx])
    if len(populationStr) == 1:
        dTree[key[:-1]].pop(idx)
        return key, (percolateUp(dTree, key[:-1]))
    newPopulation = populationStr[1:]
    if int(key[-1]) != msd:
        newPopulation = str(10**(len(newPopulation)) - 1)
    return getElem(newPopulation, dTree, key)


def percolateUp(dTree, key):
    while (dTree[key] == []):
        dTree[key[:-1]].remove(int(key[-1]))
        key = key[:-1]
    return dTree

Finally, the timing was about 15 ms on average for a large value of n, as shown below:

In [3]: n = 10000000000000000000000000000000


In [4]: %time l,t = randomSample(n, 5)
Wall time: 15 ms


In [5]: l
Out[5]:
[10000000000000000000000000000000L,
5731058186417515132221063394952L,
85813091721736310254927217189L,
6349042316505875821781301073204L,
2356846126709988590164624736328L]

A very simple function that also solves your problem:

from random import randint


def unique_rand(inicial, limit, total):
    data = []
    i = 0
    while i < total:
        number = randint(inicial, limit)
        # keep the number only if it has not been drawn before
        if number not in data:
            data.append(number)
            i += 1
    return data


data = unique_rand(1, 60, 6)
print(data)

"""
prints something like

[34, 45, 2, 36, 25, 32]
"""

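The problem with the set-based approaches ("if the random value is already among the returned values, try again") is that their runtime is not deterministic because of collisions (each of which requires another "try again" iteration), especially when a large number of random values is drawn from the range.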
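To put a rough number on this, the expected count of draws for a retry loop follows from the coupon-collector argument: once i distinct values have been found in a range of n equally likely values, a fresh draw is new with probability (n - i)/n, so it takes n/(n - i) draws on average. A small sketch, assuming Python 3 division (the function name is made up for illustration):

```python
def expected_draws(n, k):
    """Average number of draws a retry loop needs to collect
    k distinct values from a range of n equally likely values."""
    return sum(n / (n - i) for i in range(k))

print(expected_draws(1000, 10))    # barely more than 10
print(expected_draws(1000, 999))   # several thousand
```

So the retry approach is cheap when the sample is small relative to the range and becomes expensive, with a long tail, as the sample size approaches the size of the range.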

An alternative approach that is not prone to this non-deterministic runtime is the following:

import bisect
import random


def fast_sample(low, high, num):
    """ Samples :param num: integer numbers in range of
        [:param low:, :param high:) without replacement
        by maintaining a list of ranges of values that
        are permitted.

        This list of ranges is used to map a random number
        of a contiguous range (`r_n`) to a permissible
        number `r` (from `ranges`).
    """
    ranges = [high]
    high_ = high - 1
    while len(ranges) - 1 < num:
        # generate a random number from an ever decreasing
        # contiguous range (which we'll map to the true
        # random number).
        # consider an example with low=0, high=10,
        # part way through this loop with:
        #
        # ranges = [0, 2, 3, 7, 9, 10]
        #
        # r_n :-> r
        #   0 :-> 1
        #   1 :-> 4
        #   2 :-> 5
        #   3 :-> 6
        #   4 :-> 8
        r_n = random.randint(low, high_)
        range_index = bisect.bisect_left(ranges, r_n)
        r = r_n + range_index
        for i in range(range_index, len(ranges)):
            if ranges[i] <= r:
                # as many "gaps" we iterate over, as much
                # is the true random value (`r`) shifted.
                r = r_n + i + 1
            elif ranges[i] > r_n:
                break
        # mark `r` as another "gap" of the original
        # [low, high) range.
        ranges.insert(i, r)
        # Fewer values possible.
        high_ -= 1
    # `ranges` happens to contain the result.
    return ranges[:-1]

Edit: ignore my answer. Use Python's random.shuffle or random.sample, as mentioned in other answers.

To sample integers without replacement between minval and maxval:

import numpy as np


minval, maxval, n_samples = -50, 50, 10
generator = np.random.default_rng(seed=0)
samples = generator.permutation(np.arange(minval, maxval))[:n_samples]


# or, if minval is 0,
samples = generator.permutation(maxval)[:n_samples]

With JAX:

import jax

minval, maxval, n_samples = -50, 50, 10
key = jax.random.PRNGKey(seed=0)
# jax.random.permutation is used here in place of the deprecated jax.random.shuffle
samples = jax.random.permutation(key, jax.numpy.arange(minval, maxval))[:n_samples]

import random

sourcelist = []
resultlist = []

for x in range(100):
    sourcelist.append(x)

# insert each value at a random position in the result list
for y in sourcelist:
    resultlist.insert(random.randint(0, len(resultlist)), y)

print(resultlist)

To obtain a program that generates a list of random values without duplicates and that is deterministic, efficient and built with basic programming constructs, consider the function extractSamples defined below,

def extractSamples(populationSize, sampleSize, intervalLst) :
    import random
    if (sampleSize > populationSize) :
        raise ValueError("sampleSize = "+str(sampleSize) +" > populationSize (= " + str(populationSize) + ")")
    samples = []
    while (len(samples) < sampleSize) :
        i = random.randint(0, (len(intervalLst)-1))
        (a,b) = intervalLst[i]
        sample = random.randint(a,b)
        if (a==b) :
            intervalLst.pop(i)
        elif (a == sample) : # shorten beginning of interval
            intervalLst[i] = (sample+1, b)
        elif ( sample == b) : # shorten interval end
            intervalLst[i] = (a, sample - 1)
        else :
            intervalLst[i] = (a, sample - 1)
            intervalLst.append((sample+1, b))
        samples.append(sample)
    return samples

The basic idea is to keep track of the intervals, intervalLst, of possible values from which to select the required elements. This is deterministic in the sense that we are guaranteed to generate a sample within a fixed number of steps (depending solely on populationSize and sampleSize).

To use the above function to generate the required list,

In [3]: populationSize, sampleSize = 10**17, 10**5


In [4]: %time lst1 = extractSamples(populationSize, sampleSize, [(0, populationSize-1)])
CPU times: user 289 ms, sys: 9.96 ms, total: 299 ms
Wall time: 293 ms


We can also compare with an earlier solution (for a lower value of populationSize)

In [5]: populationSize, sampleSize = 10**8, 10**5


In [6]: %time lst = random.sample(range(populationSize), sampleSize)
CPU times: user 1.89 s, sys: 299 ms, total: 2.19 s
Wall time: 2.18 s


In [7]: %time lst1 = extractSamples(populationSize, sampleSize, [(0, populationSize-1)])
CPU times: user 449 ms, sys: 8.92 ms, total: 458 ms
Wall time: 442 ms

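Note that I reduced the value of populationSize because the random.sample solution produces a memory error for higher values (this is also mentioned in earlier answers). For the values above, we can also observe that extractSamples outperforms the random.sample approach.

P.S.: Although the core approach is similar to my earlier answer, there are substantial modifications in both implementation and approach, along with an improvement in clarity.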

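Here is a very small function I made; I hope it helps!

import random
numbers = list(range(0, 100))
random.shuffle(numbers)

If you want the numbers to be random, you can do this. In this case, length is the highest number you want to choose from.

If it notices that the new random number was already chosen, it subtracts 1 from count (since count was incremented before it knew whether the number was a duplicate). If the number is not in the list, do what you want with it and add it to the list so that it cannot be picked again.

import random

def randomizer(length):
    # length = whatever the highest number you want to choose from
    # (made a parameter here because the original left it undefined)
    count = 0
    user_input = int(input("Enter number for how many rows to randomly select: "))
    numlist = []
    while 1 <= user_input <= length:
        count = count + 1
        if count > user_input:
            break
        chosen_number = random.randint(0, length)
        if chosen_number in numlist:
            # duplicate: undo the count and draw again
            count = count - 1
            continue
        numlist.append(chosen_number)
        # do what you want here
    return numlist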

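I found a faster way than using the range function (which is very slow), and without using Python's random function (I don't like the built-in random library because when you seed it, it repeats the pattern of the random number generator).

import numpy as np

nums = set(np.random.randint(low=0, high=100, size=150)) #generate some more for the duplicates
nums = list(nums)[:100]

It is also quite fast.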

Try using...

import random

LENGTH = 100

random_with_possible_duplicates = [random.randrange(-3, 3) for _ in range(LENGTH)]
random_without_duplicates = list(set(random_with_possible_duplicates)) # This removes duplicates

Advantages

Fast, efficient and readable.

Possible issues

This method can change the length of the list if there are duplicates.
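A small sketch of that length issue, using the same bounds as above: randrange(-3, 3) can only produce six different values, so the de-duplicated list never has more than six entries regardless of LENGTH. The last line shows random.sample as one way to get a fixed number of distinct values; the variable names are only for illustration.

```python
import random

LENGTH = 100

# randrange(-3, 3) can only return -3, -2, -1, 0, 1 or 2, so removing
# duplicates leaves at most 6 values no matter how large LENGTH is.
with_duplicates = [random.randrange(-3, 3) for _ in range(LENGTH)]
without_duplicates = list(set(with_duplicates))
print(len(with_duplicates), len(without_duplicates))

# If a fixed number of distinct values is required, sample directly.
exactly_four = random.sample(range(-3, 3), 4)
print(exactly_four)
```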

A simple alternative is to use np.random.choice(), as shown below:

import numpy as np

np.random.choice(range(10), size=3, replace=False)

This yields three distinct integers, for example [1, 3, 5], [2, 5, 1], ...