A weighted version of random.choice

小开

import random

def weighted_choice(choices):
    total = sum(w for c, w in choices)
    r = random.uniform(0, total)
    upto = 0
    for c, w in choices:
        if upto + w >= r:
            return c
        upto += w
    assert False, "Shouldn't get here"

小开

Crude, but may be sufficient:

import random
weighted_choice = lambda s : random.choice(sum(([v]*wt for v,wt in s),[]))

Does it work?

# define choices and relative weights
choices = [("WHITE",90), ("RED",8), ("GREEN",2)]

# initialize tally dict, keyed by the choice values (not the (value, weight) pairs)
tally = dict.fromkeys((c for c, w in choices), 0)

# tally up 1000 weighted choices
for i in range(1000):
    tally[weighted_choice(choices)] += 1

print(list(tally.items()))

Prints:

[('WHITE', 904), ('GREEN', 22), ('RED', 74)]

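This assumes that all weights are integers. They don't have to add up to 100; I only did that to make the test results easier to interpret. (If the weights are floating-point numbers, multiply them all by 10 repeatedly until all weights are >= 1.)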

weights = [.6, .2, .001, .199]
while any(w < 1.0 for w in weights):
    weights = [w*10 for w in weights]
weights = list(map(int, weights))

小开

1. Arrange the weights into a cumulative distribution.
2. Use random.random() to pick a random float 0.0 <= x < total.
3. Search the distribution using bisect.bisect, as shown in the example at http://docs.python.org/dev/library/bisect.html#other-examples.

from random import random
from bisect import bisect


def weighted_choice(choices):
    values, weights = zip(*choices)
    total = 0
    cum_weights = []
    for w in weights:
        total += w
        cum_weights.append(total)
    x = random() * total
    i = bisect(cum_weights, x)
    return values[i]


>>> weighted_choice([("WHITE",90), ("RED",8), ("GREEN",2)])
'WHITE'

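If you need to make more than one choice, split this into two functions: one to build the cumulative weights and another to bisect to a random point.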
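A minimal sketch of that split, assuming the same (value, weight) pairs as above. The names build_cum_weights and pick are made up for illustration and do not come from the answer:

```python
import random
from bisect import bisect


def build_cum_weights(choices):
    """Run once: return the values and their running weight totals."""
    values, cum_weights = [], []
    total = 0
    for value, weight in choices:
        total += weight
        values.append(value)
        cum_weights.append(total)
    return values, cum_weights


def pick(values, cum_weights):
    """Run per draw: bisect a random point into the cumulative weights."""
    x = random.random() * cum_weights[-1]
    return values[bisect(cum_weights, x)]


# Build the cumulative weights once, then reuse them for every draw.
values, cum_weights = build_cum_weights([("WHITE", 90), ("RED", 8), ("GREEN", 2)])
draws = [pick(values, cum_weights) for _ in range(1000)]
```

The O(n) accumulation is then paid once, and each draw costs only the O(log n) bisect.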

小开

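I looked at the other thread that was pointed to and came up with this variation in my own coding style. It returns the index of the choice, for tallying purposes, but it is simple to return the string instead (see the commented-out return alternative):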

import random
import bisect


try:
    range = xrange  # Python 2: use the lazy range
except NameError:
    pass  # Python 3: range is already lazy


def weighted_choice(choices):
    total, cumulative = 0, []
    for c,w in choices:
        total += w
        cumulative.append((total, c))
    r = random.uniform(0, total)
    # return index
    return bisect.bisect(cumulative, (r,))
    # return item string
    #return choices[bisect.bisect(cumulative, (r,))][0]


# define choices and relative weights
choices = [("WHITE",90), ("RED",8), ("GREEN",2)]


tally = [0 for item in choices]


n = 100000
# tally up n weighted choices
for i in range(n):
    tally[weighted_choice(choices)] += 1


# multiply by 100.0 first so Python 2 does not truncate to integer division
print([100.0 * t / sum(tally) for t in tally])

小开

If you have a weighted dictionary instead of a list, you can write this:

items = { "a": 10, "b": 5, "c": 1 }
random.choice([k for k in items for dummy in range(items[k])])

Note that [k for k in items for dummy in range(items[k])] produces this list: ['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'c', 'b', 'b', 'b', 'b', 'b']

小开

If you don't mind using numpy, you can use numpy.random.choice.

For example:

import numpy


items  = [["item1", 0.2], ["item2", 0.3], ["item3", 0.45], ["item4", 0.05]]
elems = [i[0] for i in items]
probs = [i[1] for i in items]

trials = 1000
results = [0] * len(items)
for i in range(trials):
    res = numpy.random.choice(elems, p=probs)  # This is where the item is selected!
    results[elems.index(res)] += 1
results = [r / float(trials) for r in results]
print("item\texpected\tactual")
for i in range(len(probs)):
    print("%s\t%0.4f\t%0.4f" % (elems[i], probs[i], results[i]))

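If you know in advance how many selections you need to make, you can do it without a loop, like this (elems being the one-dimensional list of item names):

numpy.random.choice(elems, trials, p=probs)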

小开

A general solution:

import random

def weighted_choice(choices, weights):
    total = sum(weights)
    threshold = random.uniform(0, total)
    for k, weight in enumerate(weights):
        total -= weight
        if total < threshold:
            return choices[k]

小开

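Here is another version of weighted_choice that uses numpy. Pass in the weights vector and it returns an array of 0s containing a single 1 that marks which bin was chosen. The code defaults to a single draw, but you can pass in the number of draws to make, and the count of draws per bin is returned.

If the weights vector does not sum to 1, it is normalized so that it does.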

import numpy as np


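def weighted_choice(weights, n=1):
    # accept plain lists as well as arrays
    weights = np.asarray(weights, dtype=float)
    if np.sum(weights) != 1:
        weights = weights / np.sum(weights)

    draws = np.random.random_sample(size=n)

    # bin edges: 0 followed by the cumulative weights
    weights = np.cumsum(weights)
    weights = np.insert(weights, 0, 0.0)

    counts = np.histogram(draws, bins=weights)
    return counts[0]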

小开

import numpy as np
w=np.array([ 0.4,  0.8,  1.6,  0.8,  0.4])
np.random.choice(w, p=w/sum(w))

小开

Best answer

Since version 1.7.0, NumPy has a choice function that supports probability distributions.

from numpy.random import choice
draw = choice(list_of_candidates, number_of_items_to_pick,
p=probability_distribution)

Note that probability_distribution is a sequence in the same order as list_of_candidates. You can also use the keyword replace=False to change the behavior so that drawn items are not replaced.
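As a small illustration of the two modes, here is a sketch with made-up candidates and probabilities (neither list comes from the answer):

```python
import numpy as np

candidates = ["a", "b", "c", "d"]
probabilities = [0.1, 0.2, 0.3, 0.4]  # same order as candidates, sums to 1

# With replacement (the default): an item may appear more than once.
with_repl = np.random.choice(candidates, 10, p=probabilities)

# Without replacement: each item appears at most once,
# so the requested size cannot exceed len(candidates).
without_repl = np.random.choice(candidates, 3, replace=False, p=probabilities)

print(with_repl)
print(without_repl)
```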

小开

I'm probably too late to contribute anything useful, but here's a simple, short, and very efficient snippet:

import random

def choose_index(probabilities):
    cmf = probabilities[0]
    choice = random.random()
    for k in range(len(probabilities)):
        if choice <= cmf:
            return k
        else:
            cmf += probabilities[k+1]

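There is no need to sort your probabilities or to build a vector with your cmf, and it terminates as soon as it finds its choice. Memory: O(1), time: O(N), with an average running time of ~N/2.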

If you have weights instead, just normalize them first:

def choose_index(weights):
    total = float(sum(weights))
    probabilities = [w / total for w in weights]
    cmf = probabilities[0]
    choice = random.random()
    for k in range(len(probabilities)):
        if choice <= cmf:
            return k
        else:
            cmf += probabilities[k+1]

小开

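If your list of weighted choices is relatively static and you want to sample from it frequently, you can do one O(N) preprocessing step and then make each selection in O(1), using the functions from this related answer.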

# run only when `choices` changes.
preprocessed_data = prep(weight for _,weight in choices)


# O(1) selection
value = choices[sample(preprocessed_data)][0]
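The related answer is not reproduced here. One well-known technique with O(N) setup and O(1) draws is the alias method (Vose's variant). The sketch below assumes that is the kind of approach meant; its prep and sample are illustrative stand-ins written for this example, not the code from that answer:

```python
import random


def prep(weights):
    """O(N) setup: build the probability and alias tables."""
    weights = list(weights)
    n = len(weights)
    total = float(sum(weights))
    scaled = [w * n / total for w in weights]  # average column height is 1
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]   # column s keeps its own share...
        alias[s] = l          # ...and is topped up from column l
        scaled[l] = scaled[l] + scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    # Whatever is left is 1.0 up to rounding error.
    for i in small + large:
        prob[i] = 1.0
    return prob, alias


def sample(tables):
    """O(1) draw: pick a column, then keep it or take its alias."""
    prob, alias = tables
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]


choices = [("WHITE", 90), ("RED", 8), ("GREEN", 2)]
preprocessed_data = prep(weight for _, weight in choices)
value = choices[sample(preprocessed_data)][0]
```

The tables only need rebuilding when the weights change, which is what makes this attractive for a static list that is sampled often.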

小开

Since Python 3.6, the random module has a choices method.

In [1]: import random


In [2]: random.choices(
...:     population=[['a','b'], ['b','a'], ['c','b']],
...:     weights=[0.2, 0.2, 0.6],
...:     k=10
...: )


Out[2]:
[['c', 'b'],
['c', 'b'],
['b', 'a'],
['c', 'b'],
['c', 'b'],
['b', 'a'],
['c', 'b'],
['b', 'a'],
['c', 'b'],
['c', 'b']]

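Note that random.choices samples with replacement, per the docs:

Return a k sized list of elements chosen from the population with replacement.

For completeness of the answer, note:

When a sampling unit is drawn from a finite population and is returned to that population, after its characteristic(s) have been recorded, before the next unit is drawn, the sampling is said to be "with replacement". It basically means each element may be chosen more than once.

If you need to sample without replacement, then as @ronan-paixão's brilliant answer states, you can use numpy.random.choice, whose replace argument controls this behaviour.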

小开

Here is the version that is included in the standard library for Python 3.6:

import random
import itertools as _itertools
import bisect as _bisect

class Random36(random.Random):
    "Show the code included in the Python 3.6 version of the Random class"

    def choices(self, population, weights=None, *, cum_weights=None, k=1):
        """Return a k sized list of population elements chosen with replacement.

        If the relative weights or cumulative weights are not specified,
        the selections are made with equal probability.

        """
        random = self.random
        if cum_weights is None:
            if weights is None:
                _int = int
                total = len(population)
                return [population[_int(random() * total)] for i in range(k)]
            cum_weights = list(_itertools.accumulate(weights))
        elif weights is not None:
            raise TypeError('Cannot specify both weights and cumulative weights')
        if len(cum_weights) != len(population):
            raise ValueError('The number of weights does not match the population')
        bisect = _bisect.bisect
        total = cum_weights[-1]
        return [population[bisect(cum_weights, random() * total)] for i in range(k)]

Source: https://hg.python.org/cpython/file/tip/Lib/random.py#l340

小开

Since Python v3.6, random.choices can be used to return a list of elements of a specified size from a given population, with optional weights.

random.choices(population, weights=None, *, cum_weights=None, k=1)

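population: list containing unique observations. (If empty, raises IndexError)

weights: more precisely, the relative weights required to make selections.

cum_weights: the cumulative weights required to make selections.

k: the size (len) of the list to be output. (Default len()=1)

A few caveats:

1) It uses weighted sampling with replacement, so a drawn item can be drawn again later. The values in the weights sequence do not matter in themselves, but their relative ratio does.

Unlike np.random.choice, which only accepts probabilities as weights and requires that the individual probabilities sum to 1, there is no such restriction here. As long as the weights belong to numeric types (int, float, or fraction, but not the Decimal type), this will still work.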

>>> import random
# weights being integers
>>> random.choices(["white", "green", "red"], [12, 12, 4], k=10)
['green', 'red', 'green', 'white', 'white', 'white', 'green', 'white', 'red', 'white']
# weights being floats
>>> random.choices(["white", "green", "red"], [.12, .12, .04], k=10)
['white', 'white', 'green', 'green', 'red', 'red', 'white', 'green', 'white', 'green']
# weights being fractions
>>> random.choices(["white", "green", "red"], [12/100, 12/100, 4/100], k=10)
['green', 'green', 'white', 'red', 'green', 'red', 'white', 'green', 'green', 'green']

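2) If neither weights nor cum_weights is specified, selections are made with equal probability. If a weights sequence is supplied, it must be the same length as the population sequence.

Specifying both weights and cum_weights raises a TypeError.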

>>> random.choices(["white", "green", "red"], k=10)
['white', 'white', 'green', 'red', 'red', 'red', 'white', 'white', 'white', 'green']

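3) cum_weights is typically the result of the itertools.accumulate function, which is really handy in such situations.

From the linked documentation:
Internally, the relative weights are converted to cumulative weights before making selections, so supplying the cumulative weights saves work.

So supplying either weights=[12, 12, 4] or cum_weights=[12, 24, 28] for our contrived case produces the same outcome, and the latter seems to be faster and more efficient.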
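A quick way to check that equivalence is to reseed the generator before each call. This is a sketch; the seed value is arbitrary and only there to make the two calls comparable:

```python
import random
from itertools import accumulate

population = ["white", "green", "red"]
weights = [12, 12, 4]
cum_weights = list(accumulate(weights))  # [12, 24, 28]

random.seed(42)  # arbitrary seed
from_weights = random.choices(population, weights=weights, k=10)

random.seed(42)  # same seed, so the same underlying random numbers
from_cum_weights = random.choices(population, cum_weights=cum_weights, k=10)

# Under CPython both calls bisect the same cumulative list,
# so the two results should be identical.
print(from_weights == from_cum_weights)
```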

小开

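It depends on how many times you want to sample the distribution.

Suppose you want to sample the distribution K times. Then the time complexity of calling np.random.choice() each time is O(K(n + log(n))), where n is the number of items in the distribution.

In my case, I needed to sample the same distribution many times, on the order of 10^3, with n on the order of 10^6. I used the code below, which precomputes the cumulative distribution once and then samples it in O(log(n)) per draw. The overall time complexity is O(n + K*log(n)).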

import numpy as np


n,k = 10**6,10**3


# Create dummy distribution
a = np.array([i+1 for i in range(n)])
p = np.array([1.0/n]*n)


cfd = p.cumsum()
for _ in range(k):
    x = np.random.uniform()
    idx = cfd.searchsorted(x, side='right')
    sampled_element = a[idx]
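The Python-level loop can also be replaced by a single vectorized searchsorted call. The following is a sketch under the same dummy distribution; the scaling by the last cumulative value and the final clip are defensive additions of mine against floating-point rounding, not part of the answer:

```python
import numpy as np

n, k = 10**6, 10**3

a = np.arange(1, n + 1)      # dummy values, as above
p = np.full(n, 1.0 / n)      # dummy uniform distribution

cdf = p.cumsum()             # O(n), computed once

# Scale the uniform draws by cdf[-1] in case the sum is not exactly 1.
x = np.random.uniform(size=k) * cdf[-1]

# k binary searches in one call: O(k * log(n)).
idx = cdf.searchsorted(x, side='right')
idx = np.minimum(idx, n - 1)  # guard against an index of n from rounding

sampled = a[idx]
```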

小开

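One way is to draw a random number over the total of all the weights and then use the cumulative values as the limit points for each variable. Here is a crude implementation as a generator.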

import random

def rand_weighted(weights):
    """
    Generator which uses the weights to generate
    weighted random values (assumes integer weights)
    """
    sum_weights = sum(weights.values())
    cum_weights = {}
    current_weight = 0
    for key, value in sorted(weights.items()):
        current_weight += value
        cum_weights[key] = current_weight
    while True:
        sel = int(random.uniform(0, 1) * sum_weights)
        for key, value in sorted(cum_weights.items()):
            if sel < value:
                break
        yield key

小开

Using numpy:

import numpy as np

def choice(items, weights):
    return items[np.argmin((np.cumsum(weights) / sum(weights)) < np.random.rand())]

小开

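I needed to do something like this really fast and really simply, and from searching for ideas I finally built this template. The idea is to receive the weighted values in the form of JSON from an API, which here is simulated by the dict.

Then translate it into a list in which each value repeats proportionally to its weight, and just use random.choice to select a value from the list.

I tried it with 10, 100, and 1000 iterations. The distribution seems pretty stable.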

import random

def weighted_choice(weighted_dict):
    """Input example: dict(apples=60, oranges=30, pineapples=10)"""
    weight_list = []
    for key in weighted_dict.keys():
        weight_list += [key] * weighted_dict[key]
    return random.choice(weight_list)

小开

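I didn't love the syntax of any of those. I really wanted to just specify what the items were and what the weighting of each was. I realize I could have used random.choices, but instead I quickly wrote the class below.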

import random
from numpy import cumsum

class randomChoiceWithProportions:
    '''
    Accepts a dictionary of choices as keys and weights as values. Example if you want an unfair dice:

    choiceWeightDic = {"1":0.16666666666666666, "2": 0.16666666666666666, "3": 0.16666666666666666
    , "4": 0.16666666666666666, "5": .06666666666666666, "6": 0.26666666666666666}
    dice = randomChoiceWithProportions(choiceWeightDic)

    samples = []
    for i in range(100000):
        samples.append(dice.sample())

    # Should be close to .26666
    samples.count("6")/len(samples)

    # Should be close to .16666
    samples.count("1")/len(samples)
    '''
    def __init__(self, choiceWeightDic):
        self.choiceWeightDic = choiceWeightDic
        weightSum = sum(self.choiceWeightDic.values())
        # compare with a tolerance: float weights rarely sum to exactly 1
        assert abs(weightSum - 1) < 1e-9, 'Weights sum to ' + str(weightSum) + ', not 1.'
        self.valWeightDict = self._compute_valWeights()

    def _compute_valWeights(self):
        # map each choice to its cumulative weight
        valWeights = list(cumsum(list(self.choiceWeightDic.values())))
        valWeightDict = dict(zip(list(self.choiceWeightDic.keys()), valWeights))
        return valWeightDict

    def sample(self):
        num = random.uniform(0,1)
        for key, val in self.valWeightDict.items():
            if val >= num:
                return key

小开

Provide random.choice() with a pre-weighted list:

Solution and test:

import random


options = ['a', 'b', 'c', 'd']
weights = [1, 2, 5, 2]


weighted_options = [[opt]*wgt for opt, wgt in zip(options, weights)]
weighted_options = [opt for sublist in weighted_options for opt in sublist]
print(weighted_options)


# test


counts = {c: 0 for c in options}
for x in range(10000):
    counts[random.choice(weighted_options)] += 1


for opt, wgt in zip(options, weights):
    wgt_r = counts[opt] / 10000 * sum(weights)
    print(opt, counts[opt], wgt, wgt_r)

Output:

['a', 'b', 'b', 'c', 'c', 'c', 'c', 'c', 'd', 'd']
a 1025 1 1.025
b 1948 2 1.948
c 5019 5 5.019
d 2008 2 2.008

小开

Another way of doing this, assuming each weight sits at the same index as its element in the element array.

import numpy as np
weights = [0.1, 0.3, 0.5] #weights for the item at index 0,1,2
# sum of weights should be <=1, you can also divide each weight by sum of all weights to standardise it to <=1 constraint.
trials = 1 #number of trials
num_item = 1 #number of items that can be picked in each trial
selected_item_arr = np.random.multinomial(num_item, weights, trials)
# gives number of times an item was selected at a particular index
# this assumes selection with replacement
# one possible output
# selected_item_arr
# array([[0, 0, 1]])
# say if trials = 5, then the possible output could be
# selected_item_arr
# array([[1, 0, 0],
#   [0, 0, 1],
#   [0, 0, 1],
#   [0, 1, 0],
#   [0, 0, 1]])

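Now let's assume that we have to sample 3 items in 1 trial. You can assume that there are three balls R, G, B present in large quantities, in the ratio of their weights given by the weights array. The following could be a possible outcome: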

num_item = 3
trials = 1
selected_item_arr = np.random.multinomial(num_item, weights, trials)
# selected_item_arr can give output like :
# array([[1, 0, 2]])

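You can also think of the number of items to be selected as the number of binomial/multinomial trials within a set. So the above example still works as: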

num_binomial_trial = 5
weights = [0.1,0.9] #say an unfair coin weights for H/T
num_experiment_set = 1
selected_item_arr = np.random.multinomial(num_binomial_trial, weights, num_experiment_set)
# possible output
# selected_item_arr
# array([[1, 4]])
# i.e. H came 1 time and T came 4 times in 5 binomial trials. And one set contains 5 binomial trials.

小开

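In Udacity's free course AI for Robotics, Sebastian Thrun gives a lecture on this. Basically, he treats the indexed weights as a circular array using the mod operator %, sets a variable beta to 0, randomly chooses an index, and then for-loops N times, where N is the number of indices. Inside the for loop, beta is first incremented by the formula:

beta = beta + uniform sample from {0...2 * Weight_max}

and then, nested in the for loop, a while loop as below:

while w[index] < beta:
    beta = beta - w[index]
    index = (index + 1) % N

select p[index]

Then on to the next iteration, resampling based on the probabilities (or the normalized probabilities, in the case presented in the course).

On Udacity, find Lesson 8, video number 21 of Artificial Intelligence for Robotics, where he lectures on particle filters.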
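Put together, the steps described above might look like the following sketch. The function name resample and the list names p (items) and w (weights) are assumptions chosen to match the pseudocode; this is not the course's own code:

```python
import random


def resample(p, w):
    """Draw len(p) items from p with replacement, weighted by w."""
    n = len(p)
    index = random.randrange(n)   # random starting position on the wheel
    beta = 0.0
    w_max = max(w)
    picked = []
    for _ in range(n):
        # Step forward by a random amount of up to twice the largest weight.
        beta += random.uniform(0, 2 * w_max)
        # Skip over items until the remaining step fits inside one.
        while w[index] < beta:
            beta -= w[index]
            index = (index + 1) % n   # wrap around: the circular array
        picked.append(p[index])
    return picked


particles = ["a", "b", "c"]
weights = [0.1, 0.3, 0.6]
new_particles = resample(particles, weights)
```

Items with larger weights cover more of the wheel, so they tend to be picked more often.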

小开

If you happen to have Python 3 and are afraid of installing numpy or writing your own loops, you could do:

import itertools, bisect, random


def weighted_choice(choices):
    weights = list(zip(*choices))[1]
    return choices[bisect.bisect(list(itertools.accumulate(weights)),
                                 random.uniform(0, sum(weights)))][0]

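Because you can build anything out of a bag of plumbing adaptors! Although... I must admit that Ned's answer, while slightly longer, is easier to understand.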

小开

A very basic and easy approach to weighted choice is the following:

np.random.choice(['A', 'B', 'C'], p=[0.3, 0.4, 0.3])

小开

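If you have not defined in advance how many items you want to pick (so you don't do something like k=10) and you only have probabilities, you can do the following. Note that your probabilities do not need to add up to 1; they can be independent of each other: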

soup_items = ['pepper', 'onion', 'tomato', 'celery']
items_probability = [0.2, 0.3, 0.9, 0.1]


selected_items = [item for item,p in zip(soup_items,items_probability) if random.random()<p]
print(selected_items)
>>>['pepper','tomato']

小开

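Step 1: Generate the CDF F in which you are interested.

Step 2: Generate a uniform random variable u.

Step 3: Evaluate z = F^{-1}(u).

This modeling is described in courses on probability theory or stochastic processes. It is applicable here precisely because you have an easy CDF.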
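As a sketch of the three steps, here is the same recipe applied to an exponential distribution. That distribution is my choice for illustration only, picked because its CDF F(z) = 1 - exp(-rate * z) inverts in closed form; the function name and the rate value are likewise assumptions:

```python
import math
import random


def sample_exponential(rate):
    """Inverse-transform sampling for F(z) = 1 - exp(-rate * z)."""
    u = random.random()                # step 2: uniform random variable on [0, 1)
    return -math.log(1.0 - u) / rate   # step 3: z = F^{-1}(u)


samples = [sample_exponential(2.0) for _ in range(100000)]
mean = sum(samples) / len(samples)     # should land near 1 / rate = 0.5
```

For a discrete weighted choice the CDF is a step function, and inverting it amounts to finding the first cumulative weight that exceeds u.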

小开

Let's say you have

items = [11, 23, 43, 91]
probability = [0.2, 0.3, 0.4, 0.1]

and a function that generates a random number in the range [0, 1) (we can use random.random() here). Now take the prefix sums of the probabilities:

prefix_probability=[0.2,0.5,0.9,1]

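Now we just take a random number between 0 and 1 and use binary search to find where that number belongs in prefix_probability. That index is your answer.

The code looks like this: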

return items[bisect.bisect(prefix_probability,random.random())]
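For a complete, runnable version of that one-liner, a sketch along these lines should work. The helper name pick and the clamp on the index are my additions; the clamp covers the case where floating-point rounding leaves the last prefix sum slightly below 1:

```python
import bisect
import itertools
import random

items = [11, 23, 43, 91]
probability = [0.2, 0.3, 0.4, 0.1]

# Prefix sums of the probabilities: roughly [0.2, 0.5, 0.9, 1.0]
prefix_probability = list(itertools.accumulate(probability))


def pick():
    """Binary-search a uniform draw into the prefix sums."""
    i = bisect.bisect(prefix_probability, random.random())
    return items[min(i, len(items) - 1)]  # clamp in case of rounding


# Tally a batch of draws to see the weighting.
counts = {item: 0 for item in items}
for _ in range(10000):
    counts[pick()] += 1
print(counts)
```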