A weighted version of random.choice

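I needed to write a weighted version of random.choice (each element in the list has a different probability of being selected). This is what I came up with: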

import random

def weightedChoice(choices):
    """Like random.choice, but each element can have a different chance of
    being selected.

    choices can be any iterable containing iterables with two items each.
    Technically, they can have more than two items, the rest will just be
    ignored.  The first item is the thing being chosen, the second item is
    its weight.  The weights can be any numeric values, what matters is the
    relative differences between them.
    """
    space = {}
    current = 0
    for choice, weight in choices:
        if weight > 0:
            space[current] = choice
            current += weight
    rand = random.uniform(0, current)
    # list(...) keeps this working on Python 3, where keys() is a view
    for key in sorted(list(space.keys()) + [current]):
        if rand < key:
            return choice
        choice = space[key]
    return None

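This function seems overly complex to me, and ugly. I'm hoping everyone here can offer some suggestions for improving it, or alternative ways of doing this. Efficiency isn't as important to me as code cleanliness and readability.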

import random

def weighted_choice(choices):
    total = sum(w for c, w in choices)
    r = random.uniform(0, total)
    upto = 0
    for c, w in choices:
        if upto + w >= r:
            return c
        upto += w
    assert False, "Shouldn't get here"

Crude, but may be sufficient:

import random
weighted_choice = lambda s : random.choice(sum(([v]*wt for v,wt in s),[]))

Does it work?

# define choices and relative weights
choices = [("WHITE",90), ("RED",8), ("GREEN",2)]

# initialize tally dict, keyed by the choice values (not the (value, weight) pairs)
tally = dict.fromkeys((c for c, w in choices), 0)

# tally up 1000 weighted choices
for i in range(1000):
    tally[weighted_choice(choices)] += 1

print(list(tally.items()))

Prints (exact counts and order vary from run to run):

[('WHITE', 904), ('GREEN', 22), ('RED', 74)]

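This assumes that all weights are integers. They don't have to add up to 100; I just did that to make the test results easier to interpret. (If the weights are floating-point numbers, multiply them all by 10 repeatedly until all weights are >= 1.)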

weights = [.6, .2, .001, .199]
while any(w < 1.0 for w in weights):
    weights = [w*10 for w in weights]
weights = list(map(int, weights))
  1. Arrange the weights into a cumulative distribution.
  2. Use random.random() to pick a random float 0.0 <= x < total.
  3. Search the distribution using bisect.bisect, as shown in the example at http://docs.python.org/dev/library/bisect.html#other-examples.

from random import random
from bisect import bisect

def weighted_choice(choices):
    values, weights = zip(*choices)
    total = 0
    cum_weights = []
    for w in weights:
        total += w
        cum_weights.append(total)
    x = random() * total
    i = bisect(cum_weights, x)
    return values[i]


>>> weighted_choice([("WHITE",90), ("RED",8), ("GREEN",2)])
'WHITE'

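If you need to make more than one choice, split this into two functions: one to build the cumulative weights and another to bisect to a random point.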
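
A minimal sketch of that split, assuming the same list of (value, weight) pairs as above. The names build_cum_weights and pick are illustrative only and do not come from the answer; the hi argument to bisect is an added guard so that a rounding error can never push the index past the last element.

```python
import bisect
import itertools
import random


def build_cum_weights(choices):
    """Split (value, weight) pairs into values and running weight totals."""
    values, weights = zip(*choices)
    return values, list(itertools.accumulate(weights))


def pick(values, cum_weights, rng=random):
    """Bisect to a random point in the precomputed cumulative weights."""
    x = rng.random() * cum_weights[-1]
    # hi=len-1 caps the result at the last valid index
    return values[bisect.bisect(cum_weights, x, 0, len(cum_weights) - 1)]


# Build once, then sample as often as needed.
values, cum_weights = build_cum_weights([("WHITE", 90), ("RED", 8), ("GREEN", 2)])
samples = [pick(values, cum_weights) for _ in range(1000)]
```

The O(n) accumulation runs once, and each later pick costs only a binary search.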

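I looked at the other thread that was pointed to and came up with this variation in my own coding style. It returns the index of the choice for the purpose of tallying, but it is simple to return the string instead (see the commented-out return alternative):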

import random
import bisect

try:
    range = xrange  # Python 2: use the lazy range
except NameError:
    pass  # Python 3: range is already lazy

def weighted_choice(choices):
    total, cumulative = 0, []
    for c, w in choices:
        total += w
        cumulative.append((total, c))
    r = random.uniform(0, total)
    # return index
    return bisect.bisect(cumulative, (r,))
    # return item string
    #return choices[bisect.bisect(cumulative, (r,))][0]

# define choices and relative weights
choices = [("WHITE",90), ("RED",8), ("GREEN",2)]

tally = [0 for item in choices]

n = 100000
# tally up n weighted choices
for i in range(n):
    tally[weighted_choice(choices)] += 1

# 100.0 forces float division on Python 2 as well
print([100.0 * t / sum(tally) for t in tally])

If you have a weighted dictionary instead of a list, you can write this:

items = { "a": 10, "b": 5, "c": 1 }
random.choice([k for k in items for dummy in range(items[k])])

Note that [k for k in items for dummy in range(items[k])] produces this list: ['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'c', 'b', 'b', 'b', 'b', 'b']

If you don't mind using numpy, you can use numpy.random.choice.

For example:

import numpy

items = [["item1", 0.2], ["item2", 0.3], ["item3", 0.45], ["item4", 0.05]]
elems = [i[0] for i in items]
probs = [i[1] for i in items]

trials = 1000
results = [0] * len(items)
for i in range(trials):
    # numpy.random.choice needs a 1-D sequence, so pass elems rather than items
    res = numpy.random.choice(elems, p=probs)  # This is where the item is selected!
    results[elems.index(res)] += 1
results = [r / float(trials) for r in results]
print("item\texpected\tactual")
for i in range(len(probs)):
    print("%s\t%0.4f\t%0.4f" % (elems[i], probs[i], results[i]))

If you know in advance how many selections you need to make, you can do it without a loop, like this:

numpy.random.choice(elems, trials, p=probs)

A general solution:

import random

def weighted_choice(choices, weights):
    total = sum(weights)
    threshold = random.uniform(0, total)
    for k, weight in enumerate(weights):
        total -= weight
        if total < threshold:
            return choices[k]

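Here is another version of weighted_choice that uses numpy. Pass in the weights vector and it returns an array of 0s containing a single 1 that marks the chosen bin. By default the code makes a single draw, but you can pass in the number of draws, and the per-bin counts of the draws will be returned.

If the weights vector does not sum to 1, it is normalized so that it does.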

import numpy as np

def weighted_choice(weights, n=1):
    weights = np.asarray(weights, dtype=float)  # also accept plain lists
    if np.sum(weights) != 1:
        weights = weights / np.sum(weights)

    draws = np.random.random_sample(size=n)

    weights = np.cumsum(weights)
    weights = np.insert(weights, 0, 0.0)

    counts = np.histogram(draws, bins=weights)
    return counts[0]

import numpy as np
w = np.array([0.4, 0.8, 1.6, 0.8, 0.4])
np.random.choice(w, p=w/sum(w))

Since version 1.7.0, NumPy has a choice function that supports probability distributions.

from numpy.random import choice
draw = choice(list_of_candidates, number_of_items_to_pick,
p=probability_distribution)

Note that probability_distribution is a sequence in the same order as list_of_candidates. You can also use the keyword replace=False to change the behavior so that drawn items are not replaced.
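
As a small illustration of the replace keyword, here is a sketch with made-up candidates and probabilities (the values below are assumptions for the example, not taken from the answer):

```python
import numpy as np

# Hypothetical inputs; probabilities are listed in the same order as the candidates.
list_of_candidates = ["white", "green", "red", "blue"]
probability_distribution = [0.4, 0.3, 0.2, 0.1]  # must sum to 1

# With replacement (the default): the same candidate may appear more than once.
with_replacement = np.random.choice(list_of_candidates, 3, p=probability_distribution)

# Without replacement: each candidate can be drawn at most once.
without_replacement = np.random.choice(
    list_of_candidates, 3, replace=False, p=probability_distribution
)
```

With replace=False the requested number of items cannot exceed the number of candidates that have a non-zero probability.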

I'm probably too late to contribute anything useful, but here's a simple, short, and very efficient snippet:

import random

def choose_index(probabilities):
    choice = random.random()
    cmf = 0.0
    for k, p in enumerate(probabilities):
        cmf += p
        if choice <= cmf:
            return k
    return len(probabilities) - 1  # guard against floating-point round-off

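There's no need to sort your probabilities or to build a vector from your cmf, and it terminates as soon as it finds its choice. Memory: O(1), time: O(N), with an average running time of about N/2.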

If you have weights, simply add a normalization step:

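def choose_index(weights):
    total = float(sum(weights))
    probabilities = [w / total for w in weights]  # works for lists as well as arrays
    choice = random.random()
    cmf = 0.0
    for k, p in enumerate(probabilities):
        cmf += p
        if choice <= cmf:
            return k
    return len(probabilities) - 1  # guard against floating-point round-off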

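If your list of weighted choices is relatively static and you want to sample frequently, you can do one O(N) preprocessing step and then make each selection in O(1), using the functions from the related answer.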

# run only when `choices` changes.
preprocessed_data = prep(weight for _,weight in choices)


# O(1) selection
value = choices[sample(preprocessed_data)][0]
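
The functions from the related answer are not reproduced in this thread. One well-known technique with O(N) setup and O(1) sampling is the alias method, so the sketch below assumes that prep and sample could be implemented that way; it is an illustration, not necessarily what the linked answer does.

```python
import random


def prep(weights):
    """Build alias tables (Vose's variant) from non-negative weights."""
    weights = list(weights)
    n = len(weights)
    total = float(sum(weights))
    scaled = [w * n / total for w in weights]  # average scaled weight is 1
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]      # keep column s with this probability
        alias[s] = l             # otherwise fall through to column l
        scaled[l] = scaled[l] + scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    for i in large + small:      # leftovers are 1 up to round-off
        prob[i] = 1.0
    return prob, alias


def sample(tables):
    """Return a random index in O(1) using the tables from prep()."""
    prob, alias = tables
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]


choices = [("WHITE", 90), ("RED", 8), ("GREEN", 2)]
preprocessed_data = prep(weight for _, weight in choices)
value = choices[sample(preprocessed_data)][0]
```

Each draw uses one random column and one biased coin flip, regardless of how many choices there are.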

Since Python 3.6 there is a method choices in the random module.

In [1]: import random


In [2]: random.choices(
...:     population=[['a','b'], ['b','a'], ['c','b']],
...:     weights=[0.2, 0.2, 0.6],
...:     k=10
...: )


Out[2]:
[['c', 'b'],
['c', 'b'],
['b', 'a'],
['c', 'b'],
['c', 'b'],
['b', 'a'],
['c', 'b'],
['b', 'a'],
['c', 'b'],
['c', 'b']]

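Note that random.choices samples with replacement, per the docs:

Return a k sized list of elements chosen from the population with replacement.

For the sake of completeness of the answer, note:

When a sampling unit is drawn from a finite population and is returned to that population, after its characteristic(s) have been recorded, before the next unit is drawn, the sampling is said to be "with replacement". It basically means each element may be chosen more than once.

If you need to sample without replacement, then as @ronan-paixão's brilliant answer shows, you can use numpy.random.choice, whose replace argument controls this behaviour.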

Here is the version that is included in the standard library for Python 3.6:

import random
import itertools as _itertools
import bisect as _bisect

class Random36(random.Random):
    "Show the code included in the Python 3.6 version of the Random class"

    def choices(self, population, weights=None, *, cum_weights=None, k=1):
        """Return a k sized list of population elements chosen with replacement.

        If the relative weights or cumulative weights are not specified,
        the selections are made with equal probability.

        """
        random = self.random
        if cum_weights is None:
            if weights is None:
                _int = int
                total = len(population)
                return [population[_int(random() * total)] for i in range(k)]
            cum_weights = list(_itertools.accumulate(weights))
        elif weights is not None:
            raise TypeError('Cannot specify both weights and cumulative weights')
        if len(cum_weights) != len(population):
            raise ValueError('The number of weights does not match the population')
        bisect = _bisect.bisect
        total = cum_weights[-1]
        return [population[bisect(cum_weights, random() * total)] for i in range(k)]

Source: https://hg.python.org/cpython/file/tip/Lib/random.py#l340

Since Python v3.6, random.choices can be used to return a list of elements of a specified size from the given population, with optional weights.

random.choices(population, weights=None, *, cum_weights=None, k=1)

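  • population: list containing unique observations. (If empty, raises IndexError)

  • weights: more precisely, the relative weights used to make the selections.

  • cum_weights: the cumulative weights used to make the selections.

  • k: size (len) of the list to be returned. (Default len()=1)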


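A few caveats:

1) It uses weighted sampling with replacement, so a drawn item can be drawn again later. The values in the weights sequence do not matter in themselves; only their relative ratios do.

Unlike np.random.choice, which only accepts probabilities as weights and requires the individual probabilities to sum to 1, there is no such restriction here. As long as the weights belong to numeric types (int/float/fraction, but not Decimal), this will still work.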

>>> import random
# weights being integers
>>> random.choices(["white", "green", "red"], [12, 12, 4], k=10)
['green', 'red', 'green', 'white', 'white', 'white', 'green', 'white', 'red', 'white']
# weights being floats
>>> random.choices(["white", "green", "red"], [.12, .12, .04], k=10)
['white', 'white', 'green', 'green', 'red', 'red', 'white', 'green', 'white', 'green']
# weights being fractions
>>> random.choices(["white", "green", "red"], [12/100, 12/100, 4/100], k=10)
['green', 'green', 'white', 'red', 'green', 'red', 'white', 'green', 'green', 'green']

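2) If neither weights nor cum_weights is specified, selections are made with equal probability. If a weights sequence is supplied, it must be the same length as the population sequence.

Specifying both weights and cum_weights raises a TypeError.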

>>> random.choices(["white", "green", "red"], k=10)
['white', 'white', 'green', 'red', 'red', 'red', 'white', 'white', 'white', 'green']

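3) cum_weights is typically the result of the itertools.accumulate function, which is really handy in this situation.

From the linked documentation:

Internally, the relative weights are converted to cumulative weights before making selections, so supplying the cumulative weights saves work.

So for our contrived case, supplying either weights=[12, 12, 4] or cum_weights=[12, 24, 28] produces the same outcome, and the latter appears to be faster / more efficient.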
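
A short sketch of caveats 2) and 3), assuming CPython's implementation of random.choices (shown earlier in this thread), in which weights are accumulated internally before the draw. The seed 42 is arbitrary and only serves to make the two calls comparable.

```python
import itertools
import random

population = ["white", "green", "red"]
weights = [12, 12, 4]
cum_weights = list(itertools.accumulate(weights))  # [12, 24, 28]

# Two generators with the same seed, one per calling style.
from_weights = random.Random(42).choices(population, weights=weights, k=10)
from_cum = random.Random(42).choices(population, cum_weights=cum_weights, k=10)
same_draws = from_weights == from_cum

# Passing both arguments at once is rejected with a TypeError.
try:
    random.choices(population, weights=weights, cum_weights=cum_weights)
    raised_type_error = False
except TypeError:
    raised_type_error = True
```

Precomputing cum_weights pays off mainly when the same distribution is sampled in many separate calls, since the accumulation is then done once instead of on every call.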

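It depends on how many times you want to sample the distribution.

Suppose you want to sample the distribution K times. Then the time complexity of calling np.random.choice() each time is O(K(n + log(n))), where n is the number of items in the distribution.

In my case, I needed to sample the same distribution on the order of 10^3 times, with n on the order of 10^6. I used the code below, which precomputes the cumulative distribution and samples it in O(log(n)). The overall time complexity is O(n+K*log(n)).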

import numpy as np

n, k = 10**6, 10**3

# Create dummy distribution
a = np.array([i+1 for i in range(n)])
p = np.array([1.0/n]*n)

cfd = p.cumsum()
for _ in range(k):
    x = np.random.uniform()
    idx = cfd.searchsorted(x, side='right')
    sampled_element = a[idx]

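One way is to pick a random number up to the total of all the weights and then use the cumulative values as the limit points for each item. Here is a crude implementation as a generator.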

import random

def rand_weighted(weights):
    """
    Generator which uses the weights to generate a
    weighted random values
    """
    sum_weights = sum(weights.values())
    cum_weights = {}
    current_weight = 0
    for key, value in sorted(weights.items()):
        current_weight += value
        cum_weights[key] = current_weight
    while True:
        sel = int(random.uniform(0, 1) * sum_weights)
        for key, value in sorted(cum_weights.items()):
            if sel < value:
                break
        yield key

Using numpy:

import numpy as np

def choice(items, weights):
    return items[np.argmin((np.cumsum(weights) / sum(weights)) < np.random.rand())]

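I needed to do something like this really fast and really simply, and after searching for ideas I finally built this template. The idea is to receive the weighted values as JSON from an API, which is simulated here by a dict.

It is then turned into a list in which each value is repeated in proportion to its weight, and random.choice is used to select a value from that list.

I tried it with 10, 100 and 1000 iterations. The distribution seems pretty solid.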

import random

def weighted_choice(weighted_dict):
    """Input example: dict(apples=60, oranges=30, pineapples=10)"""
    weight_list = []
    for key in weighted_dict.keys():
        weight_list += [key] * weighted_dict[key]
    return random.choice(weight_list)

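I didn't love the syntax of any of those. I really just wanted to specify what the items are and what the weight of each one is. I realize I could have used random.choices, but instead I quickly wrote the class below.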

import math
import random
from numpy import cumsum

class randomChoiceWithProportions:
    '''
    Accepts a dictionary of choices as keys and weights as values. Example if you want a unfair dice:

    choiceWeightDic = {"1":0.16666666666666666, "2": 0.16666666666666666, "3": 0.16666666666666666
    , "4": 0.16666666666666666, "5": .06666666666666666, "6": 0.26666666666666666}
    dice = randomChoiceWithProportions(choiceWeightDic)

    samples = []
    for i in range(100000):
        samples.append(dice.sample())

    # Should be close to .26666
    samples.count("6")/len(samples)

    # Should be close to .16666
    samples.count("1")/len(samples)
    '''
    def __init__(self, choiceWeightDic):
        self.choiceWeightDic = choiceWeightDic
        weightSum = sum(self.choiceWeightDic.values())
        # isclose tolerates floating-point round-off in the sum
        assert math.isclose(weightSum, 1), 'Weights sum to ' + str(weightSum) + ', not 1.'
        self.valWeightDict = self._compute_valWeights()

    def _compute_valWeights(self):
        valWeights = list(cumsum(list(self.choiceWeightDic.values())))
        valWeightDict = dict(zip(list(self.choiceWeightDic.keys()), valWeights))
        return valWeightDict

    def sample(self):
        num = random.uniform(0,1)
        for key, val in self.valWeightDict.items():
            if val >= num:
                return key

Provide random.choice() with a pre-weighted list:

Solution and test:

import random

options = ['a', 'b', 'c', 'd']
weights = [1, 2, 5, 2]

weighted_options = [[opt]*wgt for opt, wgt in zip(options, weights)]
weighted_options = [opt for sublist in weighted_options for opt in sublist]
print(weighted_options)

# test

counts = {c: 0 for c in options}
for x in range(10000):
    counts[random.choice(weighted_options)] += 1

for opt, wgt in zip(options, weights):
    wgt_r = counts[opt] / 10000 * sum(weights)
    print(opt, counts[opt], wgt, wgt_r)

Output:

['a', 'b', 'b', 'c', 'c', 'c', 'c', 'c', 'd', 'd']
a 1025 1 1.025
b 1948 2 1.948
c 5019 5 5.019
d 2008 2 2.008

Another way of doing this, assuming the weights are at the same indices as the corresponding elements in the element array.

import numpy as np
weights = [0.1, 0.3, 0.5] #weights for the item at index 0,1,2
# sum of weights should be <=1, you can also divide each weight by sum of all weights to standardise it to <=1 constraint.
trials = 1 #number of trials
num_item = 1 #number of items that can be picked in each trial
selected_item_arr = np.random.multinomial(num_item, weights, trials)
# gives number of times an item was selected at a particular index
# this assumes selection with replacement
# one possible output
# selected_item_arr
# array([[0, 0, 1]])
# say if trials = 5, then the possible output could be
# selected_item_arr
# array([[1, 0, 0],
#   [0, 0, 1],
#   [0, 0, 1],
#   [0, 1, 0],
#   [0, 0, 1]])

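Now let's assume we have to sample 3 items in 1 trial. You can imagine three kinds of balls, R, G and B, present in large quantities in the ratio given by the weights array. The following could be a possible outcome: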

num_item = 3
trials = 1
selected_item_arr = np.random.multinomial(num_item, weights, trials)
# selected_item_arr can give output like :
# array([[1, 0, 2]])

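You can also think of the number of items to be selected as the number of binomial/multinomial trials within a set. So the above example still works as: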

num_binomial_trial = 5
weights = [0.1,0.9] #say an unfair coin weights for H/T
num_experiment_set = 1
selected_item_arr = np.random.multinomial(num_binomial_trial, weights, num_experiment_set)
# possible output
# selected_item_arr
# array([[1, 4]])
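# i.e H came 1 time and T came 4 times in 5 binomial trials. And one set contains 5 binomial trials.

There is a lecture on this by Sebastian Thrun in the free Udacity course AI for Robotics. Basically, he makes a circular array of the indexed weights using the mod operator %, sets a variable beta to 0, randomly chooses an index, and runs a for loop over N, where N is the number of indices. Inside the for loop, beta is first incremented by the formula: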

beta = beta + uniform sample from {0...2 * Weight_max}

and then, nested in the for loop, there is a while loop as below:

while w[index] < beta:
    beta = beta - w[index]
    index = (index + 1) % N

select p[index]

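Then it moves on to the next iteration to resample based on the probabilities (or the normalized probabilities, in the case presented in the course).

On Udacity, find Lesson 8, video number 21 of Artificial Intelligence for Robotics, where he is lecturing on particle filters.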
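
Putting the description above together, a minimal sketch of that resampling loop could look like the following. The function name resample_wheel and its signature are assumptions for illustration and are not taken from the course material.

```python
import random


def resample_wheel(particles, weights, rng=random):
    """Draw len(particles) items with replacement, proportionally to weights."""
    n = len(particles)
    index = rng.randrange(n)          # random starting position on the wheel
    beta = 0.0
    max_weight = max(weights)
    resampled = []
    for _ in range(n):
        # step forward by a random amount of at most twice the largest weight
        beta += rng.uniform(0.0, 2.0 * max_weight)
        while weights[index] < beta:
            beta -= weights[index]
            index = (index + 1) % n   # wrap around: the array is circular
        resampled.append(particles[index])
    return resampled


new_particles = resample_wheel(["p0", "p1", "p2", "p3"], [0.1, 0.4, 0.4, 0.1])
```

Heavier entries occupy a larger arc of the wheel, so the walk stops on them more often; the weights do not need to be normalized.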

If you happen to have Python 3, and are afraid of installing numpy or writing your own loops, you could do:

import itertools, bisect, random

def weighted_choice(choices):
    weights = list(zip(*choices))[1]
    return choices[bisect.bisect(list(itertools.accumulate(weights)),
                                 random.uniform(0, sum(weights)))][0]

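Because you can build anything out of a bag of plumbing adaptors! Although... I must admit that Ned's answer, while slightly longer, is easier to understand.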

A very basic and easy approach to a weighted choice is the following:

np.random.choice(['A', 'B', 'C'], p=[0.3, 0.4, 0.3])

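If you don't define in advance how many items you want to pick (so you don't do something like k=10) and you only have probabilities, you can do the following. Note that your probabilities do not need to add up to 1; they can be independent of each other: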

import random

soup_items = ['pepper', 'onion', 'tomato', 'celery']
items_probability = [0.2, 0.3, 0.9, 0.1]

# each item is kept independently with its own probability
selected_items = [item for item, p in zip(soup_items, items_probability) if random.random() < p]
print(selected_items)
# one possible output: ['pepper', 'tomato']

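Step 1: Generate the CDF F in which you're interested

Step 2: Generate a uniform random variable u

Step 3: Evaluate z = F^{-1}(u)

This modeling is described in courses on probability theory or stochastic processes. It is applicable here because you have a simple CDF.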

Suppose you have

items = [11, 23, 43, 91]
probability = [0.2, 0.3, 0.4, 0.1]

and you have a function that generates a random number in [0, 1) (we can use random.random() here). Now take the prefix sum of the probabilities:

prefix_probability=[0.2,0.5,0.9,1]

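Now we can just take a random number between 0 and 1 and use binary search to find where that number falls in prefix_probability. That index is your answer.

The code will go something like this: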

return items[bisect.bisect(prefix_probability,random.random())]
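
For completeness, the one-liner above can be wrapped in a self-contained function. This is only a sketch: the name pick_by_prefix_sum is made up, and scaling the random number by the last prefix sum and capping the search range are added assumptions to tolerate probabilities that do not sum to exactly 1.

```python
import bisect
import itertools
import random


def pick_by_prefix_sum(items, probability, rng=random):
    """Pick one item via binary search over the prefix sums of `probability`."""
    prefix_probability = list(itertools.accumulate(probability))
    # scale by the last prefix sum in case the probabilities are not normalized
    u = rng.random() * prefix_probability[-1]
    # hi=len-1 keeps the index valid even if rounding makes u equal the last sum
    index = bisect.bisect(prefix_probability, u, 0, len(items) - 1)
    return items[index]


items = [11, 23, 43, 91]
probability = [0.2, 0.3, 0.4, 0.1]
picked = pick_by_prefix_sum(items, probability)
```

If many picks are needed from the same distribution, the prefix sums can be computed once outside the function and reused.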