Weighted random numbers

I'm trying to implement weighted random numbers. I'm currently just banging my head against the wall and cannot figure this out.

In my project (Hold'em hand-ranges, subjective all-in equity analysis), I'm using Boost's random functions. So, let's say I want to pick a random number between 1 and 3 (so either 1, 2 or 3). Boost's Mersenne Twister generator works like a charm for this. However, I want the pick to be weighted, for example like this:

1 (weight: 90)
2 (weight: 56)
3 (weight:  4)

Does Boost have some sort of functionality for this?

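There is a straightforward algorithm for picking an item at random, where items have individual weights:

1) calculate the sum of all the weights

2) pick a random number that is 0 or greater and is less than the sum of the weights

3) go through the items one at a time, subtracting their weight from your random number, until you get the item where the random number is less than that item's weight

Pseudo-code illustrating this: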

int sum_of_weight = 0;
for(int i=0; i<num_choices; i++) {
    sum_of_weight += choice_weight[i];
}
int rnd = random(sum_of_weight);
for(int i=0; i<num_choices; i++) {
    if(rnd < choice_weight[i])
        return i;
    rnd -= choice_weight[i];
}
assert(!"should never get here");

This should be straightforward to adapt to your Boost containers and such.
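For readers who want something compilable, the pseudo-code could be sketched in C++11 roughly as follows. The function name `pick_weighted` and the choice of `std::mt19937` with `std::uniform_int_distribution` are assumptions made for this example, not part of the answer; the weights are assumed to be non-negative integers with a positive sum.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: returns the index of the chosen weight.
// Assumes non-negative weights whose sum is positive.
template <class Gen>
std::size_t pick_weighted(const std::vector<int>& weights, Gen& gen)
{
    // 1) sum of all the weights
    const int sum = std::accumulate(weights.begin(), weights.end(), 0);
    assert(sum > 0);

    // 2) random number in [0, sum)
    std::uniform_int_distribution<int> dist(0, sum - 1);
    int rnd = dist(gen);

    // 3) subtract weights until rnd falls inside the current item
    for (std::size_t i = 0; i < weights.size(); ++i) {
        if (rnd < weights[i])
            return i;
        rnd -= weights[i];
    }
    assert(!"should never get here");
    return weights.size() - 1; // only reached if asserts are disabled
}
```

With the weights from the question, `pick_weighted({90, 56, 4}, gen)` would return 0, 1 or 2, which map to the numbers 1, 2 and 3.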


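If your weights are rarely changed but you often pick one at random, and as long as your container is storing pointers to the objects or is more than a few dozen items long (basically, you have to profile to know if this helps or hinders), then there is an optimisation:

By storing the cumulative weight sum in each item you can use a binary search to pick the item corresponding to the pick weight.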
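A minimal sketch of that optimisation, assuming integer weights kept in a separate vector rather than inside each item. The class name `CumulativePicker` and its members are invented for this example; `std::partial_sum` builds the cumulative sums once and `std::upper_bound` does the binary search.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: precomputes cumulative weight sums once (O(N)),
// then each pick costs O(log N).
class CumulativePicker {
public:
    explicit CumulativePicker(const std::vector<int>& weights)
        : cumulative_(weights.size())
    {
        std::partial_sum(weights.begin(), weights.end(), cumulative_.begin());
        assert(!cumulative_.empty() && cumulative_.back() > 0);
    }

    // Draws a pick weight in [0, total) and maps it to an index.
    template <class Gen>
    std::size_t pick(Gen& gen) const
    {
        std::uniform_int_distribution<int> dist(0, cumulative_.back() - 1);
        return index_for(dist(gen));
    }

    // Index of the first cumulative sum strictly greater than rnd.
    std::size_t index_for(int rnd) const
    {
        return static_cast<std::size_t>(
            std::upper_bound(cumulative_.begin(), cumulative_.end(), rnd)
            - cumulative_.begin());
    }

private:
    std::vector<int> cumulative_;
};
```

For the weights 90, 56 and 4 the stored sums are 90, 146 and 150, so pick weights 0 to 89 map to index 0, 90 to 145 to index 1, and 146 to 149 to index 2. Any change to a weight means rebuilding the sums, which is why this only pays off when picks are much more frequent than updates.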


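If you do not know the number of items in the list, then there's a very neat algorithm called reservoir sampling that can be adapted to be weighted.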
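One way the weighted adaptation might look for picking a single item from a stream: keep a running total of the weights seen so far and replace the current choice with probability weight / total. The class name `WeightedReservoir` and its interface are assumptions for this sketch; it also assumes the item type is default-constructible and copyable.

```cpp
#include <cassert>
#include <random>

// Hypothetical helper: picks one item from a stream of unknown length,
// with probability proportional to each item's weight.
template <class T>
class WeightedReservoir {
public:
    // Offer the next item of the stream. Items with non-positive weight are skipped.
    template <class Gen>
    void offer(const T& item, double weight, Gen& gen)
    {
        if (weight <= 0.0)
            return;
        total_ += weight;
        std::uniform_real_distribution<double> dist(0.0, total_);
        // The first item is always kept; later items replace the current
        // choice with probability weight / total_.
        if (!has_value_ || dist(gen) < weight) {
            chosen_ = item;
            has_value_ = true;
        }
    }

    bool has_value() const { return has_value_; }

    const T& chosen() const
    {
        assert(has_value_);
        return chosen_;
    }

private:
    T chosen_{};
    double total_ = 0.0;
    bool has_value_ = false;
};
```

Each item is looked at exactly once and nothing but the current choice and the running total is stored, so the stream never needs to be counted or buffered.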

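Choose a random number on [0,1), which should be the default operator() for a Boost RNG. Choose the item with cumulative probability density function >= that number: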

template <class It,class P>
It choose_p(It begin,It end,P const& p)
{
    if (begin==end) return end;
    double sum=0.;
    for (It i=begin;i!=end;++i)
        sum+=p(*i);
    double choice=sum*random01();
    for (It i=begin;;) {
        choice -= p(*i);
        It r=i;
        ++i;
        if (choice<0 || i==end) return r;
    }
    return begin; //unreachable
}

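where random01() returns a double >= 0 and < 1. Note that the above doesn't require the probabilities to sum to 1; it normalizes them for you.

p is just a function assigning a probability to an item in the collection [begin,end). You can omit it (or use an identity) if you just have a sequence of probabilities.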

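Build a bag (or std::vector) of all the items that can be picked.
Make sure that the number of each item is proportional to your weighting.

Example:

  • 1 60%
  • 2 35%
  • 3 5%

So have a bag with 100 items with 60 1's, 35 2's and 5 3's.
Now randomly sort the bag (std::random_shuffle).

Pick elements from the bag sequentially until it is empty.
Once empty, re-randomize the bag and start again.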
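The bag idea could be sketched like this. The class name `ShuffleBag` is made up for the example, and `std::shuffle` is used in place of `std::random_shuffle` because the latter was deprecated in C++14 and removed in C++17.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: a bag holding `count` copies of each item.
// Items are handed out in shuffled order; the bag reshuffles when exhausted.
class ShuffleBag {
public:
    // copies[i] = {item, number of copies of that item in the bag}
    ShuffleBag(const std::vector<std::pair<int, int>>& copies, unsigned seed)
        : gen_(seed)
    {
        for (const auto& c : copies) {
            assert(c.second >= 0);
            bag_.insert(bag_.end(), static_cast<std::size_t>(c.second), c.first);
        }
        assert(!bag_.empty());
        next_ = bag_.size(); // forces a shuffle on the first draw
    }

    int draw()
    {
        if (next_ == bag_.size()) { // bag is empty: re-randomize and start again
            std::shuffle(bag_.begin(), bag_.end(), gen_);
            next_ = 0;
        }
        return bag_[next_++];
    }

private:
    std::vector<int> bag_;
    std::size_t next_ = 0;
    std::mt19937 gen_;
};
```

Unlike independent draws, every run of 100 consecutive draws that starts on a bag boundary contains exactly 60 1's, 35 2's and 5 3's, so the observed frequencies cannot drift from the weights. Whether that is desirable depends on the application.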

Updating my answer to an old question. You can easily do this in C++11 with just the std::lib:

#include <iostream>
#include <random>
#include <iterator>
#include <ctime>
#include <type_traits>
#include <cassert>

int main()
{
    // Set up distribution
    double interval[] = {1,   2,   3,   4};
    double weights[] =  {  .90, .56, .04};
    std::piecewise_constant_distribution<> dist(std::begin(interval),
                                                std::end(interval),
                                                std::begin(weights));
    // Choose generator
    std::mt19937 gen(std::time(0));  // seed as wanted
    // Demonstrate with N randomly generated numbers
    const unsigned N = 1000000;
    // Collect number of times each random number is generated
    double avg[std::extent<decltype(weights)>::value] = {0};
    for (unsigned i = 0; i < N; ++i)
    {
        // Generate random number using gen, distributed according to dist
        unsigned r = static_cast<unsigned>(dist(gen));
        // Sanity check
        assert(interval[0] <= r && r <= *(std::end(interval)-2));
        // Save r for statistical test of distribution
        avg[r - 1]++;
    }
    // Compute averages for distribution
    for (double* i = std::begin(avg); i < std::end(avg); ++i)
        *i /= N;
    // Display distribution
    for (unsigned i = 1; i <= std::extent<decltype(avg)>::value; ++i)
        std::cout << "avg[" << i << "] = " << avg[i-1] << '\n';
}

Output on my system:

avg[1] = 0.600115
avg[2] = 0.373341
avg[3] = 0.026544

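Note that the bulk of the code above is devoted to just displaying and analyzing the output. The actual generation is just a few lines of code. The output demonstrates that the requested "probabilities" have been obtained. You have to divide the requested weights by 1.5, since that is what they add up to.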
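To make the division by 1.5 concrete: 0.90 / 1.5 = 0.6, 0.56 / 1.5 ≈ 0.3733 and 0.04 / 1.5 ≈ 0.0267, which is what the measured averages approach. A small sketch of that normalization, where the function name `normalized` is an assumption for this example:

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper: scales weights so that they sum to 1.
std::vector<double> normalized(const std::vector<double>& weights)
{
    const double sum = std::accumulate(weights.begin(), weights.end(), 0.0);
    assert(sum > 0.0);
    std::vector<double> out;
    out.reserve(weights.size());
    for (double w : weights)
        out.push_back(w / sum); // each weight as a fraction of the total
    return out;
}
```

The standard distributions do this scaling internally, so the weights passed to them never need to be normalized by hand; the helper is only useful for comparing requested and observed frequencies.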

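What I do when I need to weight numbers is use a random number for the weight.

For example: I need to generate random numbers from 1 to 3 with the following weights:

  • 10% of the random numbers should be 1
  • 30% of the random numbers should be 2
  • 60% of the random numbers should be 3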

Then I use:

weight = rand() % 10;

switch( weight ) {
    case 0:
        randomNumber = 1;
        break;
    case 1:
    case 2:
    case 3:
        randomNumber = 2;
        break;
    case 4:
    case 5:
    case 6:
    case 7:
    case 8:
    case 9:
        randomNumber = 3;
        break;
}

With this, the random number has a 10% probability of being 1, 30% of being 2 and 60% of being 3.

You can adjust it to suit your needs.

Hope I could help you, good luck!

If your weights change more slowly than they are drawn, C++11 discrete_distribution is going to be the easiest:

#include <ctime>
#include <iterator>
#include <random>
#include <vector>

int main()
{
    std::vector<double> weights{90,56,4};
    std::discrete_distribution<int> dist(std::begin(weights), std::end(weights));
    std::mt19937 gen;
    gen.seed(std::time(0)); // if you want different results from different runs
    int N = 100000;
    std::vector<int> samples(N);
    for(auto & i: samples)
        i = dist(gen);
    // do something with your samples...
}

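Note, however, that the C++11 discrete_distribution computes all of the cumulative sums on initialization. Usually, you want that because it speeds up the sampling time for a one-time O(N) cost. But for a rapidly changing distribution it will incur a heavy calculation (and memory) cost. For instance, if the weights represent how many items there are and every time you draw one you remove it, you will probably want a custom algorithm.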
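As an illustration of such a custom algorithm, here is a sketch for the "weights are item counts, and each draw removes one item" case. The class name `CountedPool` is hypothetical. It keeps only the counts and their total, so updating after a draw is O(1), at the price of an O(N) linear scan per draw.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: counts_[i] is how many items of kind i remain.
class CountedPool {
public:
    explicit CountedPool(std::vector<unsigned> counts)
        : counts_(std::move(counts)),
          total_(std::accumulate(counts_.begin(), counts_.end(), 0u)) {}

    bool empty() const { return total_ == 0; }

    // Draws one item with probability proportional to the remaining
    // counts, then removes it from the pool.
    template <class Gen>
    std::size_t draw(Gen& gen)
    {
        assert(total_ > 0);
        std::uniform_int_distribution<unsigned> dist(0, total_ - 1);
        unsigned rnd = dist(gen);
        std::size_t i = 0;
        while (rnd >= counts_[i]) { // linear scan, no cumulative table to rebuild
            rnd -= counts_[i];
            ++i;
        }
        --counts_[i];
        --total_;
        return i;
    }

private:
    std::vector<unsigned> counts_;
    unsigned total_;
};
```

Drawing until `empty()` returns true hands out every item exactly once, so the tallies end up equal to the initial counts.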

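Will's answer https://stackoverflow.com/a/1761646/837451 avoids this overhead but will be slower to draw from than the C++11 version because it can't use binary search.

To see that it does this, you can look at the relevant lines (/usr/include/c++/5/bits/random.tcc on my Ubuntu 16.04 + GCC 5.3 install):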

template<typename _IntType>
  void
  discrete_distribution<_IntType>::param_type::
  _M_initialize()
  {
    if (_M_prob.size() < 2)
      {
        _M_prob.clear();
        return;
      }

    const double __sum = std::accumulate(_M_prob.begin(),
                                         _M_prob.end(), 0.0);
    // Now normalize the probabilites.
    __detail::__normalize(_M_prob.begin(), _M_prob.end(), _M_prob.begin(),
                          __sum);
    // Accumulate partial sums.
    _M_cp.reserve(_M_prob.size());
    std::partial_sum(_M_prob.begin(), _M_prob.end(),
                     std::back_inserter(_M_cp));
    // Make sure the last cumulative probability is one.
    _M_cp[_M_cp.size() - 1] = 1.0;
  }

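This is my understanding of a "weighted random"; I've been using this recently. (The code is in Python but can be implemented in other languages.)

Let's say you want to pick a random person and they don't have equal chances of being selected. You can give each person a "weight" or "chance" value: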

choices = [("Ade", 60), ("Tope", 50), ("Maryamu", 30)]

You use their weights to calculate a score for each, then find the choice with the highest score:

import math
import random

highest = [None, 0]
for p in choices:
    score = math.floor(random.random() * p[1])
    if score > highest[1]:
        highest[0] = p
        highest[1] = score

print(highest)

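For Ade the highest score they can get is 60, for Tope 50 and so on, meaning that Ade has a higher chance of generating the largest score than the rest.

You can use any range of weights; the greater the difference, the more skewed the distribution. E.g. if Ade had a weight of 1000 they would almost always be chosen.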
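Since the rest of this thread is about C++, a rough port of the Python snippet above might look like the following. The function name `pick_by_highest_score` and the convention of returning -1 when every score floors to 0 (the case where the Python version leaves `None` in `highest`) are choices made for this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Hypothetical port: each choice gets score = floor(u * weight) with u
// uniform in [0, 1); the choice with the strictly highest score wins.
// Returns the winner's index, or -1 if every score floored to 0.
template <class Gen>
int pick_by_highest_score(
    const std::vector<std::pair<std::string, int>>& choices, Gen& gen)
{
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    int best = -1;
    double best_score = 0.0;
    for (std::size_t i = 0; i < choices.size(); ++i) {
        const double score = std::floor(unit(gen) * choices[i].second);
        if (score > best_score) {
            best = static_cast<int>(i);
            best_score = score;
        }
    }
    return best;
}
```

One caveat: heavier choices win more often, but the selection frequencies are not proportional to the weights. For the weights 60, 50 and 30, and ignoring the rounding, Maryamu wins about 10% of the time rather than 30/140 ≈ 21%. If exact proportions matter, the sum-and-subtract approach or std::discrete_distribution is the safer choice.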

Test

votes = [{"name": "Ade", "votes": 0}, {"name": "Tope", "votes": 0}, {"name": "Maryamu", "votes": 0}]
for v in range(100):
    highest = [None, 0]
    for p in choices:
        score = math.floor(random.random() * p[1])
        if score > highest[1]:
            highest[0] = p
            highest[1] = score

    candidate = choices.index(highest[0])  # get index of person
    votes[candidate]["votes"] += 1  # increase vote count
print(votes)
# votes printed at the end. your results might be different
# [{"name": "Ade", "votes": 45}, {"name": "Tope", "votes": 30}, {"name": "Maryamu", "votes": 25}]

Issues

It looks like the more voters there are, the more predictable the results.

Hope this gives someone an idea.

I have just implemented the solution given by "Will":

#include <cstdlib>
#include <iostream>
#include <map>
#include <stdexcept>

using namespace std;

template < class T >
class WeightedRandomSample
{
public:
    void SetWeigthMap( map< T , unsigned int >& WeightMap )
    {
        m_pMap = &WeightMap;
    }

    T GetRandomSample()
    {
        unsigned int sum_of_weight = GetSumOfWeights();
        // Guard against an unset map or all-zero weights,
        // which would otherwise cause a division by zero below
        if(sum_of_weight == 0)
            throw logic_error("no weights to sample from");
        unsigned int rnd = (rand() % sum_of_weight);
        map<T , unsigned int>& w_map = *m_pMap;
        typename map<T , unsigned int>::iterator it;
        for(it = w_map.begin() ; it != w_map.end() ; ++it )
        {
            unsigned int w = it->second;
            if(rnd < w)
                return (it->first);
            rnd -= w;
        }
        // Unreachable because rnd < sum_of_weight; throw rather than
        // dereference a null pointer
        throw logic_error("should never get here");
    }

    unsigned int GetSumOfWeights()
    {
        if(m_pMap == NULL)
            return 0;
        unsigned int sum = 0;
        map<T , unsigned int>& w_map = *m_pMap;
        typename map<T , unsigned int>::iterator it;

        for(it = w_map.begin() ; it != w_map.end() ; ++it )
        {
            sum += it->second;
        }
        return sum;
    }

protected:
    map< T , unsigned int>* m_pMap = NULL;
};

typedef pair<int , int> PAIR_INT_INT;
typedef map<PAIR_INT_INT ,unsigned int> mul_table_weighted_map;

int main()
{
    mul_table_weighted_map m;
    m[PAIR_INT_INT(2,3)] = 10;
    m[PAIR_INT_INT(4,5)] = 20;
    m[PAIR_INT_INT(2,5)] = 10;

    WeightedRandomSample<PAIR_INT_INT> WRS;
    WRS.SetWeigthMap(m);
    unsigned int sum_of_weight = WRS.GetSumOfWeights();
    cout <<"Sum of weights : " << sum_of_weight << endl;

    unsigned int number_of_test = 10000;
    cout << "testing " << number_of_test << " ..." << endl;
    map<PAIR_INT_INT , unsigned int> check_map;
    for(unsigned int i = 0 ; i < number_of_test ; i++)
    {
        PAIR_INT_INT res = WRS.GetRandomSample();
        check_map[res]++;
        //cout << i+1 << ": random = " << res.first << " * " << res.second << endl;
    }
    cout << "results: " << endl;

    for(auto t : check_map)
    {
        PAIR_INT_INT p = t.first;
        unsigned int expected = (number_of_test * m[p]) / sum_of_weight;
        cout << " pair " << p.first << " * " << p.second
             << ", counted = " << t.second
             << ", expected = " << expected
             << endl;
    }

    return 0;
}