For best results in a classification context, choose a number of clusters larger than the number of classes, or even apply the clustering to single classes only (to find out whether there is some structure within the class!).
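Here is the Java implementation I used to solve this problem with EM (Do and Batzoglou, 2008). The core of the implementation is the loop that runs EM until the parameters converge.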
private Parameters _parameters;

public Parameters run() {
    while (true) {
        expectation();
        Parameters estimatedParameters = maximization();
        boolean converged = _parameters.converged(estimatedParameters);
        _parameters = estimatedParameters;
        if (converged) {
            return _parameters;
        }
    }
}
import java.util.*;
/**
 * This class encapsulates the parameters of the problem. For the problem posed
 * in the article by Do and Batzoglou (2008), the parameters are thetaA and
 * thetaB, the probabilities of a coin coming up heads for the two coins A and B.
 */
class Parameters {
    double _thetaA = 0.0; // Probability of heads for coin A.
    double _thetaB = 0.0; // Probability of heads for coin B.

    double _delta = 0.00001; // Convergence tolerance.

    public Parameters(double thetaA, double thetaB) {
        _thetaA = thetaA;
        _thetaB = thetaB;
    }

    /**
     * Returns true if this parameter is close enough to another parameter
     * (typically the estimated parameter coming from the maximization step).
     */
    public boolean converged(Parameters other) {
        return Math.abs(_thetaA - other._thetaA) < _delta &&
               Math.abs(_thetaB - other._thetaB) < _delta;
    }

    public double getThetaA() {
        return _thetaA;
    }

    public double getThetaB() {
        return _thetaB;
    }

    public String toString() {
        return String.format("thetaA = %.5f, thetaB = %.5f", _thetaA, _thetaB);
    }
}
/**
 * This class encapsulates an observation, that is, the number of heads
 * and tails in a trial. The observation can be either (1) one of the
 * experimental observations, or (2) an estimated observation resulting from
 * the expectation step.
 */
class Observation {
    double _numHeads = 0;
    double _numTails = 0;

    public Observation(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == 'H') {
                _numHeads++;
            } else if (c == 'T') {
                _numTails++;
            } else {
                throw new RuntimeException("Unknown character: " + c);
            }
        }
    }

    public Observation(double numHeads, double numTails) {
        _numHeads = numHeads;
        _numTails = numTails;
    }

    public double getNumHeads() {
        return _numHeads;
    }

    public double getNumTails() {
        return _numTails;
    }

    public String toString() {
        return String.format("heads: %.1f, tails: %.1f", _numHeads, _numTails);
    }
}
/**
 * This class runs expectation-maximization for the problem posed by the
 * article by Do and Batzoglou (2008).
 */
public class EM {
    // Current estimated parameters.
    private Parameters _parameters;

    // Observations from the trials. These observations are set once.
    private final List<Observation> _observations;

    // Estimated observations per coin. These observations are the output
    // of the expectation step.
    private List<Observation> _expectedObservationsForCoinA;
    private List<Observation> _expectedObservationsForCoinB;

    private static final java.io.PrintStream o = System.out;

    /**
     * Principal constructor.
     * @param observations The observations from the trial.
     * @param parameters The initial guessed parameters.
     */
    public EM(List<Observation> observations, Parameters parameters) {
        _observations = observations;
        _parameters = parameters;
    }

    /**
     * Run EM until parameters converge.
     */
    public Parameters run() {
        while (true) {
            expectation();
            Parameters estimatedParameters = maximization();
            o.printf("%s\n", estimatedParameters);
            boolean converged = _parameters.converged(estimatedParameters);
            _parameters = estimatedParameters;
            if (converged) {
                return _parameters;
            }
        }
    }
    /**
     * Given the observations and current estimated parameters, compute new
     * estimated completions (distribution over the classes) and observations.
     */
    private void expectation() {
        _expectedObservationsForCoinA = new ArrayList<Observation>();
        _expectedObservationsForCoinB = new ArrayList<Observation>();

        for (Observation observation : _observations) {
            int numHeads = (int)observation.getNumHeads();
            int numTails = (int)observation.getNumTails();

            // Each trial in this problem consists of 10 tosses.
            double probabilityOfObservationForCoinA =
                binomialProbability(10, numHeads, _parameters.getThetaA());
            double probabilityOfObservationForCoinB =
                binomialProbability(10, numHeads, _parameters.getThetaB());

            double normalizer = probabilityOfObservationForCoinA +
                                probabilityOfObservationForCoinB;

            // Compute the completions for coin A and B (i.e. the probability
            // distribution of the two classes, summed to 1.0).
            double completionCoinA = probabilityOfObservationForCoinA / normalizer;
            double completionCoinB = probabilityOfObservationForCoinB / normalizer;

            // Compute new expected observations for the two coins.
            Observation expectedObservationForCoinA =
                new Observation(numHeads * completionCoinA,
                                numTails * completionCoinA);
            Observation expectedObservationForCoinB =
                new Observation(numHeads * completionCoinB,
                                numTails * completionCoinB);

            _expectedObservationsForCoinA.add(expectedObservationForCoinA);
            _expectedObservationsForCoinB.add(expectedObservationForCoinB);
        }
    }
    /**
     * Given new estimated observations, compute new estimated parameters.
     */
    private Parameters maximization() {
        double sumCoinAHeads = 0.0;
        double sumCoinATails = 0.0;
        double sumCoinBHeads = 0.0;
        double sumCoinBTails = 0.0;

        for (Observation observation : _expectedObservationsForCoinA) {
            sumCoinAHeads += observation.getNumHeads();
            sumCoinATails += observation.getNumTails();
        }

        for (Observation observation : _expectedObservationsForCoinB) {
            sumCoinBHeads += observation.getNumHeads();
            sumCoinBTails += observation.getNumTails();
        }

        return new Parameters(sumCoinAHeads / (sumCoinAHeads + sumCoinATails),
                              sumCoinBHeads / (sumCoinBHeads + sumCoinBTails));
    }
    /**
     * Since the coin-toss experiment posed in this article is a Bernoulli trial,
     * use a binomial probability Pr(X=k; n,p) = (n choose k) * p^k * (1-p)^(n-k).
     */
    private static double binomialProbability(int n, int k, double p) {
        double q = 1.0 - p;
        return nChooseK(n, k) * Math.pow(p, k) * Math.pow(q, n-k);
    }

    private static long nChooseK(int n, int k) {
        long numerator = 1;
        for (int i = 0; i < k; i++) {
            // Falling factorial: n * (n-1) * ... * (n-k+1).
            numerator = numerator * (n - i);
        }
        long denominator = factorial(k);
        return numerator / denominator;
    }

    private static long factorial(int n) {
        long result = 1;
        for (; n > 0; n--) {
            result = result * n;
        }
        return result;
    }
    /**
     * Entry point into the program.
     */
    public static void main(String argv[]) {
        // Create the observations and initial parameter guess
        // from the (Do and Batzoglou, 2008) article.
        List<Observation> observations = new ArrayList<Observation>();
        observations.add(new Observation("HTTTHHTHTH"));
        observations.add(new Observation("HHHHTHHHHH"));
        observations.add(new Observation("HTHHHHHTHH"));
        observations.add(new Observation("HTHTTTHHTT"));
        observations.add(new Observation("THHHTHHHTH"));

        Parameters initialParameters = new Parameters(0.6, 0.5);
        EM em = new EM(observations, initialParameters);

        Parameters finalParameters = em.run();
        o.printf("Final result:\n%s\n", finalParameters);
    }
}
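If you just want to check the numbers without compiling anything, the same expectation and maximization steps can be sketched in a few lines of Python. This is only a minimal sketch, not a port of the classes above: the names `em_step` and `run_em`, the `max_iterations` cap, and storing each trial as a heads count are my own assumptions. The data, the initial guess (0.6, 0.5) and the 0.00001 tolerance are the ones used in the Java code.

```python
from math import comb

# Heads counts for the five 10-toss trials from Do and Batzoglou (2008).
HEADS = [5, 9, 8, 4, 7]
TOSSES = 10


def binomial_probability(n, k, p):
    """Pr(X = k) for n tosses of a coin with heads probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def em_step(theta_a, theta_b):
    """One expectation step plus one maximization step."""
    heads_a = tails_a = heads_b = tails_b = 0.0
    for heads in HEADS:
        tails = TOSSES - heads
        like_a = binomial_probability(TOSSES, heads, theta_a)
        like_b = binomial_probability(TOSSES, heads, theta_b)
        weight_a = like_a / (like_a + like_b)  # completion for coin A
        weight_b = 1.0 - weight_a              # completion for coin B
        # Expected heads/tails attributed to each coin for this trial.
        heads_a += weight_a * heads
        tails_a += weight_a * tails
        heads_b += weight_b * heads
        tails_b += weight_b * tails
    return heads_a / (heads_a + tails_a), heads_b / (heads_b + tails_b)


def run_em(theta_a=0.6, theta_b=0.5, delta=1e-5, max_iterations=1000):
    """Iterate until both parameters move by less than delta."""
    for _ in range(max_iterations):
        new_a, new_b = em_step(theta_a, theta_b)
        converged = abs(new_a - theta_a) < delta and abs(new_b - theta_b) < delta
        theta_a, theta_b = new_a, new_b
        if converged:
            break
    return theta_a, theta_b


if __name__ == "__main__":
    print("thetaA = %.5f, thetaB = %.5f" % run_em())
```

The estimates should settle close to the values reported in the article (roughly 0.80 for coin A and 0.52 for coin B).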
The other answers are good, so I will try to provide another perspective and tackle the intuitive part of the question.
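The EM (Expectation-Maximization) algorithm is a variant of a class of iterative algorithms that use duality.
Excerpt (emphasis mine):
[...] theorems or mathematical structures into other concepts, theorems or
[...] involution operation: if the dual of A is B, then the dual of B
is A. Such involutions sometimes have fixed points, so that the dual
of A is A itself.
Usually a dual B of an object A is related to A in some way that preserves some symmetry or compatibility. For example AB = const.
Examples of iterative algorithms employing duality (in the previous sense) are: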
1st: {H,T,T,T,H,H,T,H,T,H} 5 Heads, 5 Tails; Did coin A or B generate me?
2nd: {H,H,H,H,T,H,H,H,H,H} 9 Heads, 1 Tail
3rd: {H,T,H,H,H,H,H,T,H,H} 8 Heads, 2 Tails
4th: {H,T,H,T,T,T,H,H,T,T} 4 Heads, 6 Tails
5th: {T,H,H,H,T,H,H,H,T,H} 7 Heads, 3 Tails
Two possible coins, A and B, are used to generate these trials.
A & B have an unknown parameter: their bias towards heads.
We don't know the biases, but we can simply start with a guess: A=60% heads, B=50% heads.
In the case of the first trial's question, intuitively we'd think B generated it since the proportion of heads matches B's bias very well... but that value was just a guess, so we can't be sure.
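To put a number on that intuition, here is a small sketch that scores the first trial under both guesses and normalises the two scores into weights. The helper name `sequence_likelihood` is made up for illustration, and the calculation assumes that, before seeing any tosses, either coin was equally likely to have been picked.

```python
def sequence_likelihood(heads, tails, bias):
    """Probability of one specific heads/tails sequence for a coin with this bias."""
    return bias**heads * (1 - bias) ** tails


# Initial guesses from the text: A = 60% heads, B = 50% heads.
guess_a, guess_b = 0.6, 0.5

# First trial: 5 heads, 5 tails.
like_a = sequence_likelihood(5, 5, guess_a)
like_b = sequence_likelihood(5, 5, guess_b)

# Normalise so the two weights sum to 1
# (assumption: both coins are equally likely a priori).
weight_a = like_a / (like_a + like_b)
weight_b = like_b / (like_a + like_b)

print(f"P(A | trial 1) = {weight_a:.2f}, P(B | trial 1) = {weight_b:.2f}")
```

With these guesses B comes out ahead, but only at about 0.55 against 0.45, so the trial is far from settled. That is why EM keeps a soft weight for each coin instead of assigning the trial outright.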
import numpy as np

def estimate_mean(data, weight):
    """
    For each data point, multiply the point by the probability it
    was drawn from the colour's distribution (its "weight").

    Divide by the total weight: essentially, we're finding where
    the weight is centred among our data points.
    """
    return np.sum(data * weight) / np.sum(weight)

def estimate_std(data, weight, mean):
    """
    For each data point, multiply the point's squared difference
    from a mean value by the probability it was drawn from
    that distribution (its "weight").

    Divide by the total weight: essentially, we're finding where
    the weight is centred among the values for the difference of
    each data point from the mean.

    This is the estimate of the variance, take the positive square
    root to find the standard deviation.
    """
    variance = np.sum(weight * (data - mean)**2) / np.sum(weight)
    return np.sqrt(variance)
# new estimates for standard deviation
blue_std_guess = estimate_std(both_colours, blue_weight, blue_mean_guess)
red_std_guess = estimate_std(both_colours, red_weight, red_mean_guess)
# new estimates for mean
red_mean_guess = estimate_mean(both_colours, red_weight)
blue_mean_guess = estimate_mean(both_colours, blue_weight)
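The `red_weight` and `blue_weight` values used above come from the expectation step. As a rough sketch of how such weights could be computed, the snippet below scores every point under each colour's current Gaussian guess and normalises the two scores. The data values and starting guesses here are made up for illustration, `statistics.NormalDist` stands in for whatever density function you prefer, and the normalisation assumes both colours are equally likely a priori.

```python
from statistics import NormalDist

# Hypothetical data and starting guesses, for illustration only.
both_colours = [1.9, 2.6, 3.1, 3.4, 5.2, 6.8, 7.5, 9.0]
red_mean_guess, red_std_guess = 3.0, 1.0
blue_mean_guess, blue_std_guess = 7.0, 2.0

red = NormalDist(red_mean_guess, red_std_guess)
blue = NormalDist(blue_mean_guess, blue_std_guess)

# Likelihood of each point under each colour's current Gaussian.
likelihood_of_red = [red.pdf(x) for x in both_colours]
likelihood_of_blue = [blue.pdf(x) for x in both_colours]

# Normalise the two likelihoods into weights that sum to 1 per point
# (assumption: both colours are equally likely a priori).
red_weight = [r / (r + b) for r, b in zip(likelihood_of_red, likelihood_of_blue)]
blue_weight = [b / (r + b) for r, b in zip(likelihood_of_red, likelihood_of_blue)]

for x, rw, bw in zip(both_colours, red_weight, blue_weight):
    print(f"x = {x:4.1f}  red weight = {rw:.3f}  blue weight = {bw:.3f}")
```

The weights here are plain lists; wrap them (and the data) in `np.array(...)` before passing them to `estimate_mean` and `estimate_std`, since those functions rely on element-wise multiplication.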
          | EM guess | Actual | Delta
----------+----------+--------+-------
Red mean  |    2.910 |  2.802 |  0.108
Red std   |    0.854 |  0.871 | -0.017
Blue mean |    6.838 |  6.932 | -0.094
Blue std  |    2.227 |  2.195 |  0.032