RandomForestClassifier vs ExtraTreesClassifier in scikit-learn

Can someone explain the difference between the RandomForestClassifier and ExtraTreesClassifier in scikit-learn? I've spent a fair amount of time reading the paper:

P. Geurts, D. Ernst, and L. Wehenkel, "Extremely randomized trees", Machine Learning, 63(1), 3-42, 2006

It seems that these are the differences for ETs (Extra Trees):

1) When choosing variables at a split, samples are drawn from the entire training set instead of a bootstrap sample of the training set.

2) Splits are chosen completely at random from the range of values in the sample at each split.

The result of these two things is many more "leaves".
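For context, a minimal sketch of where these two points surface in scikit-learn's API (assuming a recent scikit-learn, >= 0.21 for get_n_leaves; the parameter values are illustrative, not recommendations):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Point 1: RF bootstraps by default; ET fits each tree on the whole set.
rf = RandomForestClassifier(n_estimators=100, bootstrap=True, random_state=0)
et = ExtraTreesClassifier(n_estimators=100, bootstrap=False, random_state=0)
rf.fit(X, y)
et.fit(X, y)

# Point 2 (random thresholds) is internal to the split procedure, but its
# effect is visible in tree size: ET trees typically end up with more leaves.
print("mean RF leaves:", sum(t.get_n_leaves() for t in rf.estimators_) / 100)
print("mean ET leaves:", sum(t.get_n_leaves() for t in et.estimators_) / 100)
```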


Yes, both conclusions are correct, although the Random Forest implementation in scikit-learn makes it possible to enable or disable the bootstrap resampling.
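For reference, a short sketch of that toggle (the defaults shown match recent scikit-learn versions; check the docs for yours):

```python
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

# RandomForestClassifier defaults to bootstrap=True; turning it off makes
# each tree see the full training set, as ExtraTrees does by default.
rf_full_sample = RandomForestClassifier(bootstrap=False)

# ExtraTreesClassifier defaults to bootstrap=False; it can be enabled too.
et_bootstrapped = ExtraTreesClassifier(bootstrap=True)
```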

In practice, RFs are often more compact than ETs. ETs are generally cheaper to train from a computational point of view but can grow much bigger. ETs can sometimes generalize better than RFs, but it's hard to guess when that is the case without trying both first (and tuning n_estimators, max_features and min_samples_split by cross-validated grid search).
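A hedged sketch of the tuning suggested above; the grid values are arbitrary examples, not recommendations:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, random_state=0)

param_grid = {
    "n_estimators": [100, 300],
    "max_features": ["sqrt", None],
    "min_samples_split": [2, 10],
}

# Try both ensembles with the same cross-validated grid search.
for Model in (RandomForestClassifier, ExtraTreesClassifier):
    search = GridSearchCV(Model(random_state=0), param_grid, cv=5)
    search.fit(X, y)
    print(Model.__name__, search.best_params_, round(search.best_score_, 3))
```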

The ExtraTrees classifier always tests random splits over a fraction of the features (in contrast to RandomForest, which tests all possible splits over a fraction of the features).
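You can see this per tree by comparing the single-tree classes the ensembles are built from (a sketch, assuming scikit-learn's sklearn.tree module):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, ExtraTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)

# RandomForest's base learner exhaustively scans candidate thresholds
# for each sampled feature and keeps the best one.
best_split_tree = DecisionTreeClassifier(splitter="best", random_state=0).fit(X, y)

# ExtraTrees' base learner draws one random threshold per candidate
# feature and picks the best among those random draws.
random_split_tree = ExtraTreeClassifier(random_state=0).fit(X, y)

print("depths:", best_split_tree.get_depth(), random_split_tree.get_depth())
```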

The main difference between random forests and extra trees (short for extremely randomized trees) is that, instead of computing the locally optimal feature/split combination as a random forest does, extra trees select a random value for the split of each feature under consideration. Here is a good resource to learn about their differences in more detail: Random forest vs extra tree.
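To make that concrete, a toy NumPy sketch of the two split-threshold strategies (illustrative only, not scikit-learn's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
feature_values = np.sort(rng.normal(size=20))  # one feature's values at a node

# Random-forest style: evaluate every midpoint between consecutive values
# and keep the threshold with the best impurity reduction.
rf_candidate_thresholds = (feature_values[:-1] + feature_values[1:]) / 2

# Extra-trees style: draw a single threshold uniformly at random between
# the feature's minimum and maximum, then score only that one.
et_threshold = rng.uniform(feature_values.min(), feature_values.max())

print(len(rf_candidate_thresholds), "candidates vs one random draw:", et_threshold)
```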