Sklearn,gridsearch: 如何打印执行过程中的进度?

我使用 GridSearchsklearn优化分类器的参数。由于有大量的数据,因此整个优化过程需要一段时间: 超过一天。我想观察在执行过程中已经尝试过的参数组合的性能。有可能吗?

82671 次浏览

GridSearchCV中的 verbose参数设置为正数(数字越大,获得的详细信息越多)。例如:

GridSearchCV(clf, param_grid, cv=cv, scoring='accuracy', verbose=10)

看看 GridSearchCVProgress 条

刚找到的,我正在用,非常投入:

In [1]: GridSearchCVProgressBar
Out[1]: pactools.grid_search.GridSearchCVProgressBar


In [2]:


In [2]: ??GridSearchCVProgressBar
Init signature: GridSearchCVProgressBar(estimator, param_grid, scoring=None, fit_params=None, n_jobs=1, iid=True, refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', error_score='raise', return_train_score='warn')
Source:
class GridSearchCVProgressBar(model_selection.GridSearchCV):
"""Monkey patch Parallel to have a progress bar during grid search"""


def _get_param_iterator(self):
"""Return ParameterGrid instance for the given param_grid"""


iterator = super(GridSearchCVProgressBar, self)._get_param_iterator()
iterator = list(iterator)
n_candidates = len(iterator)


cv = model_selection._split.check_cv(self.cv, None)
n_splits = getattr(cv, 'n_splits', 3)
max_value = n_candidates * n_splits


class ParallelProgressBar(Parallel):
def __call__(self, iterable):
bar = ProgressBar(max_value=max_value, title='GridSearchCV')
iterable = bar(iterable)
return super(ParallelProgressBar, self).__call__(iterable)


# Monkey patch
model_selection._search.Parallel = ParallelProgressBar


return iterator
File:           ~/anaconda/envs/python3/lib/python3.6/site-packages/pactools/grid_search.py
Type:           ABCMeta


In [3]: ?GridSearchCVProgressBar
Init signature: GridSearchCVProgressBar(estimator, param_grid, scoring=None, fit_params=None, n_jobs=1, iid=True, refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', error_score='raise', return_train_score='warn')
Docstring:      Monkey patch Parallel to have a progress bar during grid search
File:           ~/anaconda/envs/python3/lib/python3.6/site-packages/pactools/grid_search.py
Type:           ABCMeta

我只是想补充 大卫的回答

为了给你一个想法,对于一个非常简单的情况,这是它看起来如何与 verbose=1:

Fitting 10 folds for each of 1 candidates, totalling 10 fits
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done  10 out of  10 | elapsed:  1.2min finished

verbose=10看起来是这样的:

Fitting 10 folds for each of 1 candidates, totalling 10 fits
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.637, total=   7.1s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    7.0s remaining:    0.0s
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.630, total=   6.5s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:   13.5s remaining:    0.0s
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.637, total=   6.5s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:   20.0s remaining:    0.0s
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.637, total=   6.7s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed:   26.7s remaining:    0.0s
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.632, total=   7.9s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   34.7s remaining:    0.0s
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.622, total=   6.9s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
[Parallel(n_jobs=1)]: Done   6 out of   6 | elapsed:   41.6s remaining:    0.0s
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.627, total=   7.1s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
[Parallel(n_jobs=1)]: Done   7 out of   7 | elapsed:   48.7s remaining:    0.0s
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.628, total=   7.2s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
[Parallel(n_jobs=1)]: Done   8 out of   8 | elapsed:   55.9s remaining:    0.0s
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.640, total=   6.6s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1
[Parallel(n_jobs=1)]: Done   9 out of   9 | elapsed:  1.0min remaining:    0.0s
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.629, total=   6.6s
[Parallel(n_jobs=1)]: Done  10 out of  10 | elapsed:  1.2min finished

在我的案例中,verbose=1就起作用了。