最佳答案
我有一个包含数字和分类数据的数据集,我想根据患者的医学特征来预测不良后果。我为我的数据集定义了一个预测管道,如下所示:
X = dataset.drop(columns=['target'])
y = dataset['target']
# define categorical and numeric transformers
numeric_transformer = Pipeline(steps=[
('knnImputer', KNNImputer(n_neighbors=2, weights="uniform")),
('scaler', StandardScaler())])
categorical_transformer = Pipeline(steps=[
('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
('onehot', OneHotEncoder(handle_unknown='ignore'))])
# dispatch object columns to the categorical_transformer and remaining columns to numerical_transformer
preprocessor = ColumnTransformer(transformers=[
('num', numeric_transformer, selector(dtype_exclude="object")),
('cat', categorical_transformer, selector(dtype_include="object"))
])
# Append classifier to preprocessing pipeline.
# Now we have a full prediction pipeline.
clf = Pipeline(steps=[('preprocessor', preprocessor),
('classifier', LogisticRegression())])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf.fit(X_train, y_train)
print("model score: %.3f" % clf.score(X_test, y_test))
但是,在运行此代码时,我会得到以下警告消息:
ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)
model score: 0.988
有人能解释一下这个警告是什么意思吗?我是机器学习的新手,所以对于如何改进预测模型有点迷茫。正如您可以从 numeric _ former 中看到的,我通过标准化扩展了数据。我也很困惑,为什么模型得分这么高,这是好事还是坏事。