Given a simple CSV file, test.csv:
A,B,C
Hello,Hi,0
Hola,Bueno,1
The real dataset is far more complex than this, but this one reproduces the error. I'm trying to build a random forest classifier for it, like so:
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

cols = ['A','B','C']
col_types = {'A': str, 'B': str, 'C': int}
test = pd.read_csv('test.csv', dtype=col_types)
train_y = test['C'] == 1
train_x = test[cols]
clf_rf = RandomForestClassifier(n_estimators=50)
clf_rf.fit(train_x, train_y)
But invoking fit() just raises this error:
ValueError: could not convert string to float: 'Bueno'
scikit-learn version is 0.16.1.
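For reference, the error seems to come from the string columns being passed to fit() as-is, since the classifier tries to convert every feature to float. Below is a minimal sketch of one possible workaround, assuming A and B can be treated as categorical features and that one-hot encoding with pd.get_dummies is acceptable for the real dataset. The inlined CSV text and random_state=0 are my own additions to make the sketch self-contained and repeatable, and leaving C out of the features is a deliberate change from the code above, since C is also the target.

```python
import io

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Same rows as test.csv above, inlined so the sketch runs on its own.
csv_text = "A,B,C\nHello,Hi,0\nHola,Bueno,1\n"
col_types = {'A': str, 'B': str, 'C': int}
test = pd.read_csv(io.StringIO(csv_text), dtype=col_types)

train_y = test['C'] == 1

# Assumption: A and B are categorical, so each distinct string becomes
# its own indicator column (e.g. A_Hello, A_Hola, B_Bueno, B_Hi).
# C is the target, so it is not included in the features here.
train_x = pd.get_dummies(test[['A', 'B']])

# random_state is fixed only to make the sketch repeatable.
clf_rf = RandomForestClassifier(n_estimators=50, random_state=0)
clf_rf.fit(train_x, train_y)
```

With this encoding fit() no longer sees any strings. Whether one-hot encoding is the right choice for columns with many distinct values is a separate question.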