在‘ float’和‘ str’实例之间不支持‘ >’

我面临这个错误的多个变量，甚至处理缺失的值。例如:

le = preprocessing.LabelEncoder()
categorical = list(df.select_dtypes(include=['object']).columns.values)
for cat in categorical:
print(cat)
df[cat].fillna('UNK', inplace=True)
df[cat] = le.fit_transform(df[cat])
#     print(le.classes_)
#     print(le.transform(le.classes_))




---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-24-424a0952f9d0> in <module>()
4     print(cat)
5     df[cat].fillna('UNK', inplace=True)
----> 6     df[cat] = le.fit_transform(df[cat].fillna('UNK'))
7 #     print(le.classes_)
8 #     print(le.transform(le.classes_))


C:\Users\paula.ceccon.ribeiro\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\preprocessing\label.py in fit_transform(self, y)
129         y = column_or_1d(y, warn=True)
130         _check_numpy_unicode_bug(y)
--> 131         self.classes_, y = np.unique(y, return_inverse=True)
132         return y
133


C:\Users\paula.ceccon.ribeiro\AppData\Local\Continuum\Anaconda3\lib\site-packages\numpy\lib\arraysetops.py in unique(ar, return_index, return_inverse, return_counts)
209
210     if optional_indices:
--> 211         perm = ar.argsort(kind='mergesort' if return_index else 'quicksort')
212         aux = ar[perm]
213     else:


TypeError: '>' not supported between instances of 'float' and 'str'

检查导致错误的变量的结果是:

df['CRM do Médico'].isnull().sum()
0

除了 nan 值之外，还有什么可能导致这个错误？

234458

小开

最佳答案

This is due to the series df[cat] containing elements that have varying data types e.g.(strings and/or floats). This could be due to the way the data is read, i.e. numbers are read as float and text as strings or the datatype was float and changed after the fillna operation.

In other words

pandas data type 'Object' indicates mixed types rather than str type

so using the following line:

df[cat] = le.fit_transform(df[cat].astype(str))

should help

小开

Or use a cast with split to uniform type of str

unique, counts = numpy.unique(str(a).split(), return_counts=True)

小开

As string data types have variable length, it is by default stored as object type. I faced this problem after treating missing values too. Converting all those columns to type 'category' before label encoding worked in my case.

df[cat]=df[cat].astype('category')

And then check df.dtypes and perform label encoding.

小开

You can use

df['cat'] = df['cat'].apply(str)

小开

In my case, I had nan in a list; which limits certain operations you can do