Pandas: ValueError: cannot convert float NaN to integer

I get ValueError: cannot convert float NaN to integer for the following:

df = pandas.read_csv('zoom11.csv')
df[['x']] = df[['x']].astype(int)
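  • "x" is a column in the csv file. I cannot spot any float NaN in the file, and I don't understand the error or why I am getting it.
  • When I read the column as a string, its values are -1, 0, 1, ..., 2000, which all look like perfectly good ints to me.
  • When I read the column as float, it loads. It then shows values such as -1.0, 0.0, etc., and still no NaNs.
  • I tried error_bad_lines=False and the dtype parameter in read_csv, to no avail.
  • The file is not small (10+ M rows), so I cannot inspect it manually. When I extract a small head portion there is no error, but it happens with the full file. So it is something in the file, but I cannot detect what.
  • Logically the csv should not have missing values, but even if there is some garbage, I would be fine with skipping those rows, or at least identifying them. However, I do not see a way to scan through the file and report conversion errors.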

Update: Using the hints in the comments/answers, I got my data clean with this:

# x contained NaN
df = df[~df['x'].isnull()]


# Y contained some other garbage, so null check was not enough
df = df[df['y'].str.isnumeric()]


# final conversion now worked
df[['x']] = df[['x']].astype(int)
df[['y']] = df[['y']].astype(int)

For identifying NaN values use boolean indexing:

print(df[df['x'].isnull()])

Then, to get rid of all non-numeric values, use to_numeric with the parameter errors='coerce', which replaces non-numeric values with NaN:

df['x'] = pd.to_numeric(df['x'], errors='coerce')

To remove all rows with NaN in column x, use dropna:

df = df.dropna(subset=['x'])

Finally, convert the values to ints:

df['x'] = df['x'].astype(int)
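
Putting these steps together, here is a minimal sketch on a small in-memory CSV. The sample data and the names raw, converted, bad_rows and clean are made up for illustration. Reading the column as text first makes it possible to report the rows that fail conversion before dropping them:

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real file.
raw = io.StringIO("x,y\n-1,10\n0,20\n,30\nabc,40\n2000,50\n")

# Read x as text so nothing is converted behind our back.
df = pd.read_csv(raw, dtype={"x": str})

# Non-numeric strings and empty cells both become NaN.
converted = pd.to_numeric(df["x"], errors="coerce")

# Rows that cannot be converted, kept for inspection.
bad_rows = df[converted.isnull()]
print(bad_rows)

# Keep only the convertible rows, then cast to int.
clean = df[converted.notnull()].copy()
clean["x"] = converted[converted.notnull()].astype(int)
print(clean["x"].tolist())
```

The index of bad_rows gives the offending row positions in the loaded frame, which may help when the file is too large to inspect by hand.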

I know this has been answered, but I wanted to provide an alternate solution for anyone in the future:

You can use .loc to subset the dataframe to only the rows where the value is notnull(), and then select just the 'x' column. Take that same vector and apply(int) to it.

If column x is float:

df.loc[df['x'].notnull(), 'x'] = df.loc[df['x'].notnull(), 'x'].apply(int)
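
One caveat with this approach: the NaN rows stay in place, so the column as a whole cannot end up with a plain integer dtype. A small sketch with made-up data:

```python
import numpy as np
import pandas as pd

# Hypothetical float column with one missing value.
df = pd.DataFrame({"x": [1.0, np.nan, 3.0]})

mask = df["x"].notnull()
df.loc[mask, "x"] = df.loc[mask, "x"].apply(int)

# The NaN row is still present, so the dtype is not a plain int.
print(df["x"].dtype)
print(df["x"].isnull().sum())
```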

ValueError: cannot convert float NaN to integer

From v0.24, you actually can. Pandas introduced Nullable Integer Data Types, which allow integers to coexist with NaNs.

Given a series of whole float numbers with missing data,

s = pd.Series([1.0, 2.0, np.nan, 4.0])
s


0    1.0
1    2.0
2    NaN
3    4.0
dtype: float64


s.dtype
# dtype('float64')

You can convert it to a nullable int type (choose from one of Int16, Int32, or Int64) with,

s2 = s.astype('Int32') # note the 'I' is uppercase
s2


0      1
1      2
2    NaN
3      4
dtype: Int32


s2.dtype
# Int32Dtype()

Your column needs to have whole numbers for the cast to happen. Anything else will raise a TypeError:

s = pd.Series([1.1, 2.0, np.nan, 4.0])


s.astype('Int32')
# TypeError: cannot safely cast non-equivalent float64 to int32
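
If rounding is acceptable for your data (an assumption, not something the cast does for you), one possible workaround is to round to whole numbers before converting. A minimal sketch:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.1, 2.0, np.nan, 4.0])

# Round to whole numbers first; round() leaves NaN untouched.
s2 = s.round().astype("Int64")
print(s2)
```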

If the column contains null values, the conversion raises this error. To resolve it without modifying your original dataset, filter and convert in one expression: df[~df['x'].isnull()][['x']].astype(int).

Also, even in the latest versions of pandas, if the column is of object type you have to convert it to float first, something like:

df['column_name'].astype(float).astype("Int32")

NB: You have to go through float first and then to nullable Int32, for some reason.

Whether you need Int32 or Int64 depends on your data. Be aware that you may lose information if your numbers are too big for the chosen format.
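
As a rough sketch of both points, the snippet below converts an object column and picks the width from the observed value range. It uses pd.to_numeric with errors='coerce' for the float step (a swapped-in alternative to astype that also turns garbage strings into NaN), and the sample data and the names col, as_float, fits_int32 and result are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical object column: digit strings, a missing value, and garbage.
col = pd.Series(["1", "2", None, "abc", "3000000000"], dtype=object)

# to_numeric yields float64, with NaN where parsing failed.
as_float = pd.to_numeric(col, errors="coerce")

# Compare the value range with what a 32-bit int can hold.
info = np.iinfo(np.int32)
fits_int32 = as_float.min() >= info.min and as_float.max() <= info.max

# Fall back to Int64 when the values do not fit in Int32.
result = as_float.astype("Int32" if fits_int32 else "Int64")
print(result)
```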