How do I delete columns that contain only zeros in pandas?


Best answer

df.loc[:, (df != 0).any(axis=0)]

Here is a break-down of how it works:

In [74]: import pandas as pd


In [75]: df = pd.DataFrame([[1,0,0,0], [0,0,1,0]])


In [76]: df
Out[76]:
   0  1  2  3
0  1  0  0  0
1  0  0  1  0


[2 rows x 4 columns]

df != 0 creates a boolean DataFrame which is True where df is nonzero:

In [77]: df != 0
Out[77]:
       0      1      2      3
0   True  False  False  False
1  False  False   True  False


[2 rows x 4 columns]

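(df != 0).any(axis=0) returns a boolean Series indicating which columns have nonzero entries. (The any operation aggregates values along the 0-axis, i.e. along the rows, into a single boolean value. Hence the result is one boolean value for each column.)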

In [78]: (df != 0).any(axis=0)
Out[78]:
0     True
1    False
2     True
3    False
dtype: bool
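As an aside on the axis argument, here is a minimal sketch contrasting axis=0 (one result per column) with axis=1 (one result per row). The extra all-zero row and the row filtering are assumptions added for illustration; they are not part of the original answer.

```python
import pandas as pd

# Variant of the example frame with an extra all-zero row (added for illustration).
df = pd.DataFrame([[1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]])

nonzero = df != 0

# axis=0 collapses the rows: one boolean per column.
col_mask = nonzero.any(axis=0)

# axis=1 collapses the columns: one boolean per row.
row_mask = nonzero.any(axis=1)

print(col_mask.tolist())  # [True, False, True, False]
print(row_mask.tolist())  # [True, True, False]

# The row mask can be used the same way to keep only rows with a nonzero entry.
nonzero_rows = df.loc[row_mask]
```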

df.loc can be used to select those columns:

In [79]: df.loc[:, (df != 0).any(axis=0)]
Out[79]:
   0  2
0  1  0
1  0  1


[2 rows x 2 columns]

To "delete" the zero-columns, reassign df:

df = df.loc[:, (df != 0).any(axis=0)]
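If this comes up often, the same expression could be wrapped in a small helper. The function name drop_zero_columns and the sample column labels are hypothetical, chosen only for this sketch.

```python
import pandas as pd


def drop_zero_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a new frame without the columns whose entries are all zero."""
    keep = (frame != 0).any(axis=0)
    return frame.loc[:, keep]


df = pd.DataFrame({"a": [1, 0], "b": [0, 0], "c": [0, 1], "d": [0, 0]})
df = drop_zero_columns(df)
print(list(df.columns))  # ['a', 'c']
```

Note that NaN != 0 evaluates to True, so a column holding only NaN is kept by this mask.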


Here is an alternative way to do it:

df.replace(0,np.nan).dropna(axis=1,how="all")

Compared with unutbu's method, this approach is noticeably slower (roughly 2.7 times in the timings below):

%timeit df.loc[:, (df != 0).any(axis=0)]
652 µs ± 5.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


%timeit df.replace(0,np.nan).dropna(axis=1,how="all")
1.75 ms ± 9.49 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
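The timings above come from an IPython session. Outside IPython, a comparison along the same lines could be sketched with the standard timeit module. The test frame below (1000 rows, 20 columns, every other column zero) is an assumption, since the original does not say which frame was timed, so the absolute numbers will differ.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test frame: 1000 rows, 20 columns, every other column all zero.
rng = np.random.default_rng(0)
data = rng.integers(1, 10, size=(1000, 20))
data[:, ::2] = 0
df = pd.DataFrame(data)

mask_result = df.loc[:, (df != 0).any(axis=0)]
dropna_result = df.replace(0, np.nan).dropna(axis=1, how="all")

# Both approaches keep the same columns on this frame.
same_columns = list(mask_result.columns) == list(dropna_result.columns)

# Timings depend on the machine, the pandas version and the frame's shape.
t_mask = timeit.timeit(lambda: df.loc[:, (df != 0).any(axis=0)], number=100)
t_dropna = timeit.timeit(
    lambda: df.replace(0, np.nan).dropna(axis=1, how="all"), number=100
)
print(f"mask:   {t_mask / 100 * 1e6:.0f} us per call")
print(f"dropna: {t_dropna / 100 * 1e6:.0f} us per call")
```

One behavioural difference worth knowing: the replace/dropna variant also drops columns made up only of zeros and NaN, and an integer column in which some zeros were replaced comes back as float.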


In case you'd like a more expressive way of getting the zero-column names so you can print / log them, and drop them, in-place, by their names:

zero_cols = [ col for col, is_zero in ((df == 0).sum() == df.shape[0]).items() if is_zero ]
df.drop(zero_cols, axis=1, inplace=True)

Some break-down:

# a pandas Series with {col: is_zero} items
# is_zero is True when the number of zero items in that column == num_all_rows
the_series = (df == 0).sum() == df.shape[0]


# a list comprehension of zero_col_names is built from the_series
[ col for col, is_zero in the_series.items() if is_zero ]
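Put together on a small frame, the steps above might look like the sketch below. The column labels are made up for the example, and the comparison with (df == 0).all() is an addition that is not in the original answer.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 0], "b": [0, 0], "c": [0, 1], "d": [0, 0]})

# Series of {column name: True if every entry in that column is zero}
the_series = (df == 0).sum() == df.shape[0]

# (df == 0).all() expresses the same test more directly.
alt_series = (df == 0).all()

zero_cols = [col for col, is_zero in the_series.items() if is_zero]
print("dropping all-zero columns:", zero_cols)  # ['b', 'd']

df.drop(zero_cols, axis=1, inplace=True)
print(list(df.columns))  # ['a', 'c']
```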


If your columns contain some NaN values, and you want to drop the columns that contain only 0 and NaN, you may want to use this approach:

df.loc[:, (df**2).sum() != 0]
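To see how this differs from the plain (df != 0) mask, here is a small sketch on a made-up frame that mixes zeros, NaN and values that cancel out. The column names and values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "a": [1.0, np.nan, -1.0],  # nonzero values that sum to zero
        "b": [0.0, np.nan, 0.0],  # only 0 and NaN
        "c": [0.0, 0.0, 0.0],  # only 0
        "d": [np.nan, np.nan, np.nan],  # only NaN
    }
)

# NaN != 0 is True, so the plain mask keeps "b" and "d".
plain = df.loc[:, (df != 0).any(axis=0)]

# sum() skips NaN by default, and squaring stops +1 and -1 from cancelling.
squared = df.loc[:, (df**2).sum() != 0]

print(list(plain.columns))    # ['a', 'b', 'd']
print(list(squared.columns))  # ['a']
```

Squaring is what makes the sum safe here: without it, column "a" would sum to zero and be dropped even though it has nonzero entries.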