Python pandas: check whether a DataFrame is empty

I have an if statement that checks whether a dataframe is empty. The way I do it is the following:

    if dataframe.empty:
        pass
    else:
        # do something

But what I really want is:

    if dataframe is not empty:
        # do something

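My question is: is there a method like .not_empty() to achieve this? I would also like to know whether the second version is better in terms of performance. Otherwise, maybe I should just leave it as it is, i.e. the first version?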


Just do

    if not dataframe.empty:
        # insert code here

This works because dataframe.empty returns True if the dataframe is empty. To invert it, use the negation operator not, which flips True to False and vice versa.
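A minimal runnable sketch of the negation, assuming pandas is imported as pd; the frames df_empty and df_full and the helper describe are illustrative names, not part of the original answer:

```python
import pandas as pd

df_empty = pd.DataFrame()                 # no rows, no columns
df_full = pd.DataFrame({"A": [1, 2, 3]})  # three rows, one column


def describe(df: pd.DataFrame) -> str:
    # `not df.empty` is True when the frame has at least one row AND one column
    if not df.empty:
        return "has data"
    return "empty"


print(describe(df_empty))
print(describe(df_full))
```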

You can use the attribute dataframe.empty to check whether it's empty or not:

    if not dataframe.empty:
        # do something

Or

    if len(dataframe) != 0:
        # do something

Or

    if len(dataframe.index) != 0:
        # do something

Another way:

    if dataframe.empty == False:
        # do something
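Note that the len-based checks are not strictly equivalent to .empty: len(dataframe) and len(dataframe.index) count rows only, while .empty is also True when there are no columns. A small sketch showing the difference on a hypothetical frame that has index labels but no columns:

```python
import pandas as pd

# Three index labels but no columns: rows exist, yet there are no cells.
df = pd.DataFrame(index=[0, 1, 2])

print(df.shape)            # (3, 0)
print(df.empty)            # True, because the column axis has length 0
print(len(df) != 0)        # True, because len() counts rows only
print(len(df.index) != 0)  # True, same as len(df)
```

So if a frame can end up with rows but no columns, the len-based checks will treat it as non-empty while .empty will not.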

.empty returns a boolean value:

>>> df_empty.empty
True

So "if not empty" can be written as:

    if not df.empty:
        # your code

See the pandas.DataFrame.empty documentation; it might help someone.

No doubt using empty is the most readable option in this case (explicit is better than implicit).
However, the most efficient in terms of computation time is using len:

    if not len(df.index) == 0:
        # insert code here

Source: this answer.
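If you want to check the timing claim on your own setup, here is a rough timeit sketch. The frame size and call count are arbitrary choices, and no particular result is implied: timings depend on the machine and the pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame({"A": range(1_000), "B": range(1_000)})


def via_empty() -> bool:
    return not df.empty


def via_len_index() -> bool:
    return len(df.index) != 0


# Both checks agree on this frame (it has rows and columns).
assert via_empty() == via_len_index()

for fn in (via_empty, via_len_index):
    # Total seconds for 10_000 calls; compare the two numbers, not absolute values.
    seconds = timeit.timeit(fn, number=10_000)
    print(f"{fn.__name__}: {seconds:.4f} s")
```

Either way, both checks are very cheap, so the difference rarely matters outside tight loops.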

As already clearly explained by other commentators, you can negate a boolean expression in Python by simply prepending the not operator, hence:

    if not df.empty:
        # do something

does the trick.

I only want to clarify the meaning of "empty" in this context, because it was a bit confusing for me at first.

According to the pandas documentation, the DataFrame.empty property returns True if any of the axes in the DataFrame are of length 0.

As a consequence, "empty" does not require both zero rows and zero columns, as one might expect. A dataframe with zero rows (axis 0 is empty) but a non-zero number of columns (axis 1 is not empty) is still considered empty:

>>> df = pd.DataFrame(columns=["A", "B", "C"])
>>> df.empty
True
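Since .empty is True whenever some axis has length 0, it matches checking whether the frame has zero cells (df.size == 0, where size is rows times columns). A sketch comparing a few hypothetical frames:

```python
import pandas as pd

frames = {
    "no rows": pd.DataFrame(columns=["A", "B", "C"]),   # shape (0, 3)
    "no columns": pd.DataFrame(index=["a", "b", "c"]),  # shape (3, 0)
    "one cell": pd.DataFrame({"A": [1]}),               # shape (1, 1)
}

for name, df in frames.items():
    # df.size is rows * columns, so it is 0 exactly when some axis has length 0
    print(f"{name:10} shape={df.shape} empty={df.empty} size==0={df.size == 0}")
```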

Another interesting point highlighted in the documentation is that a DataFrame containing only NaNs is not considered empty:

>>> df = pd.DataFrame(columns=["A", "B", "C"], index=['a', 'b', 'c'])
>>> df
     A    B    C
a  NaN  NaN  NaN
b  NaN  NaN  NaN
c  NaN  NaN  NaN
>>> df.empty
False
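If, for your use case, a frame full of NaNs should count as empty, one option (an assumption about what you want, not something .empty does for you) is to drop all-NaN rows first, or to test the cells directly:

```python
import pandas as pd

df = pd.DataFrame(columns=["A", "B", "C"], index=["a", "b", "c"])

print(df.empty)                    # False: 3 x 3 cells exist
print(df.dropna(how="all").empty)  # True: every row is all-NaN, so all rows are dropped
print(df.isna().all().all())       # True: every single cell is NaN
```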