Find empty or NaN entry in Pandas Dataframe

I am trying to search through a Pandas Dataframe to find where it has a missing entry or a NaN entry.

Here is a dataframe that I am working with:

cl_id       a           c         d         e        A1              A2             A3
0       1   -0.419279  0.843832 -0.530827    text76        1.537177      -0.271042
1       2    0.581566  2.257544  0.440485    dafN_6        0.144228       2.362259
2       3   -1.259333  1.074986  1.834653    system                       1.100353
3       4   -1.279785  0.272977  0.197011     Fifty       -0.031721       1.434273
4       5    0.578348  0.595515  0.553483   channel        0.640708       0.649132
5       6   -1.549588 -0.198588  0.373476     audio       -0.508501
6       7    0.172863  1.874987  1.405923    Twenty             NaN            NaN
7       8   -0.149630 -0.502117  0.315323  file_max             NaN            NaN

NOTE: The blank entries are empty strings - this is because there was no alphanumeric content in the file that the dataframe came from.

If I have this dataframe, how can I find a list with the indexes where the NaN or blank entry occurs?


np.where(pd.isnull(df)) returns the row and column indices where the value is NaN:

In [152]: import numpy as np
In [153]: import pandas as pd
In [154]: np.where(pd.isnull(df))
Out[154]: (array([2, 5, 6, 6, 7, 7]), array([7, 7, 6, 7, 6, 7]))


In [155]: df.iloc[2,7]
Out[155]: nan


In [160]: [df.iloc[i,j] for i,j in zip(*np.where(pd.isnull(df)))]
Out[160]: [nan, nan, nan, nan, nan, nan]

Finding values which are empty strings could be done with applymap:

In [182]: np.where(df.applymap(lambda x: x == ''))
Out[182]: (array([5]), array([7]))

Note that using applymap requires calling a Python function once for each cell of the DataFrame. That can be slow for a large DataFrame, so it is better to arrange for all the blank cells to contain NaN instead, so that pd.isnull alone finds them.
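A minimal sketch of that NaN-first approach, on a small invented frame (the column names and values here are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame containing both an empty string and a real NaN
df = pd.DataFrame({'A1': ['text76', '', 'Fifty'],
                   'A2': [1.5, 0.1, np.nan]})

# Turn empty strings into NaN so a single pd.isnull pass finds everything
df = df.replace('', np.nan)

rows, cols = np.where(pd.isnull(df))
missing = list(zip(rows, cols))  # (row, column) positions of missing cells
```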

Partial solution: for a single string column, tmp = df['A1'].fillna(''); isEmpty = tmp == '' gives a boolean Series that is True where there are empty strings or NaN values.
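A runnable sketch of that partial solution (the column name A1 is taken from the question, the values are invented):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A1': ['text76', '', np.nan, 'Fifty']})

tmp = df['A1'].fillna('')   # NaN collapses into the empty string
isEmpty = tmp == ''         # True for both empty strings and former NaN
empty_index = list(df.index[isEmpty])
```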

Try this:

df[df['column_name'] == ''].index

and for NaNs you can try:

pd.isna(df['column_name'])

which returns a boolean Series; df[pd.isna(df['column_name'])].index then gives the row labels.
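The two checks can be combined into one boolean mask; a minimal sketch with an invented column and data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'column_name': ['audio', '', np.nan, 'Twenty']})

# Rows where the cell is either an empty string or NaN
bad = df[(df['column_name'] == '') | df['column_name'].isna()].index
```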

I've lately resorted to

df[ (df[column_name].notnull()) & (df[column_name]!=u'') ].index

That handles both null and empty-string cells in one go: the expression keeps only the rows where the column is neither.
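Since that expression keeps the good rows, the missing ones are its complement; a quick sketch (column name and data invented):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A1': ['system', '', np.nan, 'channel']})
column_name = 'A1'

good = df[(df[column_name].notnull()) & (df[column_name] != '')].index
bad = df.index.difference(good)  # rows that are null or empty
```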

To obtain all the rows that contain an empty cell in a particular column:

DF_new_row = DF_raw.loc[DF_raw['columnname'] == '']

This gives the subset of DF_raw that satisfies the checking condition.

Check whether the columns contain NaN using .isnull() and whether they contain empty strings using .eq(''), then join the two together with the bitwise OR operator |.

Sum along axis 0 to find columns with missing data, then sum along axis 1 to find the index locations of rows with missing data.

missing_cols, missing_rows = (
    (df2.isnull().sum(x) | df2.eq('').sum(x))
    .loc[lambda s: s.gt(0)].index
    for x in (0, 1)
)


>>> df2.loc[missing_rows, missing_cols]
         A2       A3
2             1.10035
5  -0.508501
6       NaN      NaN
7       NaN      NaN

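A self-contained version of that recipe, on an invented df2 with a NaN and an empty string in different places:

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'a': [1.0, 2.0, 3.0],
                    'A2': [0.5, '', np.nan],
                    'A3': [np.nan, 1.1, 2.2]})

# For each axis, count cells that are NaN or empty, keep labels with count > 0
missing_cols, missing_rows = (
    (df2.isnull().sum(x) | df2.eq('').sum(x))
    .loc[lambda s: s.gt(0)].index
    for x in (0, 1)
)
```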
Another option, covering cases where there might be several spaces, is the Python isspace() string method.

df[df.col_name.apply(lambda x: not x.isspace())] # return only rows whose cells are not whitespace

and, also excluding NaN values:

df[(df.col_name.apply(lambda x: not x.isspace())) & (~df.col_name.isna())]
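A complete sketch of the whitespace check (column name and data invented); note that .str.isspace() tolerates NaN cells where a bare lambda would raise, so it is used here instead:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col_name': ['audio', '   ', np.nan, 'Twenty']})

# .str.isspace() yields NaN for missing cells; fill and cast to get a clean mask
is_space = df.col_name.str.isspace().fillna(False).astype(bool)
clean = df[~is_space & df.col_name.notna()]  # drop whitespace-only and NaN rows
```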

You can also do something like:

text_empty = df['column name'].str.len() < 1

df.loc[text_empty].index

The result is the rows whose cells are empty strings, together with their index numbers. (.str.len() is NaN for missing values, and NaN < 1 is False, so NaN rows are excluded.)

You can use string methods with a regex to find cells with empty strings:

df[~df.column_name.str.contains(r'\w')].column_name.count()
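One caveat: on a column that already contains NaN, .str.contains returns NaN for those cells, which breaks boolean indexing; passing na=False avoids that. A sketch with a made-up column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'column_name': ['file_max', '', np.nan]})

# na=False makes NaN cells count as "no word character", so ~mask keeps them
mask = df.column_name.str.contains(r'\w', na=False)
empty_like = df[~mask].index  # empty-string and NaN rows
```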

In my opinion, don't waste time: just replace the blanks with NaN, then search for all entries that are NaN. (This is reasonable because empty values are missing values anyway.)

import numpy as np                             # to use np.nan
import pandas as pd                            # to use replace


df = df.replace('', np.nan)                    # turn empty strings into NaN
nan_values = df[df.isna().any(axis=1)]         # all rows with at least one NaN


nan_values                                     # view df with NaN rows only
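Applied to data shaped like the question's (column names borrowed from it, values invented), the whole pipeline looks like:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'e': ['system', 'Fifty', 'audio'],
                   'A1': ['', -0.031721, -0.508501],
                   'A2': [1.100353, 1.434273, np.nan]})

df = df.replace('', np.nan)            # blanks become proper missing values
nan_values = df[df.isna().any(axis=1)] # rows with at least one NaN
nan_index = list(nan_values.index)
```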