熊猫-过滤无值

我用熊猫来探索一些数据集,我有这样的数据框架:

enter image description here

我想排除在 City列中有值的任何行,所以我尝试了:

new_df = all_df[(all_df["City"] == "None") ]
new_df

但是我得到了一个空的数据框:

enter image description here

当我使用 None以外的任何值时,它都能正常工作。有什么办法可以过滤这个数据帧吗?

114801 次浏览

I hope "where" can do what you expect

new_df = new_df.where(new_df["city"], None)

And it is better use np.nan rather than None.

For more details pandas.DataFrame.where

Try this to select only the None values for city column:

new_df = all_df['City'][all_df['City'] == "None"]

Try this to see all other columns which has the same rows of 'City'==None

new_df = all_df[all_df['City'] == "None"]
print(new_df.head()) # with function head() you can see the first 5 rows

Consider using isnull() to locate missing values

all_df[all_df['City'].isnull()]

If it's desired to filter multiple rows with None values, we could use any, all or sum. For example, for df given below:

   FACTS_Value    Region      City  Village
0        16482  Al Bahah      None     None
1        22522  Al Bahah   Al Aqiq     None
2        12444  Al Bahah   Al Aqiq  Al Aqiq
3        12823  Al Bahah  Al Bahah  Al Aqiq
4        11874      None      None     None

If we want to select all rows with None value in at least one column, we could use isna + any on axis to build a boolean mask:

msk = df.isna().any(axis=1)
out = df[msk]

Output:

   FACTS_Value    Region     City Village
0        16482  Al Bahah     None    None
1        22522  Al Bahah  Al Aqiq    None
4        11874      None     None    None

If we want the rows where all non-numeric column values are None, then we could use isna + all on axis:

msk = df.select_dtypes(exclude='number').isna().all(axis=1)

or

msk = df[['Region', 'City', 'Village']].isna().all(axis=1)
out = df[msk]

Output:

   FACTS_Value Region  City Village
4        11874   None  None    None

If we want to filter rows where there are exactly n None values, then we could use sum on axis + eq:

msk = df.isna().sum(axis=1) == 2
out = df[msk]

Output:

   FACTS_Value    Region  City Village
0        16482  Al Bahah  None    None

Another alternative is to use the query method:

In [3]: all_df.query('City != City')
Out[3]:
FACTS_Value    Region  City Village
0      34135.0  Al Bahah  None    None

Method to extract all the values other than None given its column name.

df = df[df.columnname.notna()]

Given, where None is python None, not 'None':

   FACTS_Value    Region      City  Village
0        16482  Al Bahah      None     None
1        22522  Al Bahah   Al Aqiq     None
2        12444  Al Bahah   Al Aqiq  Al Aqiq
3        12823  Al Bahah  Al Bahah  Al Aqiq
4        11874      None      None     None

Fix your None values:

df = df.fillna(np.nan)

Now the following works as expected:

df[df.City.isna()]
  1. None is a keyword, not a string, so don't use quotes.

  2. None == None gives True, but in custom classes the comparison operator can be overridden, so it's safer to use is None.

  3. Pandas provides the isna() function. So I suggest:

new_df = all_df[all_df['City'].isna()]