根据列表中(不)的索引值切片熊猫数据框

小开

import pandas as pd
df = pd.DataFrame(data=[5,6,7,8], index=[1,2,3,4], columns=['D',])
blacklist = [2,3]
#your current way ...
ix=[i for i in df.index if i not in blacklist]
df_select=df.loc[ix]


# use a mask
mask = [True if x else False for x in df.index if x not in blacklist]
df.loc[mask]

http://pandas.pydata.org/pandas-docs/dev/indexing.html#indexing-label actually, loc and iloc both take a boolean array, in this case the mask. from now on you can reuse this mask and should be more efficient.

小开

You could use set() to create the difference between your original indices and those that you want to remove:

df.loc[set(df.index) - set(blacklist)]

It has the advantage of being parsimonious, as well as being easier to read than a list comprehension.

小开

最佳答案

Use isin on the index and invert the boolean index to perform label selection:

In [239]:


df = pd.DataFrame({'a':np.random.randn(5)})
df
Out[239]:
a
0 -0.548275
1 -0.411741
2 -1.187369
3  1.028967
4 -2.755030
In [240]:


t = [2,4]
df.loc[~df.index.isin(t)]
Out[240]:
a
0 -0.548275
1 -0.411741
3  1.028967

小开

Thanks to ASGM; I found that I needed to turn the set into a list to make it work with a MultiIndex:

mi1 = pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1), ("b", 2)])
df1 = pd.DataFrame(data={"aaa":[1,2,3,4]}, index=mi1)
setValid = set(df1.index) - set([("a", 2)])
df1.loc[list(setValid)] # works
df1.loc[setValid] # fails

(sorry can't comment, insufficient rep)

小开

If you are looking for a way to select all rows that are outside a condition you can use np.invert() given that the condition returns an array of booleans.

df.loc[np.invert(({condition 1}) & (condition 2))]

小开

df = pd.DataFrame(data=[5,6,7,8], index=[1,2,3,4], columns=['D',])
blacklist = [2,3]


df.drop(blacklist,0)

小开

You can use the np.setdiff1d function which finds the set difference of two arrays.

index = np.array(blacklist)
not_index = np.setdiff1d(df.index.to_numpy(), index)
df.iloc[not_index]

小开

You could use difference() to obtain the difference between your original indices and those that you want to exclude:

df.loc[df.index.difference(blacklist), :]

It has the advantage of being easier to read.

小开

Inside query you can access your variable blacklist using @:

df.query('index != @blacklist')


# Or alternatively:
df.query('index not in @blacklist')