How do I drop a list of rows from a pandas DataFrame?

小开

Best answer

Use DataFrame.drop and pass it a Series of index labels:

In [65]: df
Out[65]:
       one  two
one      1    4
two      2    3
three    3    2
four     4    1

In [66]: df.drop(df.index[[1,3]])
Out[66]:
       one  two
one      1    4
three    3    2

小开

Note that it may be important to use the inplace argument when you want the drop to happen in place.

df.drop(df.index[[1,3]], inplace=True)

Because your original call does not assign its result to anything, this is the form you should use. http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.DataFrame.drop.html
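A minimal sketch of the difference, assuming a small frame rebuilt from the example in the first answer: without inplace=True, drop returns a new DataFrame and leaves df untouched; with it, df is modified and the call returns None.

```python
import pandas as pd

df = pd.DataFrame(
    {"one": [1, 2, 3, 4], "two": [4, 3, 2, 1]},
    index=["one", "two", "three", "four"],
)

# Without inplace, drop returns a new frame and df keeps all four rows.
dropped = df.drop(df.index[[1, 3]])
rows_before = len(df)

# With inplace=True, df itself is modified and the call returns None.
returned = df.drop(df.index[[1, 3]], inplace=True)
```

Reassigning the result (df = df.drop(...)) has the same visible effect as inplace=True.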

小开

You can also pass DataFrame.drop the label itself (instead of a Series of index labels):

In[17]: df
Out[17]:
            a         b         c         d         e
one  0.456558 -2.536432  0.216279 -1.305855 -0.121635
two -1.015127 -0.445133  1.867681  2.179392  0.518801

In[18]: df.drop('one')
Out[18]:
            a         b         c         d         e
two -1.015127 -0.445133  1.867681  2.179392  0.518801

Which is equivalent to:

In[19]: df.drop(df.index[[0]])
Out[19]:
            a         b         c         d         e
two -1.015127 -0.445133  1.867681  2.179392  0.518801

小开

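In a comment on @theodros-zelleke's answer, @j-jones asked what to do if the index is not unique. I had to deal with such a situation. What I did was rename the duplicates in the index before calling drop(), for example: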

dropped_indexes = <determine-indexes-to-drop>
df.index = rename_duplicates(df.index)
df.drop(df.index[dropped_indexes], inplace=True)

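where rename_duplicates() is a function I defined that goes through the elements of the index and renames the duplicates. I used the same renaming pattern that pd.read_csv() uses on columns, i.e. "%s.%d" % (name, count), where name is the name of the row and count is how many times it has occurred previously.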
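The answer does not show rename_duplicates() itself, so the following is only one possible sketch of it, not the author's code. It assumes the first occurrence keeps its name and later ones get the "%s.%d" suffix, and it does not guard against a renamed label colliding with a label that already exists (for example a row already called "a.1").

```python
import pandas as pd

def rename_duplicates(index):
    """Hypothetical helper: make repeated labels unique as name.1, name.2, ..."""
    counts = {}
    renamed = []
    for name in index:
        count = counts.get(name, 0)  # how many times name occurred before
        renamed.append(name if count == 0 else "%s.%d" % (name, count))
        counts[name] = count + 1
    return pd.Index(renamed)

df = pd.DataFrame({"value": [10, 20, 30, 40]}, index=["a", "b", "a", "a"])
df.index = rename_duplicates(df.index)

# Positions 0 and 2 now map to the unique labels "a" and "a.1".
result = df.drop(df.index[[0, 2]])
```

Without the renaming step, dropping the label "a" would remove all three rows that share it.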

小开

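If the DataFrame is huge and the number of rows to drop is large as well, then a simple drop by index, df.drop(df.index[]), takes too much time.

In my case, I have a multi-indexed DataFrame of floats with 100M rows x 3 cols, and I need to remove 10k rows from it. The fastest method I found is, quite counterintuitively, to take the remaining rows.

Let indexes_to_drop be an array of positional indexes to drop ([1, 2, 4] in the question).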

# Sort the kept positions so the original row order is preserved
indexes_to_keep = sorted(set(range(df.shape[0])) - set(indexes_to_drop))
df_sliced = df.take(indexes_to_keep)

In my case this took 20.5s, while the simple df.drop took 5min 27s and consumed a lot of memory. The resulting DataFrame is the same.
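A small self-contained sketch of the same idea, using np.setdiff1d to compute the positions to keep instead of Python sets. That substitution is my assumption, not part of the answer; it returns the kept positions already sorted. The toy frame only illustrates the mechanics and says nothing about the timings quoted above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": np.arange(6) * 1.5}, index=list("abcdef"))
indexes_to_drop = [1, 2, 4]

# All positions 0..n-1 except the ones to drop, in ascending order.
indexes_to_keep = np.setdiff1d(np.arange(len(df)), indexes_to_drop)

# take selects rows by position, so labels are never looked up.
df_sliced = df.take(indexes_to_keep)

# Reference result from the label-based drop, for comparison.
df_dropped = df.drop(df.index[indexes_to_drop])
```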

小开

If I want to drop a row that has, say, index x, I would do the following:

df = df[df.index != x]

If I want to drop multiple indices (say these indices are in the list unwanted_indices), I would do:

# Positions to keep: every row position not listed in unwanted_indices
desired_indices = [i for i in range(len(df.index)) if i not in unwanted_indices]
desired_df = df.iloc[desired_indices]
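A runnable sketch of both variants on a toy frame. Note the difference in what is being matched: the single-row form compares index labels, while the list form works on row positions because it feeds iloc. Storing unwanted_indices as a set is my choice here, to keep the membership test cheap.

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 20, 30, 40, 50]}, index=list("abcde"))

# Single row, matched by label.
without_c = df[df.index != "c"]

# Several rows, matched by position.
unwanted_indices = {1, 3}
desired_indices = [i for i in range(len(df.index)) if i not in unwanted_indices]
desired_df = df.iloc[desired_indices]
```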

小开

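I solved this in a simpler way, in just two steps.

Step 1: Create a DataFrame with the unwanted rows/data.
Step 2: Use the index of this unwanted DataFrame to drop the rows from the original DataFrame.

Example:
Suppose you have a DataFrame df with many columns, including 'Age', which is an integer. Now let's say you want to drop all the rows with a negative 'Age'.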

df_age_negative = df[ df['Age'] < 0 ] # Step 1
df = df.drop(df_age_negative.index, axis=0) # Step 2

Hope this is simpler and helps you.
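The two steps can be tried on a toy frame like the one below; the 'Name' column and the values are made up for illustration. The last line shows an equivalent boolean-mask form, which is not part of the answer but gives the same rows without building the intermediate frame.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Ann", "Bob", "Cy", "Di"], "Age": [34, -1, 27, -5]}
)

df_age_negative = df[df["Age"] < 0]               # Step 1: the unwanted rows
result = df.drop(df_age_negative.index, axis=0)   # Step 2: drop them by index

# Equivalent one-liner: keep every row where the condition is not met.
same_result = df[~(df["Age"] < 0)]
```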

小开

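Here is a somewhat specific example I would like to show. Say you have many duplicate entries in some of your rows. If you have string entries, you can easily use string methods to find all the indexes to drop.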

ind_drop = df[df['column_of_strings'].apply(lambda x: x.startswith('Keyword'))].index

Now drop these rows using their indexes:

new_df = df.drop(ind_drop)
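A self-contained sketch of the same pattern, using the vectorized Series.str.startswith accessor in place of apply with a lambda. The column contents are invented for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "column_of_strings": ["Keyword one", "other", "Keyword two", "plain"],
        "n": [1, 2, 3, 4],
    }
)

# Index labels of every row whose string starts with 'Keyword'.
ind_drop = df[df["column_of_strings"].str.startswith("Keyword")].index

new_df = df.drop(ind_drop)
```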

小开

Determining the index from the boolean as described above, e.g.

df[df['column'].isin(values)].index

can be more memory-intensive than determining the index with this method:

pd.Index(np.where(df['column'].isin(values))[0])

applied like so:

df.drop(pd.Index(np.where(df['column'].isin(values))[0]), inplace = True)

This method is useful when dealing with large DataFrames and limited memory.
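One caveat worth sketching: np.where returns row positions, while drop expects index labels. Passing pd.Index(positions) straight to drop therefore only removes the intended rows when the frame has the default 0..n-1 index. The variant below, which is my adaptation rather than the answer's code, maps the positions back to labels first so it also works with other indexes.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"column": ["a", "b", "c", "d"]}, index=[10, 20, 30, 40])
values = ["b", "d"]

# Row positions (not labels) where the condition holds.
positions = np.where(df["column"].isin(values))[0]

# Translate positions into the labels that drop expects.
labels = df.index[positions]
result = df.drop(labels)
```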

小开

Use only the index argument to drop a row:

df.drop(index = 2, inplace = True)

Multiple rows:

df.drop(index=[1,3], inplace = True)
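A short sketch of the index keyword on a toy frame, returning new frames instead of using inplace so that each call can be compared against the original. The last call uses the errors="ignore" option of drop, which skips labels that are not present instead of raising.

```python
import pandas as pd

df = pd.DataFrame({"column1": [0, 10, 20, 30, 40]})

one_gone = df.drop(index=2)            # single label
several_gone = df.drop(index=[1, 3])   # list of labels

# Label 99 does not exist; errors="ignore" drops only the labels it finds.
tolerant = df.drop(index=[1, 99], errors="ignore")
```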

小开

Consider an example DataFrame:

df =
index    column1
0           00
1           10
2           20
3           30

We want to drop the 2nd and 3rd rows (positions 1 and 2).

Approach 1:

df = df.drop(df.index[[1, 2]])
or
df.drop(df.index[[1, 2]], inplace=True)
print(df)


df =
index    column1
0           00
3           30


#This approach removes the rows as we wanted but the index remains unordered

Approach 2:

df = df.drop(df.index[[1, 2]]).reset_index(drop=True)
print(df)
df =
index    column1
0           00
1           30
#This approach removes the rows as we wanted and resets the index.
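A runnable sketch of both approaches, assuming the four-row frame shown in this answer. DataFrame.drop takes a list of labels, so the positions are wrapped in a list when indexing df.index, and the renumbering in the second approach is done with reset_index(drop=True), since drop itself has no ignore_index parameter.

```python
import pandas as pd

df = pd.DataFrame({"column1": ["00", "10", "20", "30"]})

# Approach 1: drop the rows at positions 1 and 2; labels 0 and 3 remain.
kept = df.drop(df.index[[1, 2]])

# Approach 2: same drop, then renumber the index from 0.
renumbered = kept.reset_index(drop=True)
```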

小开

Look at the following DataFrame df:

df

   column1  column2  column3
0        1       11       21
1        2       12       22
2        3       13       23
3        4       14       24
4        5       15       25
5        6       16       26
6        7       17       27
7        8       18       28
8        9       19       29
9       10       20       30

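Drop all the rows that have an odd number in column1.

Create a list of all the elements in column1 and keep only those that are even (the elements you don't want to drop):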

keep_elements = [x for x in df.column1 if x%2==0]

All the rows with the values [2, 4, 6, 8, 10] in column1 will be retained, i.e. not dropped.

df.set_index('column1',inplace = True)
df.drop(df.index.difference(keep_elements),axis=0,inplace=True)
df.reset_index(inplace=True)

We make column1 the index and drop all the rows that are not required. Then we reset the index back.

df

   column1  column2  column3
0        2       12       22
1        4       14       24
2        6       16       26
3        8       18       28
4       10       20       30
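The steps above can be sketched as a self-contained script. It uses reassignment instead of inplace=True so the original df stays available, and it also computes the same result with a plain boolean mask, which is an alternative I am adding for comparison and not part of the answer.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "column1": range(1, 11),
        "column2": range(11, 21),
        "column3": range(21, 31),
    }
)

keep_elements = [x for x in df.column1 if x % 2 == 0]

# The answer's route: index by column1, drop every label not in the keep list.
via_index = df.set_index("column1")
via_index = via_index.drop(via_index.index.difference(keep_elements), axis=0)
via_index = via_index.reset_index()

# Alternative: filter with a boolean mask and renumber the rows.
via_mask = df[df["column1"] % 2 == 0].reset_index(drop=True)
```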

小开

To drop rows with indexes 1, 2, 4 you can use:

df[~df.index.isin([1, 2, 4])]

The tilde operator ~ negates the result of the isin method. Another option is to drop the indexes:

df.loc[df.index.drop([1, 2, 4])]
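Both forms can be compared on a toy frame with the default integer index; the frame contents are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"value": list("abcdef")})

# Keep rows whose label is NOT in the list.
by_mask = df[~df.index.isin([1, 2, 4])]

# Remove the labels from the index, then select what is left.
by_labels = df.loc[df.index.drop([1, 2, 4])]
```

One practical difference: isin simply ignores labels that are not in the index, whereas Index.drop raises a KeyError for a missing label by default.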

小开

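As Dennis Golomazov's answer suggests, rather than using drop to remove rows, you can select the rows to keep. Let's say you have a list of row positions to drop called indices_to_drop. You can convert it to a mask as follows: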

mask = np.ones(len(df), bool)
mask[indices_to_drop] = False

You can use this mask directly for indexing:

df_new = df.iloc[mask]

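The nice thing about this method is that the mask can come from any source: it can be a condition involving many columns, or something else.

The really nice thing is that you don't need the index of the original DataFrame at all, so it doesn't matter whether the index is unique or not.

The disadvantage, of course, is that you can't do the drop in place with this method.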
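A minimal sketch of the mask approach on a frame whose index deliberately contains duplicates, to show that only positions matter. The frame and the positions are invented for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": [1, 2, 3, 4, 5]}, index=["a", "a", "b", "b", "c"])
indices_to_drop = [1, 3]  # row positions, not labels

# Start with "keep everything", then switch off the unwanted positions.
mask = np.ones(len(df), dtype=bool)
mask[indices_to_drop] = False

df_new = df.iloc[mask]
```

A label-based df.drop("a") on the same frame would have removed both rows labelled "a".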

小开

This worked for me:

# Create a list containing the index numbers you want to remove
index_list = list(range(42766, 42798))
df.drop(df.index[index_list], inplace =True)
df.shape

This will drop all the rows at the positions in the range created.