从列表中删除多个元素

是否可以同时从列表中删除多个元素?如果我想删除索引为0和2的元素,并尝试类似del somelist[0],后跟del somelist[2]的语句,第二个语句实际上将删除somelist[3]

我想我总是可以先删除编号较高的元素,但我希望有更好的方法。

295620 次浏览

If you're deleting multiple non-adjacent items, then what you describe is the best way (and yes, be sure to start from the highest index).

If your items are adjacent, you can use the slice assignment syntax:

a[2:10] = []

You can use enumerate and remove the values whose index matches the indices you want to remove:

indices = 0, 2
somelist = [i for j, i in enumerate(somelist) if j not in indices]

As a function:

def multi_delete(list_, *args):
indexes = sorted(list(args), reverse=True)
for index in indexes:
del list_[index]
return list_

Runs in n log(n) time, which should make it the fastest correct solution yet.

You can do that way on a dict, not on a list. In a list elements are in sequence. In a dict they depend only on the index.

Simple code just to explain it by doing:

>>> lst = ['a','b','c']
>>> dct = {0: 'a', 1: 'b', 2:'c'}
>>> lst[0]
'a'
>>> dct[0]
'a'
>>> del lst[0]
>>> del dct[0]
>>> lst[0]
'b'
>>> dct[0]
Traceback (most recent call last):
File "<pyshell#19>", line 1, in <module>
dct[0]
KeyError: 0
>>> dct[1]
'b'
>>> lst[1]
'c'

A way to "convert" a list in a dict is:

>>> dct = {}
>>> for i in xrange(0,len(lst)): dct[i] = lst[i]

The inverse is:

lst = [dct[i] for i in sorted(dct.keys())]

Anyway I think it's better to start deleting from the higher index as you said.

I can actually think of two ways to do it:

  1. slice the list like (this deletes the 1st,3rd and 8th elements)

    somelist = somelist[1:2]+somelist[3:7]+somelist[8:]

  2. do that in place, but one at a time:

    somelist.pop(2) somelist.pop(0)

So, you essentially want to delete multiple elements in one pass? In that case, the position of the next element to delete will be offset by however many were deleted previously.

Our goal is to delete all the vowels, which are precomputed to be indices 1, 4, and 7. Note that its important the to_delete indices are in ascending order, otherwise it won't work.

to_delete = [1, 4, 7]
target = list("hello world")
for offset, index in enumerate(to_delete):
index -= offset
del target[index]

It'd be a more complicated if you wanted to delete the elements in any order. IMO, sorting to_delete might be easier than figuring out when you should or shouldn't subtract from index.

As a specialisation of Greg's answer, you can even use extended slice syntax. eg. If you wanted to delete items 0 and 2:

>>> a= [0, 1, 2, 3, 4]
>>> del a[0:3:2]
>>> a
[1, 3, 4]

This doesn't cover any arbitrary selection, of course, but it can certainly work for deleting any two items.

here is another method which removes the elements in place. also if your list is really long, it is faster.

>>> a = range(10)
>>> remove = [0,4,5]
>>> from collections import deque
>>> deque((list.pop(a, i) for i in sorted(remove, reverse=True)), maxlen=0)


>>> timeit.timeit('[i for j, i in enumerate(a) if j not in remove]', setup='import random;remove=[random.randrange(100000) for i in range(100)]; a = range(100000)', number=1)
0.1704120635986328


>>> timeit.timeit('deque((list.pop(a, i) for i in sorted(remove, reverse=True)), maxlen=0)', setup='from collections import deque;import random;remove=[random.randrange(100000) for i in range(100)]; a = range(100000)', number=1)
0.004853963851928711

I'm a total beginner in Python, and my programming at the moment is crude and dirty to say the least, but my solution was to use a combination of the basic commands I learnt in early tutorials:

some_list = [1,2,3,4,5,6,7,8,10]
rem = [0,5,7]


for i in rem:
some_list[i] = '!' # mark for deletion


for i in range(0, some_list.count('!')):
some_list.remove('!') # remove
print some_list

Obviously, because of having to choose a "mark-for-deletion" character, this has its limitations.

As for the performance as the size of the list scales, I'm sure that my solution is sub-optimal. However, it's straightforward, which I hope appeals to other beginners, and will work in simple cases where some_list is of a well-known format, e.g., always numeric...

Here is an alternative, that does not use enumerate() to create tuples (as in SilentGhost's original answer).

This seems more readable to me. (Maybe I'd feel differently if I was in the habit of using enumerate.) CAVEAT: I have not tested performance of the two approaches.

# Returns a new list. "lst" is not modified.
def delete_by_indices(lst, indices):
indices_as_set = set(indices)
return [ lst[i] for i in xrange(len(lst)) if i not in indices_as_set ]

NOTE: Python 2.7 syntax. For Python 3, xrange => range.

Usage:

lst = [ 11*x for x in xrange(10) ]
somelist = delete_by_indices( lst, [0, 4, 5])

somelist:

[11, 22, 33, 66, 77, 88, 99]

--- BONUS ---

Delete multiple values from a list. That is, we have the values we want to delete:

# Returns a new list. "lst" is not modified.
def delete__by_values(lst, values):
values_as_set = set(values)
return [ x for x in lst if x not in values_as_set ]

Usage:

somelist = delete__by_values( lst, [0, 44, 55] )

somelist:

[11, 22, 33, 66, 77, 88, 99]

This is the same answer as before, but this time we supplied the VALUES to be deleted [0, 44, 55].

Remove method will causes a lot of shift of list elements. I think is better to make a copy:

...
new_list = []
for el in obj.my_list:
if condition_is_true(el):
new_list.append(el)
del obj.my_list
obj.my_list = new_list
...

This has been mentioned, but somehow nobody managed to actually get it right.

On O(n) solution would be:

indices = {0, 2}
somelist = [i for j, i in enumerate(somelist) if j not in indices]

This is really close to SilentGhost's version, but adds two braces.

For some reason I don't like any of the answers here. Yes, they work, but strictly speaking most of them aren't deleting elements in a list, are they? (But making a copy and then replacing the original one with the edited copy).

Why not just delete the higher index first?

Is there a reason for this? I would just do:

for i in sorted(indices, reverse=True):
del somelist[i]

If you really don't want to delete items backwards, then I guess you should just deincrement the indices values which are greater than the last deleted index (can't really use the same index since you're having a different list) or use a copy of the list (which wouldn't be 'deleting' but replacing the original with an edited copy).

Am I missing something here, any reason to NOT delete in the reverse order?

technically, the answer is NO it is not possible to delete two objects AT THE SAME TIME. However, it IS possible to delete two objects in one line of beautiful python.

del (foo['bar'],foo['baz'])

will recusrively delete foo['bar'], then foo['baz']

You can use numpy.delete as follows:

import numpy as np
a = ['a', 'l', 3.14, 42, 'u']
I = [0, 2]
np.delete(a, I).tolist()
# Returns: ['l', '42', 'u']

If you don't mind ending up with a numpy array at the end, you can leave out the .tolist(). You should see some pretty major speed improvements, too, making this a more scalable solution. I haven't benchmarked it, but numpy operations are compiled code written in either C or Fortran.

An alternative list comprehension method that uses list index values:

stuff = ['a', 'b', 'c', 'd', 'e', 'f', 'woof']
index = [0, 3, 6]
new = [i for i in stuff if stuff.index(i) not in index]

This returns:

['b', 'c', 'e', 'f']

we can do this by use of a for loop iterating over the indexes after sorting the index list in descending order

mylist=[66.25, 333, 1, 4, 6, 7, 8, 56, 8769, 65]
indexes = 4,6
indexes = sorted(indexes, reverse=True)
for i in index:
mylist.pop(i)
print mylist

For the indices 0 and 2 from listA:

for x in (2,0): listA.pop(x)

For some random indices to remove from listA:

indices=(5,3,2,7,0)
for x in sorted(indices)[::-1]: listA.pop(x)

To generalize the comment from @sth. Item deletion in any class, that implements abc.MutableSequence, and in list in particular, is done via __delitem__ magic method. This method works similar to __getitem__, meaning it can accept either an integer or a slice. Here is an example:

class MyList(list):
def __delitem__(self, item):
if isinstance(item, slice):
for i in range(*item.indices(len(self))):
self[i] = 'null'
else:
self[item] = 'null'




l = MyList(range(10))
print(l)
del l[5:8]
print(l)

This will output

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 'null', 'null', 'null', 8, 9]

Importing it only for this reason might be overkill, but if you happen to be using pandas anyway, then the solution is simple and straightforward:

import pandas as pd
stuff = pd.Series(['a','b','a','c','a','d'])
less_stuff = stuff[stuff != 'a']  # define any condition here
# results ['b','c','d']

I wanted to a way to compare the different solutions that made it easy to turn the knobs.

First I generated my data:

import random


N = 16 * 1024
x = range(N)
random.shuffle(x)
y = random.sample(range(N), N / 10)

Then I defined my functions:

def list_set(value_list, index_list):
index_list = set(index_list)
result = [value for index, value in enumerate(value_list) if index not in index_list]
return result


def list_del(value_list, index_list):
for index in sorted(index_list, reverse=True):
del(value_list[index])


def list_pop(value_list, index_list):
for index in sorted(index_list, reverse=True):
value_list.pop(index)

Then I used timeit to compare the solutions:

import timeit
from collections import OrderedDict


M = 1000
setup = 'from __main__ import x, y, list_set, list_del, list_pop'
statement_dict = OrderedDict([
('overhead',  'a = x[:]'),
('set', 'a = x[:]; list_set(a, y)'),
('del', 'a = x[:]; list_del(a, y)'),
('pop', 'a = x[:]; list_pop(a, y)'),
])


overhead = None
result_dict = OrderedDict()
for name, statement in statement_dict.iteritems():
result = timeit.timeit(statement, number=M, setup=setup)
if overhead is None:
overhead = result
else:
result = result - overhead
result_dict[name] = result


for name, result in result_dict.iteritems():
print "%s = %7.3f" % (name, result)

Output

set =   1.711
del =   3.450
pop =   3.618

So the generator with the indices in a set was the winner. And del is slightly faster then pop.

l = ['a','b','a','c','a','d']
to_remove = [1, 3]
[l[i] for i in range(0, len(l)) if i not in to_remove])

It's basically the same as the top voted answer, just a different way of writing it. Note that using l.index() is not a good idea, because it can't handle duplicated elements in a list.

You can use this logic:

my_list = ['word','yes','no','nice']


c=[b for i,b in enumerate(my_list) if not i in (0,2,3)]


print c
some_list.remove(some_list[max(i, j)])

Avoids sorting cost and having to explicitly copy list.

Another implementation of the idea of removing from the highest index.

for i in range(len(yourlist)-1, -1, -1):
del yourlist(i)

How about one of these (I'm very new to Python, but they seem ok):

ocean_basin = ['a', 'Atlantic', 'Pacific', 'Indian', 'a', 'a', 'a']
for i in range(1, (ocean_basin.count('a') + 1)):
ocean_basin.remove('a')
print(ocean_basin)

['Atlantic', 'Pacific', 'Indian']

ob = ['a', 'b', 4, 5,'Atlantic', 'Pacific', 'Indian', 'a', 'a', 4, 'a']
remove = ('a', 'b', 4, 5)
ob = [i for i in ob if i not in (remove)]
print(ob)

['Atlantic', 'Pacific', 'Indian']

None of the answers offered so far performs the deletion in place in O(n) on the length of the list for an arbitrary number of indices to delete, so here's my version:

def multi_delete(the_list, indices):
assert type(indices) in {set, frozenset}, "indices must be a set or frozenset"
offset = 0
for i in range(len(the_list)):
if i in indices:
offset += 1
elif offset:
the_list[i - offset] = the_list[i]
if offset:
del the_list[-offset:]


# Example:
a = [0, 1, 2, 3, 4, 5, 6, 7]
multi_delete(a, {1, 2, 4, 6, 7})
print(a)  # prints [0, 3, 5]

You can use remove, too.

delete_from_somelist = []
for i in [int(0), int(2)]:
delete_from_somelist.append(somelist[i])
for j in delete_from_somelist:
newlist = somelist.remove(j)

I put it all together into a list_diff function that simply takes two lists as inputs and returns their difference, while preserving the original order of the first list.

def list_diff(list_a, list_b, verbose=False):


# returns a difference of list_a and list_b,
# preserving the original order, unlike set-based solutions


# get indices of elements to be excluded from list_a
excl_ind = [i for i, x in enumerate(list_a) if x in list_b]
if verbose:
print(excl_ind)


# filter out the excluded indices, producing a new list
new_list = [i for i in list_a if list_a.index(i) not in excl_ind]
if verbose:
print(new_list)


return(new_list)

Sample usage:

my_list = ['a', 'b', 'c', 'd', 'e', 'f', 'woof']
# index = [0, 3, 6]


# define excluded names list
excl_names_list = ['woof', 'c']


list_diff(my_list, excl_names_list)
>> ['a', 'b', 'd', 'e', 'f']

You may want to simply use np.delete:

list_indices = [0, 2]
original_list = [0, 1, 2, 3]
new_list = np.delete(original_list, list_indices)

Output

array([1, 3])

Here, the first argument is the original list, the second is the index or a list of indices you want to delete.

There is a third argument which you can use in the case of having ndarrays: axis (0 for rows and 1 for columns in case of ndarrays).

Use numpy.delete which is definitely faster (376 times, as shown later) than python lists.

First method (using numpy):

import numpy as np


arr = np.array([0,3,5,7])
# [0,3,5,7]
indexes = [0,3]
np.delete(arr, indexes)
# [3,5]

Second method (using a python list):

arr = [0,3,5,7]
# [0,3,5,7]
indexes = [0,3]
for index in sorted(indexes, reverse=True):
del arr[index]
arr
# [3,5]

Code to benchmark the two methods on an array of 500000 elements, deleting half of the elements randomly:

import numpy as np
import random
import time


start = 0
stop = 500000
elements = np.arange(start,stop)
num_elements = len(temp)


temp = np.copy(elements)
temp2 = elements.tolist()


indexes = random.sample(range(0, num_elements), int(num_elements/2))


start_time = time.time()


temp = np.delete(temp, indexes)


end_time = time.time()
total_time = end_time - start_time
print("First method: ", total_time)


start_time = time.time()


for index in sorted(indexes, reverse=True):
del temp2[index]


end_time = time.time()
total_time = end_time - start_time
print("Second method: ", total_time)


# First method:  0.04500985145568848
# Second method:  16.94180393218994

The first method is about 376 times faster than the second one.

I tested the suggested solutions with perfplot and found that NumPy's

np.delete(lst, remove_ids)

is the fastest solution if the list is longer than about 100 entries. Before that, all solutions are around 10^-5 seconds. The list comprehension seems simple enough then:

out = [item for i, item in enumerate(lst) if i not in remove_ids]

enter image description here


Code to reproduce the plot:

import perfplot
import random
import numpy as np
import copy




def setup(n):
lst = list(range(n))
random.shuffle(lst)
# //10 = 10%
remove_ids = random.sample(range(n), n // 10)
return lst, remove_ids




def if_comprehension(lst, remove_ids):
return [item for i, item in enumerate(lst) if i not in remove_ids]




def del_list_inplace(lst, remove_ids):
out = copy.deepcopy(lst)
for i in sorted(remove_ids, reverse=True):
del out[i]
return out




def del_list_numpy(lst, remove_ids):
return np.delete(lst, remove_ids)




b = perfplot.bench(
setup=setup,
kernels=[if_comprehension, del_list_numpy, del_list_inplace],
n_range=[2**k for k in range(20)],
)
b.save("out.png")
b.show()