查找和替换列表中的元素

我必须在一个列表中搜索,并用另一个元素替换所有出现的元素。到目前为止,我在代码方面的尝试都没有取得什么进展,最好的方法是什么?

例如,假设我的列表有以下整数

>>> a = [1,2,3,4,5,1,2,3,4,5,1]

我需要将所有出现的数字1替换为值10,所以我需要的输出是

>>> a = [10, 2, 3, 4, 5, 10, 2, 3, 4, 5, 10]

因此,我的目标是将数字1的所有实例替换为数字10。

1283797 次浏览

尝试使用列表理解条件表达式

>>> a=[1,2,3,1,3,2,1,1]
>>> [4 if x==1 else x for x in a]
[4, 2, 3, 4, 3, 2, 4, 4]
>>> a=[1,2,3,4,5,1,2,3,4,5,1]
>>> item_to_replace = 1
>>> replacement_value = 6
>>> indices_to_replace = [i for i,x in enumerate(a) if x==item_to_replace]
>>> indices_to_replace
[0, 5, 10]
>>> for i in indices_to_replace:
...     a[i] = replacement_value
...
>>> a
[6, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6]
>>>

在迭代列表时,可以使用内置的enumerate来获得索引和值。然后,使用该值来测试条件,并使用索引替换原始列表中的该值:

>>> a = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1]
>>> for i, n in enumerate(a):
...   if n == 1:
...      a[i] = 10
...
>>> a
[10, 2, 3, 4, 5, 10, 2, 3, 4, 5, 10]

列表理解工作得很好,使用enumerate进行循环可以节省一些内存(b/c操作基本上是在适当的位置完成的)。

还有函数式编程。参见地图的用法:

>>> a = [1,2,3,2,3,4,3,5,6,6,5,4,5,4,3,4,3,2,1]
>>> map(lambda x: x if x != 4 else 'sss', a)
[1, 2, 3, 2, 3, 'sss', 3, 5, 6, 6, 5, 'sss', 5, 'sss', 3, 'sss', 3, 2, 1]

如果你有几个值需要替换,你也可以使用字典:

a = [1, 2, 3, 4, 1, 5, 3, 2, 6, 1, 1]
replacements = {1:10, 2:20, 3:'foo'}
replacer = replacements.get  # For faster gets.


print([replacer(n, n) for n in a])


> [10, 20, 'foo', 4, 10, 5, 'foo', 20, 6, 10, 10]

注意,这种方法仅在要替换的元素是可哈希的情况下才有效。这是因为字典键必须是可哈希的。

我知道这是一个非常古老的问题,有无数种方法来解决它。我发现更简单的是使用numpy包。

import numpy


arr = numpy.asarray([1, 6, 1, 9, 8])
arr[ arr == 8 ] = 0 # change all occurrences of 8 by 0
print(arr)

在长列表和很少出现的情况下,使用list.index()比其他答案中给出的单步迭代方法快3倍左右。

def list_replace(lst, old=1, new=10):
"""replace list elements (inplace)"""
i = -1
try:
while True:
i = lst.index(old, i + 1)
lst[i] = new
except ValueError:
pass

我的用例是用一些默认值替换None

我已经计算了这里给出的解决这个问题的方法,包括@kxr的方法——使用str.count

使用Python 3.8.1在ipython中测试代码:

def rep1(lst, replacer = 0):
''' List comprehension, new list '''


return [item if item is not None else replacer for item in lst]




def rep2(lst, replacer = 0):
''' List comprehension, in-place '''
lst[:] =  [item if item is not None else replacer for item in lst]


return lst




def rep3(lst, replacer = 0):
''' enumerate() with comparison - in-place '''
for idx, item in enumerate(lst):
if item is None:
lst[idx] = replacer


return lst




def rep4(lst, replacer = 0):
''' Using str.index + Exception, in-place '''


idx = -1
# none_amount = lst.count(None)
while True:
try:
idx = lst.index(None, idx+1)
except ValueError:
break
else:
lst[idx] = replacer


return lst




def rep5(lst, replacer = 0):
''' Using str.index + str.count, in-place '''


idx = -1
for _ in range(lst.count(None)):
idx = lst.index(None, idx+1)
lst[idx] = replacer


return lst




def rep6(lst, replacer = 0):
''' Using map, return map iterator '''


return map(lambda item: item if item is not None else replacer, lst)




def rep7(lst, replacer = 0):
''' Using map, return new list '''


return list(map(lambda item: item if item is not None else replacer, lst))




lst = [5]*10**6
# lst = [None]*10**6


%timeit rep1(lst)
%timeit rep2(lst)
%timeit rep3(lst)
%timeit rep4(lst)
%timeit rep5(lst)
%timeit rep6(lst)
%timeit rep7(lst)

我得到:

26.3 ms ± 163 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
29.3 ms ± 206 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
33.8 ms ± 191 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
11.9 ms ± 37.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
11.9 ms ± 60.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
260 ns ± 1.84 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
56.5 ms ± 204 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

使用内部str.index实际上比任何手动比较都要快。

我不知道测试4中的异常是否会比使用str.count更费力,区别似乎可以忽略不计。

注意map()(测试6)返回一个迭代器,而不是一个实际的列表,因此测试7。

对于这个古老但相关的问题,答案的速度千差万别。

kxr发布的解决方案最快

然而,这甚至是,否则不在这里:

def f1(arr, find, replace):
# fast and readable
base=0
for cnt in range(arr.count(find)):
offset=arr.index(find, base)
arr[offset]=replace
base=offset+1

下面是各种解决方案的时间安排。比较快的答案是快3倍,而比较慢的答案是快5倍

公平地说,所有的方法都需要对发送给函数的数组进行替换。

请参阅以下计时代码:

def f1(arr, find, replace):
# fast and readable
base=0
for cnt in range(arr.count(find)):
offset=arr.index(find, base)
arr[offset]=replace
base=offset+1
        

def f2(arr,find,replace):
# accepted answer
for i,e in enumerate(arr):
if e==find:
arr[i]=replace
        

def f3(arr,find,replace):
# in place list comprehension
arr[:]=[replace if e==find else e for e in arr]
    

def f4(arr,find,replace):
# in place map and lambda -- SLOW
arr[:]=list(map(lambda x: x if x != find else replace, arr))
    

def f5(arr,find,replace):
# find index with comprehension
for i in [i for i, e in enumerate(arr) if e==find]:
arr[i]=replace
        

def f6(arr,find,replace):
# FASTEST but a little les clear
try:
while True:
arr[arr.index(find)]=replace
except ValueError:
pass


def f7(lst, old, new):
"""replace list elements (inplace)"""
i = -1
try:
while 1:
i = lst.index(old, i + 1)
lst[i] = new
except ValueError:
pass
    

    

import time


def cmpthese(funcs, args=(), cnt=1000, rate=True, micro=True):
"""Generate a Perl style function benchmark"""
def pprint_table(table):
"""Perl style table output"""
def format_field(field, fmt='{:,.0f}'):
if type(field) is str: return field
if type(field) is tuple: return field[1].format(field[0])
return fmt.format(field)


def get_max_col_w(table, index):
return max([len(format_field(row[index])) for row in table])


col_paddings=[get_max_col_w(table, i) for i in range(len(table[0]))]
for i,row in enumerate(table):
# left col
row_tab=[row[0].ljust(col_paddings[0])]
# rest of the cols
row_tab+=[format_field(row[j]).rjust(col_paddings[j]) for j in range(1,len(row))]
print(' '.join(row_tab))


results={}
for i in range(cnt):
for f in funcs:
start=time.perf_counter_ns()
f(*args)
stop=time.perf_counter_ns()
results.setdefault(f.__name__, []).append(stop-start)
results={k:float(sum(v))/len(v) for k,v in results.items()}
fastest=sorted(results,key=results.get, reverse=True)
table=[['']]
if rate: table[0].append('rate/sec')
if micro: table[0].append('\u03bcsec/pass')
table[0].extend(fastest)
for e in fastest:
tmp=[e]
if rate:
tmp.append('{:,}'.format(int(round(float(cnt)*1000000.0/results[e]))))


if micro:
tmp.append('{:,.1f}'.format(results[e]/float(cnt)))


for x in fastest:
if x==e: tmp.append('--')
else: tmp.append('{:.1%}'.format((results[x]-results[e])/results[e]))
table.append(tmp)


pprint_table(table)






if __name__=='__main__':
import sys
import time
print(sys.version)
cases=(
('small, found', 9, 100),
('small, not found', 99, 100),
('large, found', 9, 1000),
('large, not found', 99, 1000)
)
for txt, tgt, mul in cases:
print(f'\n{txt}:')
arr=[1,2,3,4,5,6,7,8,9,0]*mul
args=(arr,tgt,'X')
cmpthese([f1,f2,f3, f4, f5, f6, f7],args)

结果是:

3.9.1 (default, Feb  3 2021, 07:38:02)
[Clang 12.0.0 (clang-1200.0.32.29)]


small, found:
rate/sec μsec/pass     f4     f3     f5     f2     f6     f7     f1
f4  133,982       7.5     -- -38.8% -49.0% -52.5% -78.5% -78.6% -82.9%
f3  219,090       4.6  63.5%     -- -16.6% -22.4% -64.8% -65.0% -72.0%
f5  262,801       3.8  96.1%  20.0%     --  -6.9% -57.8% -58.0% -66.4%
f2  282,259       3.5 110.7%  28.8%   7.4%     -- -54.6% -54.9% -63.9%
f6  622,122       1.6 364.3% 184.0% 136.7% 120.4%     --  -0.7% -20.5%
f7  626,367       1.6 367.5% 185.9% 138.3% 121.9%   0.7%     -- -19.9%
f1  782,307       1.3 483.9% 257.1% 197.7% 177.2%  25.7%  24.9%     --


small, not found:
rate/sec μsec/pass     f4     f5     f2     f3     f6     f7     f1
f4   13,846      72.2     -- -40.3% -41.4% -47.8% -85.2% -85.4% -86.2%
f5   23,186      43.1  67.5%     --  -1.9% -12.5% -75.2% -75.5% -76.9%
f2   23,646      42.3  70.8%   2.0%     -- -10.8% -74.8% -75.0% -76.4%
f3   26,512      37.7  91.5%  14.3%  12.1%     -- -71.7% -72.0% -73.5%
f6   93,656      10.7 576.4% 303.9% 296.1% 253.3%     --  -1.0%  -6.5%
f7   94,594      10.6 583.2% 308.0% 300.0% 256.8%   1.0%     --  -5.6%
f1  100,206      10.0 623.7% 332.2% 323.8% 278.0%   7.0%   5.9%     --


large, found:
rate/sec μsec/pass     f4     f2     f5     f3     f6     f7     f1
f4      145   6,889.4     -- -33.3% -34.8% -48.6% -85.3% -85.4% -85.8%
f2      218   4,593.5  50.0%     --  -2.2% -22.8% -78.0% -78.1% -78.6%
f5      223   4,492.4  53.4%   2.3%     -- -21.1% -77.5% -77.6% -78.2%
f3      282   3,544.0  94.4%  29.6%  26.8%     -- -71.5% -71.6% -72.3%
f6      991   1,009.5 582.4% 355.0% 345.0% 251.1%     --  -0.4%  -2.8%
f7      995   1,005.4 585.2% 356.9% 346.8% 252.5%   0.4%     --  -2.4%
f1    1,019     981.3 602.1% 368.1% 357.8% 261.2%   2.9%   2.5%     --


large, not found:
rate/sec μsec/pass     f4     f5     f2     f3     f6     f7     f1
f4      147   6,812.0     -- -35.0% -36.4% -48.9% -85.7% -85.8% -86.1%
f5      226   4,424.8  54.0%     --  -2.0% -21.3% -78.0% -78.1% -78.6%
f2      231   4,334.9  57.1%   2.1%     -- -19.6% -77.6% -77.7% -78.2%
f3      287   3,484.0  95.5%  27.0%  24.4%     -- -72.1% -72.2% -72.8%
f6    1,028     972.3 600.6% 355.1% 345.8% 258.3%     --  -0.4%  -2.7%
f7    1,033     968.2 603.6% 357.0% 347.7% 259.8%   0.4%     --  -2.3%
f1    1,057     946.2 619.9% 367.6% 358.1% 268.2%   2.8%   2.3%     --

我可能是个笨蛋,但我会为它写一个单独的简单函数:

def convertElements( oldlist, convert_dict ):
newlist = []
for e in oldlist:
if e in convert_dict:
newlist.append(convert_dict[e])
else:
newlist.append(e)
return newlist

然后根据需要调用它,如下所示:

a = [1,2,3,4,5,1,2,3,4,5,1]
a_new = convertElements(a, {1: 10})
## OUTPUT: a_new=[10, 2, 3, 4, 5, 10, 2, 3, 4, 5, 10]