Index of duplicate items in a Python list

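Does anyone know how to get the index positions of duplicate items in a Python list? I have tried doing this, but it keeps giving me only the index of the first occurrence of the item in the list.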

List = ['A', 'B', 'A', 'C', 'E']

I want it to give me:

index 0: A
index 2: A
>>> def indices(lst, item):
...   return [i for i, x in enumerate(lst) if x == item]
...
>>> indices(List, "A")
[0, 2]

To get all duplicates, you can use the approach below, but it is not very efficient. If efficiency matters, you should consider Ignacio's solution instead.

>>> dict((x, indices(List, x)) for x in set(List) if List.count(x) > 1)
{'A': [0, 2]}

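As for solving this with the index method of list, that method takes an optional second argument indicating where to start, so you can call it repeatedly with the previous index plus 1.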

>>> List.index("A")
0
>>> List.index("A", 1)
2
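The repeated calls above can be wrapped in a small generator. This is only a sketch: the name iter_indices is made up for illustration, and it assumes the sequence supports index(value, start) the way list and str do.

```python
def iter_indices(seq, item):
    """Yield every index at which item occurs in seq (hypothetical helper)."""
    start = 0
    while True:
        try:
            # Resume the search just past the previous match.
            found = seq.index(item, start)
        except ValueError:
            # index() raises ValueError once no further match exists.
            return
        yield found
        start = found + 1


letters = ['A', 'B', 'A', 'C', 'E']
print(list(iter_indices(letters, 'A')))  # [0, 2]
```

Because it is a generator, the caller can stop early (for example after the second match) without scanning the rest of the list.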
import collections

# L is the input list
dups = collections.defaultdict(list)
for i, e in enumerate(L):
    dups[e].append(i)
for k, v in sorted(dups.items()):
    if len(v) >= 2:
        print('%s: %r' % (k, v))

Extrapolate from there.

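You need to pass the optional second argument to index, which is the position where you want index to start looking. After each match is found, reset this argument to the position just after the match.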

def list_duplicates_of(seq,item):
    start_at = -1
    locs = []
    while True:
        try:
            loc = seq.index(item,start_at+1)
        except ValueError:
            break
        else:
            locs.append(loc)
            start_at = loc
    return locs


source = "ABABDBAAEDSBQEWBAFLSAFB"
print(list_duplicates_of(source, 'B'))

Prints:

[1, 3, 5, 11, 15, 22]

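You can find all duplicates at once in a single pass over the source by using a defaultdict to keep a list of all seen locations for each item, and returning those items that were seen more than once.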

from collections import defaultdict


def list_duplicates(seq):
    tally = defaultdict(list)
    for i,item in enumerate(seq):
        tally[item].append(i)
    return ((key,locs) for key,locs in tally.items()
                            if len(locs)>1)

for dup in sorted(list_duplicates(source)):
    print(dup)

Prints:

('A', [0, 2, 6, 7, 16, 20])
('B', [1, 3, 5, 11, 15, 22])
('D', [4, 9])
('E', [8, 13])
('F', [17, 21])
('S', [10, 19])

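If you want to run repeated tests for various keys against the same source, you can use functools.partial to create a new function variable with a "partially complete" argument list, that is, specifying the seq but omitting the item to search for: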

from functools import partial
dups_in_source = partial(list_duplicates_of, source)


for c in "ABDEFS":
    print(c, dups_in_source(c))

Prints:

A [0, 2, 6, 7, 16, 20]
B [1, 3, 5, 11, 15, 22]
D [4, 9]
E [8, 13]
F [17, 21]
S [10, 19]

Using the new "Counter" class in the collections module, based on lazyr's answer:

>>> import collections
>>> def duplicates(n): #n="123123123"
...     counter=collections.Counter(n) #{'1': 3, '3': 3, '2': 3}
...     dups=[i for i in counter if counter[i]!=1] #['1','3','2']
...     result={}
...     for item in dups:
...             result[item]=[i for i,j in enumerate(n) if j==item]
...     return result
...
>>> duplicates("123123123")
{'1': [0, 3, 6], '3': [2, 5, 8], '2': [1, 4, 7]}
from collections import Counter, defaultdict

def duplicates(lst):
    cnt= Counter(lst)
    return [key for key in cnt.keys() if cnt[key]> 1]

def duplicates_indices(lst):
    dup, ind= duplicates(lst), defaultdict(list)
    for i, v in enumerate(lst):
        if v in dup: ind[v].append(i)
    return ind

lst= ['a', 'b', 'a', 'c', 'b', 'a', 'e']
print(duplicates(lst)) # ['a', 'b']
print(duplicates_indices(lst)) # ..., {'a': [0, 2, 5], 'b': [1, 4]})

A slightly more orthogonal (and thus more useful) implementation would be:

from collections import Counter, defaultdict

def duplicates(lst):
    cnt= Counter(lst)
    return [key for key in cnt.keys() if cnt[key]> 1]

def indices(lst, items= None):
    items, ind= set(lst) if items is None else items, defaultdict(list)
    for i, v in enumerate(lst):
        if v in items: ind[v].append(i)
    return ind

lst= ['a', 'b', 'a', 'c', 'b', 'a', 'e']
print(indices(lst, duplicates(lst))) # ..., {'a': [0, 2, 5], 'b': [1, 4]})

I think I found a simple solution:

if elem in string_list:
    counter = 0
    elem_pos = []
    for i in string_list:
        if i == elem:
            elem_pos.append(counter)
        counter = counter + 1
    print(elem_pos)

This prints a list giving the indices of the specific element ("elem").

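I'll mention the more obvious way of handling duplicates in lists. In terms of complexity, dictionaries are the best choice because each lookup is O(1). You can be cleverer if you are only interested in the duplicates.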

my_list = [1,1,2,3,4,5,5]
my_dict = {}
for (ind,elem) in enumerate(my_list):
    if elem in my_dict:
        my_dict[elem].append(ind)
    else:
        my_dict.update({elem:[ind]})

for key,value in my_dict.items():
    if len(value) > 1:
        print("key(%s) has indices (%s)" %(key,value))

Which prints the following:

key(1) has indices ([0, 1])
key(5) has indices ([5, 6])
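If only the duplicates matter, one option is to avoid building an index list for values that never repeat. The sketch below is an assumption about what that could look like, not the answerer's code; duplicate_positions, first_seen and duplicates are illustrative names.

```python
def duplicate_positions(items):
    """Return {value: [indices]} for values seen more than once."""
    first_seen = {}   # value -> index of its first occurrence
    duplicates = {}   # value -> all indices, created on the second sighting
    for index, value in enumerate(items):
        if value in duplicates:
            duplicates[value].append(index)
        elif value in first_seen:
            # Second sighting: promote the value to the duplicates dict.
            duplicates[value] = [first_seen[value], index]
        else:
            first_seen[value] = index
    return duplicates


print(duplicate_positions([1, 1, 2, 3, 4, 5, 5]))  # {1: [0, 1], 5: [5, 6]}
```

Every step is still a constant-time dictionary lookup, so the whole pass stays linear in the length of the list.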

I benchmarked all the solutions suggested here and also added another solution to this problem (described at the end of the answer).

Benchmarks

First, the benchmarks. I initialize a list of n random ints within the range [1, n/2] and then call timeit over all the algorithms.
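A harness along these lines could reproduce that setup. It is a hypothetical sketch, not the benchmark code used for the numbers in this answer: make_input, run_benchmark, the fixed seed and the stand-in algorithm dupl_defaultdict are all assumptions made for illustration.

```python
import random
import timeit
from collections import defaultdict


def make_input(n, seed=0):
    """Build a list of n random ints in [1, n // 2] (the seed is an added assumption)."""
    rng = random.Random(seed)
    return [rng.randint(1, max(1, n // 2)) for _ in range(n)]


def dupl_defaultdict(seq):
    """Stand-in algorithm under test: one pass with a defaultdict."""
    tally = defaultdict(list)
    for i, item in enumerate(seq):
        tally[item].append(i)
    return {key: locs for key, locs in tally.items() if len(locs) > 1}


def run_benchmark(algorithms, n, loops):
    """Time each algorithm on the same input and return {name: seconds}."""
    data = make_input(n)
    return {
        func.__name__: timeit.timeit(lambda: func(data), number=loops)
        for func in algorithms
    }


for name, seconds in run_benchmark([dupl_defaultdict], n=100, loops=1000).items():
    print("Algorithm:", name)
    print("Timing:", seconds)
```

Generating the input once and sharing it across algorithms keeps the comparison fair; absolute timings will differ by machine and Python version.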

The solutions of @Paul McGuire and @Ignacio Vazquez-Abrams run about twice as fast as the rest on the list of 100 ints:

Testing algorithm on the list of 100 items using 10000 loops
Algorithm: dupl_eat
Timing: 1.46247477189
####################
Algorithm: dupl_utdemir
Timing: 2.93324529055
####################
Algorithm: dupl_lthaulow
Timing: 3.89198786645
####################
Algorithm: dupl_pmcguire
Timing: 0.583058259784
####################
Algorithm: dupl_ivazques_abrams
Timing: 0.645062989076
####################
Algorithm: dupl_rbespal
Timing: 1.06523873786
####################

If you change the number of items to 1000, the difference becomes much bigger (by the way, I'd be happy if someone could explain why):

Testing algorithm on the list of 1000 items using 1000 loops
Algorithm: dupl_eat
Timing: 5.46171654555
####################
Algorithm: dupl_utdemir
Timing: 25.5582547323
####################
Algorithm: dupl_lthaulow
Timing: 39.284285326
####################
Algorithm: dupl_pmcguire
Timing: 0.56558489513
####################
Algorithm: dupl_ivazques_abrams
Timing: 0.615980005148
####################
Algorithm: dupl_rbespal
Timing: 1.21610942322
####################
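A plausible explanation, offered as an assumption rather than something measured here: list size times loop count is the same in both runs (100 × 10000 = 1000 × 1000), so a single-pass algorithm does about the same total work, while an algorithm that rescans the list once per distinct value does roughly ten times more. The sketch below only counts element visits; the function names are illustrative and the input is a deterministic stand-in for the random list.

```python
def visits_rescan(seq):
    """Element visits when the whole list is rescanned for every distinct value."""
    visits = 0
    for value in set(seq):
        # One full pass over seq per distinct value, as in
        # [i for i, x in enumerate(seq) if x == value].
        visits += len(seq)
    return visits


def visits_single_pass(seq):
    """Element visits when indices are grouped in one pass over seq."""
    return len(seq)


for n, loops in [(100, 10000), (1000, 1000)]:
    # Values in [1, n // 2], each appearing twice.
    seq = [i % (n // 2) + 1 for i in range(n)]
    print(n, loops * visits_rescan(seq), loops * visits_single_pass(seq))
# 100 50000000 1000000
# 1000 500000000 1000000
```

The single-pass total stays flat while the rescanning total grows tenfold, which is in line with the flat timings of the two fastest algorithms and the roughly tenfold slowdown of the slowest ones.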

On the bigger lists, @Paul McGuire's solution continues to be the most efficient, and my algorithm begins to have problems.

Testing algorithm on the list of 1000000 items using 1 loops
Algorithm: dupl_pmcguire
Timing: 1.5019953958
####################
Algorithm: dupl_ivazques_abrams
Timing: 1.70856155898
####################
Algorithm: dupl_rbespal
Timing: 3.95820421595
####################

The full code of the benchmark is here

Another algorithm

Here is my solution to the same problem:

def dupl_rbespal(c):
    alreadyAdded = False
    dupl_c = dict()
    sorted_ind_c = sorted(range(len(c)), key=lambda x: c[x]) # sort incoming list but save the indexes of sorted items

    for i in range(len(c) - 1): # loop over indexes of sorted items
        if c[sorted_ind_c[i]] == c[sorted_ind_c[i+1]]: # if two consecutive indexes point to the same value, add it to the duplicates
            if not alreadyAdded:
                dupl_c[c[sorted_ind_c[i]]] = [sorted_ind_c[i], sorted_ind_c[i+1]]
                alreadyAdded = True
            else:
                dupl_c[c[sorted_ind_c[i]]].append( sorted_ind_c[i+1] )
        else:
            alreadyAdded = False
    return dupl_c

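Although it's not the best, it allowed me to generate a slightly different structure that I needed for my problem (something like a linked list of the indexes of the same value).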
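The same sort-then-compare-neighbours idea can also be written with itertools.groupby. This is an alternative sketch under the same assumption that the values are sortable; dupl_sorted_groupby is a hypothetical name, not one of the benchmarked functions.

```python
from itertools import groupby


def dupl_sorted_groupby(c):
    """Group the indices of equal values by sorting the indices by value."""
    # sorted() is stable, so indices of equal values stay in ascending order.
    sorted_ind = sorted(range(len(c)), key=lambda i: c[i])
    dupl = {}
    for value, group in groupby(sorted_ind, key=lambda i: c[i]):
        indices = list(group)
        if len(indices) > 1:
            dupl[value] = indices
    return dupl


print(dupl_sorted_groupby([3, 1, 3, 2, 1, 3]))  # {1: [1, 4], 3: [0, 2, 5]}
```

The sort dominates the cost at O(n log n), so this variant would be expected to trail the single-pass dictionary solutions on large inputs as well.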

I just keep it simple:

i = [1,2,1,3]
k = 0
for ii in i:
    if ii == 1 :
        print ("index of 1 = ", k)
    k = k+1

Output:

index of 1 =  0
index of 1 =  2
string_list = ['A', 'B', 'C', 'B', 'D', 'B']
pos_list = []
for i in range(len(string_list)):
    if string_list[i] == 'B':
        pos_list.append(i)
print(pos_list)

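Whoa, everyone's answers are so long. I simply used a pandas DataFrame, masking, and the duplicated function (keep=False marks all duplicates as True, not just the first or last):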

import pandas as pd
import numpy as np
np.random.seed(42)  # make results reproducible


int_df = pd.DataFrame({'int_list': np.random.randint(1, 20, size=10)})
dupes = int_df['int_list'].duplicated(keep=False)
print(int_df['int_list'][dupes].index)

This should return Int64Index([0, 2, 3, 4, 6, 7, 9], dtype='int64')

a= [2,3,4,5,6,2,3,2,4,2]
search=2
pos=0
positions=[]

while (search in a):
    pos+=a.index(search)
    positions.append(pos)
    a=a[a.index(search)+1:]
    pos+=1

print("search found at:",positions)
def index(arr, num):
    for i, x in enumerate(arr):
        if x == num:
            print(x, i)

#index(List, 'A')

def find_duplicate(list_):
    duplicate_list=[""]

    for k in range(len(list_)):
        if list_[k] in duplicate_list:
            continue
        for j in range(len(list_)):
            if k == j:
                continue
            if list_[k] == list_[j]:
                duplicate_list.append(list_[j])
                # report the two positions themselves; list_.index() would
                # return the first occurrence for both
                print("duplicate "+str(k)+" "+str(j))

Here is one that works for multiple duplicates, and you don't need to specify any values:

List = ['A', 'B', 'A', 'C', 'E', 'B'] # duplicate two 'A's two 'B's

ix_list = []
for i in range(len(List)):
    try:
        dup_ix = List[(i+1):].index(List[i]) + (i + 1) # dup onwards + (i + 1)
        ix_list.extend([i, dup_ix]) # if found no error, add i also
    except ValueError:
        pass

ix_list.sort()

print(ix_list)
[0, 1, 2, 5]
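When a value occurs three or more times, the loop above adds some indices twice (for ['A', 'A', 'A'] it yields [0, 1, 1, 2]). A possible variant, sketched here with a hypothetical function name, collects the indices in a set and uses the start argument of index instead of slicing:

```python
def all_duplicate_indices(items):
    """Return the sorted indices of every element that occurs more than once."""
    found = set()  # a set keeps each index only once
    for i, value in enumerate(items):
        try:
            # Look for the next occurrence strictly after position i.
            nxt = items.index(value, i + 1)
        except ValueError:
            continue
        found.update((i, nxt))
    return sorted(found)


print(all_duplicate_indices(['A', 'B', 'A', 'C', 'A', 'B']))  # [0, 1, 2, 4, 5]
```

Passing a start position also avoids copying the tail of the list on every iteration, which the slice List[(i+1):] does.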

In a single line with pandas 1.2.2 and numpy:

import numpy as np
import pandas as pd

idx = np.where(pd.DataFrame(List).duplicated(keep=False))

The argument keep=False marks every duplicate as True, and np.where() returns a tuple holding an array of the indices at which the element is True.

def dup_list(my_list, value):
    '''
    dup_list(list,value)
    This function finds the indices of values in a list including duplicated values.

    list: the list you are working on

    value: the item of the list you want to find the index of

    NB: if a value is duplicated, its indices are stored in a list
    If only one occurrence of the value, the index is stored as an integer.

    Therefore use isinstance method to know how to handle the returned value
    '''
    value_list = []
    index_list = []
    index_of_duped = []

    if my_list.count(value) == 1:
        return my_list.index(value)

    elif my_list.count(value) < 1:
        return 'Your argument is not in the list'

    else:
        for item in my_list:
            value_list.append(item)
            length = len(value_list)
            index = length - 1
            index_list.append(index)

            if item == value:
                index_of_duped.append(max(index_list))

        return index_of_duped

# function call eg dup_list(my_list, 'john')
def duplicates(list,dup):
    a=[list.index(dup)]
    for i in list:
        try:
            a.append(list.index(dup,a[-1]+1))
        except ValueError:
            for i in a:
                print(f'index {i}: '+dup)
            break
duplicates(['A', 'B', 'A', 'C', 'E'],'A')


Output:
index 0: A
index 2: A

If you want to get the indices of all duplicate elements of different kinds, you can try this solution:

# note: below list has more than one kind of duplicates
List = ['A', 'B', 'A', 'C', 'E', 'E', 'A', 'B', 'A', 'A', 'C']
d1 = {item:List.count(item) for item in List}  # item and their counts
elems = list(filter(lambda x: d1[x] > 1, d1))  # get duplicate elements
d2 = dict(zip(range(0, len(List)), List))  # each item and their indices


# item and their list of duplicate indices
res = {item: list(filter(lambda x: d2[x] == item, d2)) for item in elems}

Now, if you print(res), you'll see this:

{'A': [0, 2, 6, 8, 9], 'B': [1, 7], 'C': [3, 10], 'E': [4, 5]}

This is a good question, and there are many ways to solve it.

The code below is one of the ways to do it:

letters = ["a", "b", "c", "d", "e", "a", "a", "b"]

lettersIndexes = [i for i in range(len(letters))] # i created a list that contains the indexes of my previous list
counter = 0
for item in letters:
    if item == "a":
        print(item, lettersIndexes[counter])
    counter += 1 # for each item it increases the counter which means the index

Another way to get the indices, this time stored in a list:

letters = ["a", "b", "c", "d", "e", "a", "a", "b"]
lettersIndexes = [i for i in range(len(letters)) if letters[i] == "a" ]
print(lettersIndexes) # as you can see we get a list of the indexes that we want.

Goodbye

Using a dictionary approach based on the setdefault instance method.

List = ['A', 'B', 'A', 'C', 'B', 'E', 'B']

# keep track of all indices of every term
duplicates = {}
for i, key in enumerate(List):
    duplicates.setdefault(key, []).append(i)

# print only those terms with more than one index
template = 'index {}: {}'
for k, v in duplicates.items():
    if len(v) > 1:
        print(template.format(k, str(v).strip('][')))

Remark: Counter, defaultdict and some other container classes from collections (such as OrderedDict) are subclasses of dict, so they share the setdefault method as well.
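A quick way to check that remark, and to compare setdefault with a defaultdict, is sketched below. The variable names are illustrative only.

```python
from collections import Counter, OrderedDict, defaultdict

# Each of these collections classes inherits from dict.
for cls in (Counter, defaultdict, OrderedDict):
    print(cls.__name__, issubclass(cls, dict))  # prints True for all three

# setdefault on a plain dict and item access on a defaultdict(list)
# build the same index table.
letters = ['A', 'B', 'A', 'C', 'B', 'E', 'B']
with_setdefault = {}
with_defaultdict = defaultdict(list)
for i, key in enumerate(letters):
    with_setdefault.setdefault(key, []).append(i)
    with_defaultdict[key].append(i)

print(with_setdefault == dict(with_defaultdict))  # True
```

The plain dict with setdefault needs no import, while defaultdict(list) avoids constructing a throwaway empty list on every call.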