找到列表中最常见的元素

找到Python列表中最常见元素的有效方法是什么?

我的列表项可能不是哈希的,所以不能使用字典。 同样,在抽取的情况下,应返回索引最低的项。例子:< / p >

>>> most_common(['duck', 'duck', 'goose'])
'duck'
>>> most_common(['goose', 'duck', 'duck', 'goose'])
'goose'
355966 次浏览

如果它们是不可哈希的,您可以对它们进行排序,并对结果进行一次循环,以计数项(相同的项将彼此相邻)。但是使它们可哈希并使用字典可能更快。

def most_common(lst):
cur_length = 0
max_length = 0
cur_i = 0
max_i = 0
cur_item = None
max_item = None
for i, item in sorted(enumerate(lst), key=lambda x: x[1]):
if cur_item is None or cur_item != item:
if cur_length > max_length or (cur_length == max_length and cur_i < max_i):
max_length = cur_length
max_i = cur_i
max_item = cur_item
cur_length = 1
cur_i = i
cur_item = item
else:
cur_length += 1
if cur_length > max_length or (cur_length == max_length and cur_i < max_i):
return cur_item
return max_item

对列表的一个副本排序并找到运行时间最长的。您可以在用每个元素的索引对列表排序之前对其进行修饰,然后在并列的情况下选择从最低索引开始的运行。

如果排序和哈希都不可行的情况下,这是明显的慢速解决方案(O(n²)),但相等比较(==)可用:

def most_common(items):
if not items:
raise ValueError
fitems = []
best_idx = 0
for item in items:
item_missing = True
i = 0
for fitem in fitems:
if fitem[0] == item:
fitem[1] += 1
d = fitem[1] - fitems[best_idx][1]
if d > 0 or (d == 0 and fitems[best_idx][2] > fitem[2]):
best_idx = i
item_missing = False
break
i += 1
if item_missing:
fitems.append([item, 1, i])
return items[best_idx]

但是,如果你的列表(n)的长度很大,那么让你的项目可哈希或可排序(正如其他答案所建议的那样)几乎总是能更快地找到最常见的元素。哈希时平均为O(n),排序时最差为O(n*log(n))。

在这里:

def most_common(l):
max = 0
maxitem = None
for x in set(l):
count =  l.count(x)
if count > max:
max = count
maxitem = x
return maxitem

我有一种模糊的感觉,在标准库的某个地方有一个方法可以给你每个元素的计数,但我找不到它。

>>> li  = ['goose', 'duck', 'duck']


>>> def foo(li):
st = set(li)
mx = -1
for each in st:
temp = li.count(each):
if mx < temp:
mx = temp
h = each
return h


>>> foo(li)
'duck'

一行程序:

def most_common (lst):
return max(((item, lst.count(item)) for item in set(lst)), key=lambda a: a[1])[0]

简单的一行代码:

def most_common(lst):
return max(set(lst), key=lst.count)
# use Decorate, Sort, Undecorate to solve the problem


def most_common(iterable):
# Make a list with tuples: (item, index)
# The index will be used later to break ties for most common item.
lst = [(x, i) for i, x in enumerate(iterable)]
lst.sort()


# lst_final will also be a list of tuples: (count, index, item)
# Sorting on this list will find us the most common item, and the index
# will break ties so the one listed first wins.  Count is negative so
# largest count will have lowest value and sort first.
lst_final = []


# Get an iterator for our new list...
itr = iter(lst)


# ...and pop the first tuple off.  Setup current state vars for loop.
count = 1
tup = next(itr)
x_cur, i_cur = tup


# Loop over sorted list of tuples, counting occurrences of item.
for tup in itr:
# Same item again?
if x_cur == tup[0]:
# Yes, same item; increment count
count += 1
else:
# No, new item, so write previous current item to lst_final...
t = (-count, i_cur, x_cur)
lst_final.append(t)
# ...and reset current state vars for loop.
x_cur, i_cur = tup
count = 1


# Write final item after loop ends
t = (-count, i_cur, x_cur)
lst_final.append(t)


lst_final.sort()
answer = lst_final[0][2]


return answer


print most_common(['x', 'e', 'a', 'e', 'a', 'e', 'e']) # prints 'e'
print most_common(['goose', 'duck', 'duck', 'goose']) # prints 'goose'

这是O(n)解。

mydict   = {}
cnt, itm = 0, ''
for item in reversed(lst):
mydict[item] = mydict.get(item, 0) + 1
if mydict[item] >= cnt :
cnt, itm = mydict[item], item


print itm

(reversed用于确保它返回最低的索引项)

提出了这么多解决方案,我很惊讶没有人提出我认为明显的一个(对于不可哈希但可比的元素)——[itertools.groupby][1]。itertools提供了快速、可重用的功能,并允许您将一些棘手的逻辑委托给经过良好测试的标准库组件。举个例子:

import itertools
import operator


def most_common(L):
# get an iterable of (item, iterable) pairs
SL = sorted((x, i) for i, x in enumerate(L))
# print 'SL:', SL
groups = itertools.groupby(SL, key=operator.itemgetter(0))
# auxiliary function to get "quality" for an item
def _auxfun(g):
item, iterable = g
count = 0
min_index = len(L)
for _, where in iterable:
count += 1
min_index = min(min_index, where)
# print 'item %r, count %r, minind %r' % (item, count, min_index)
return count, -min_index
# pick the highest-count/earliest item
return max(groups, key=_auxfun)[0]

当然,这可以写得更简洁,但我的目标是尽可能清晰。这两个print语句可以取消注释,以便更好地看到运行中的机制;例如,打印不加注释:

print most_common(['goose', 'duck', 'duck', 'goose'])

发出:

SL: [('duck', 1), ('duck', 2), ('goose', 0), ('goose', 3)]
item 'duck', count 2, minind 1
item 'goose', count 2, minind 0
goose

如你所见,SL是一个对列表,每对都是一个项,后面跟着该项在原始列表中的索引(为了实现关键条件,如果具有相同最高计数的“最常见”项是> 1,则结果必须是最早出现的项)。

groupby仅按项分组(通过operator.itemgetter)。辅助函数在max计算过程中每分组调用一次,接收并在内部解包一个组——一个包含两个项的元组(item, iterable),其中可迭代对象的项也是两个项的元组(item, original index) [[SL的项]]。

然后辅助函数使用循环来确定组的可迭代对象中的条目数,而且(最小原始索引);它将返回这些组合的“quality key”,并更改了最小索引符号,因此max操作将考虑原始列表中较早出现的那些项。

如果不太担心在时间和空间上的大o问题,这段代码就会简单得多,例如....:

def most_common(L):
groups = itertools.groupby(sorted(L))
def _auxfun((item, iterable)):
return len(list(iterable)), -L.index(item)
return max(groups, key=_auxfun)[0]

同样的基本思想,只是表达得更简单简洁……但是,唉,额外的O(N)辅助空间(将组的可迭代对象包含到列表中)和O(N平方)时间(获得每个项的L.index)。虽然过早的优化是编程中的万恶之源,但当O(N log N)方法可用时,却故意选择O(N²)方法,这太违背可伸缩性了!-)

最后,对于那些更喜欢“单行程序”而不是清晰和性能的人来说,一个额外的单行程序版本,它的名字被适当地扭曲了:-)。

from itertools import groupby as g
def most_common_oneliner(L):
return max(g(sorted(L)), key=lambda(x, v):(len(list(v)),-L.index(x)))[0]

你可能不再需要这个了,但这是我对一个类似问题所做的。(因为评论,它看起来比实际要长。)

itemList = ['hi', 'hi', 'hello', 'bye']


counter = {}
maxItemCount = 0
for item in itemList:
try:
# Referencing this will cause a KeyError exception
# if it doesn't already exist
counter[item]
# ... meaning if we get this far it didn't happen so
# we'll increment
counter[item] += 1
except KeyError:
# If we got a KeyError we need to create the
# dictionary key
counter[item] = 1


# Keep overwriting maxItemCount with the latest number,
# if it's higher than the existing itemCount
if counter[item] > maxItemCount:
maxItemCount = counter[item]
mostPopularItem = item


print mostPopularItem

借用在这里,可以在Python 2.7中使用:

from collections import Counter


def Most_Common(lst):
data = Counter(lst)
return data.most_common(1)[0][0]

比Alex的解决方案快4-6倍,比newacct提出的一行程序快50倍。

在CPython 3.6+(任何Python 3.7+)上,上面将选择第一个看到的元素。如果你在旧的Python上运行,为了检索列表中第一个出现的元素,你需要进行两次传递来保持顺序:

# Only needed pre-3.6!
def most_common(lst):
data = Counter(lst)
return max(lst, key=data.get)

你想要的在统计中被称为模式,Python当然有一个内置函数来为你做这件事:

>>> from statistics import mode
>>> mode([1, 2, 2, 3, 3, 3, 3, 3, 4, 5, 6, 6, 6])
3

注意,如果没有“最常见元素”;比如前两名打平的情况,这将在Python上引发StatisticsError <=3.7,从3.8开始,它将返回遇到的第一个

def popular(L):
C={}
for a in L:
C[a]=L.count(a)
for b in C.keys():
if C[b]==max(C.values()):
return b
L=[2,3,5,3,6,3,6,3,6,3,7,467,4,7,4]
print popular(L)
 def most_common(lst):
if max([lst.count(i)for i in lst]) == 1:
return False
else:
return max(set(lst), key=lst.count)

我在最近的一个项目中需要这样做。我承认,我无法理解Alex的回答,所以这就是我最后得到的答案。

def mostPopular(l):
mpEl=None
mpIndex=0
mpCount=0
curEl=None
curCount=0
for i, el in sorted(enumerate(l), key=lambda x: (x[1], x[0]), reverse=True):
curCount=curCount+1 if el==curEl else 1
curEl=el
if curCount>mpCount \
or (curCount==mpCount and i<mpIndex):
mpEl=curEl
mpIndex=i
mpCount=curCount
return mpEl, mpCount, mpIndex

我根据Alex的解决方案计时,对于短列表,它要快10-15%,但一旦超过100个或更多元素(测试多达20万个),它就会慢20%。

这是一个很简单的解,时间复杂度是线性的

L =['鹅','鸭','鸭']

def most_common (L):

current_winner = 0
max_repeated = None
for i in L:
amount_times = L.count(i)
if amount_times > current_winner:
current_winner = amount_times
max_repeated = i


return max_repeated

print (most_common (L))

“duck"

number是列表中重复次数最多的元素吗

基于路易斯的回答构建,但满足"在抽取的情况下,应返回索引最低的项"条件:

from statistics import mode, StatisticsError


def most_common(l):
try:
return mode(l)
except StatisticsError as e:
# will only return the first element if no unique mode found
if 'no unique mode' in e.args[0]:
return l[0]
# this is for "StatisticsError: no mode for empty data"
# after calling mode([])
raise

例子:

>>> most_common(['a', 'b', 'b'])
'b'
>>> most_common([1, 2])
1
>>> most_common([])
StatisticsError: no mode for empty data

简单的一行解决方案

moc= max([(lst.count(chr),chr) for chr in set(lst)])

它会返回频率最高的元素。

def mostCommonElement(list):
count = {} // dict holder
max = 0 // keep track of the count by key
result = None // holder when count is greater than max
for i in list:
if i not in count:
count[i] = 1
else:
count[i] += 1
if count[i] > max:
max = count[i]
result = i
return result

mostCommonElement(["a","b","a","c"]) -> "a"

如果没有最低索引的要求,你可以使用collections.Counter来实现:

from collections import Counter


a = [1936, 2401, 2916, 4761, 9216, 9216, 9604, 9801]


c = Counter(a)


print(c.most_common(1)) # the one most common element... 2 would mean the 2 most common
[(9216, 2)] # a set containing the element, and it's count in 'a'

我这样做使用scipy统计模块和lambda:

import scipy.stats
lst = [1,2,3,4,5,6,7,5]
most_freq_val = lambda x: scipy.stats.mode(x)[0][0]
print(most_freq_val(lst))

结果:

 most_freq_val = 5
ans  = [1, 1, 0, 0, 1, 1]
all_ans = {ans.count(ans[i]): ans[i] for i in range(len(ans))}
print(all_ans)
all_ans={4: 1, 2: 0}
max_key = max(all_ans.keys())

4

print(all_ans[max_key])

1

最常见的元素应该是在数组中出现次数超过N/2次的元素,其中Nlen(array)。下面的技术将在O(n)时间复杂度下完成,只消耗O(1)辅助空间。

from collections import Counter


def majorityElement(arr):
majority_elem = Counter(arr)
size = len(arr)
for key, val in majority_elem.items():
if val > size/2:
return key
return -1
#This will return the list sorted by frequency:


def orderByFrequency(list):


listUniqueValues = np.unique(list)
listQty = []
listOrderedByFrequency = []
    

for i in range(len(listUniqueValues)):
listQty.append(list.count(listUniqueValues[i]))
for i in range(len(listQty)):
index_bigger = np.argmax(listQty)
for j in range(listQty[index_bigger]):
listOrderedByFrequency.append(listUniqueValues[index_bigger])
listQty[index_bigger] = -1
return listOrderedByFrequency


#And this will return a list with the most frequent values in a list:


def getMostFrequentValues(list):
    

if (len(list) <= 1):
return list
    

list_most_frequent = []
list_ordered_by_frequency = orderByFrequency(list)
    

list_most_frequent.append(list_ordered_by_frequency[0])
frequency = list_ordered_by_frequency.count(list_ordered_by_frequency[0])
    

index = 0
while(index < len(list_ordered_by_frequency)):
index = index + frequency
        

if(index < len(list_ordered_by_frequency)):
testValue = list_ordered_by_frequency[index]
testValueFrequency = list_ordered_by_frequency.count(testValue)
            

if (testValueFrequency == frequency):
list_most_frequent.append(testValue)
else:
break
    

return list_most_frequent


#tests:
print(getMostFrequentValues([]))
print(getMostFrequentValues([1]))
print(getMostFrequentValues([1,1]))
print(getMostFrequentValues([2,1]))
print(getMostFrequentValues([2,2,1]))
print(getMostFrequentValues([1,2,1,2]))
print(getMostFrequentValues([1,2,1,2,2]))
print(getMostFrequentValues([3,2,3,5,6,3,2,2]))
print(getMostFrequentValues([1,2,2,60,50,3,3,50,3,4,50,4,4,60,60]))


Results:
[]
[1]
[1]
[1, 2]
[2]
[1, 2]
[2]
[2, 3]
[3, 4, 50, 60]
def most_frequent(List):


counter = 0


num = List[0]


 



for i in List:


curr_frequency = List.count(i)


if(curr_frequency> counter):


counter = curr_frequency


num = i




return num




List = [2, 1, 2, 2, 1, 3]


print(most_frequent(List))
numbers = [1, 3, 7, 4, 3, 0, 3, 6, 3]
max_repeat_num = max(numbers, key=numbers.count)     *# which number most* frequently
max_repeat = numbers.count(max_repeat_num)           *#how many times*
print(f" the number {max_repeat_num} is repeated{max_repeat} times")