从一组(类似的)字符串中确定前缀

我有一套绳子。

my_prefix_what_ever
my_prefix_what_so_ever
my_prefix_doesnt_matter

我只是想找到这些字符串中最长的公共部分,这里是前缀。在上面的结果应该是

my_prefix_

弦乐

my_prefix_what_ever
my_prefix_what_so_ever
my_doesnt_matter

应该导致前缀

my_

在 Python 中是否有一种相对简单的方法来确定前缀(而不必手动迭代每个字符) ?

附言: 我正在使用 Python 2.6.3。

48793 次浏览

The following is an working, but probably quite inefficient solution.

a = ["my_prefix_what_ever", "my_prefix_what_so_ever", "my_prefix_doesnt_matter"]
b = zip(*a)
c = [x[0] for x in b if x==(x[0],)*len(x)]
result = "".join(c)

For small sets of strings, the above is no problem at all. But for larger sets, I personally would code another, manual solution that checks each character one after another and stops when there are differences.

Algorithmically, this yields the same procedure, however, one might be able to avoid constructing the list c.

Here's my solution:

a = ["my_prefix_what_ever", "my_prefix_what_so_ever", "my_prefix_doesnt_matter"]


prefix_len = len(a[0])
for x in a[1 : ]:
prefix_len = min(prefix_len, len(x))
while not x.startswith(a[0][ : prefix_len]):
prefix_len -= 1


prefix = a[0][ : prefix_len]

Never rewrite what is provided to you: os.path.commonprefix does exactly this:

Return the longest path prefix (taken character-by-character) that is a prefix of all paths in list. If list is empty, return the empty string (''). Note that this may return invalid paths because it works a character at a time.

For comparison to the other answers, here's the code:

# Return the longest prefix of all list elements.
def commonprefix(m):
"Given a list of pathnames, returns the longest common leading component"
if not m: return ''
s1 = min(m)
s2 = max(m)
for i, c in enumerate(s1):
if c != s2[i]:
return s1[:i]
return s1

Ned Batchelder is probably right. But for the fun of it, here's a more efficient version of phimuemue's answer using itertools.

import itertools


strings = ['my_prefix_what_ever',
'my_prefix_what_so_ever',
'my_prefix_doesnt_matter']


def all_same(x):
return all(x[0] == y for y in x)


char_tuples = itertools.izip(*strings)
prefix_tuples = itertools.takewhile(all_same, char_tuples)
''.join(x[0] for x in prefix_tuples)

As an affront to readability, here's a one-line version :)

>>> from itertools import takewhile, izip
>>> ''.join(c[0] for c in takewhile(lambda x: all(x[0] == y for y in x), izip(*strings)))
'my_prefix_'

Just out of curiosity I figured out yet another way to do this:

def common_prefix(strings):


if len(strings) == 1:#rule out trivial case
return strings[0]


prefix = strings[0]


for string in strings[1:]:
while string[:len(prefix)] != prefix and prefix:
prefix = prefix[:len(prefix)-1]
if not prefix:
break


return prefix


strings = ["my_prefix_what_ever","my_prefix_what_so_ever","my_prefix_doesnt_matter"]


print common_prefix(strings)
#Prints "my_prefix_"

As Ned pointed out it's probably better to use os.path.commonprefix, which is a pretty elegant function.

Here is another way of doing this using OrderedDict with minimal code.

import collections
import itertools


def commonprefix(instrings):
""" Common prefix of a list of input strings using OrderedDict """


d = collections.OrderedDict()


for instring in instrings:
for idx,char in enumerate(instring):
# Make sure index is added into key
d[(char, idx)] = d.get((char,idx), 0) + 1


# Return prefix of keys while value == length(instrings)
return ''.join([k[0] for k in itertools.takewhile(lambda x: d[x] == len(instrings), d)])

The second line of this employs the reduce function on each character in the input strings. It returns a list of N+1 elements where N is length of the shortest input string.

Each element in lot is either (a) the input character, if all input strings match at that position, or (b) None. lot.index(None) is the position of the first None in lot: the length of the common prefix. out is that common prefix.

val = ["axc", "abc", "abc"]
lot = [reduce(lambda a, b: a if a == b else None, x) for x in zip(*val)] + [None]
out = val[0][:lot.index(None)]

Here's a simple clean solution. The idea is to use zip() function to line up all the characters by putting them in a list of 1st characters, list of 2nd characters,...list of nth characters. Then iterate each list to check if they contain only 1 value.

a = ["my_prefix_what_ever", "my_prefix_what_so_ever", "my_prefix_doesnt_matter"]


list = [all(x[i] == x[i+1] for i in range(len(x)-1)) for x in zip(*a)]


print a[0][:list.index(0) if list.count(0) > 0 else len(list)]

output: my_prefix_

I had a slight variation of the problem and google sends me here, so I think it will be useful to document:

I have a list like:

  • my_prefix_what_ever
  • my_prefix_what_so_ever
  • my_prefix_doesnt_matter
  • some_noise
  • some_other_noise

So I would expect my_prefix to be returned. That can be done with:

from collections import Counter


def get_longest_common_prefix(values, min_length):
substrings = [value[0: i-1] for value in values for i in range(min_length, len(value))]
counter = Counter(substrings)
# remove count of 1
counter -= Counter(set(substrings))
return max(counter, key=len)

In one line without using itertools, for no particular reason, although it does iterate through each character:

''.join([z[0] for z in zip(*(list(s) for s in strings)) if all(x==z[0] for x in z)])

Find the common prefix in all words from the given input string, if there is no common prefix print -1

stringList = ['my_prefix_what_ever', 'my_prefix_what_so_ever', 'my_prefix_doesnt_matter']
len2 = len( stringList )
if len2 != 0:
# let shortest word is prefix
prefix = min( stringList )
for i in range( len2 ):
word = stringList[ i ]
len1 = len( prefix )
# slicing each word as lenght of prefix
word = word[ 0:len1 ]
for j in range( len1 ):
# comparing each letter of word and prefix
if word[ j ] != prefix[ j ]:
# if letter does not match slice the prefix
prefix = prefix[ :j ]
break # after getting comman prefix move to next word
if len( prefix ) != 0:
print("common prefix: ",prefix)
else:
print("-1")
else:
print("string List is empty")